Re: [jgit-dev] jgit binary patches

Hello,

Thanks for your quick response. Regarding base85, the character set seems to be the same as in RFC 1924, section 4.2 (http://tools.ietf.org/html/rfc1924).
I still do not understand the delta mechanism. I went through the file structure of a CGit binary patch, and it seems to be as follows:

GIT binary patch\n
delta <X>\n
<length_byte><base85_encoded_file_content>\n
<length_byte><base85_encoded_file_content>\n
...
\n
delta <X>\n
<length_byte><base85_encoded_file_content>\n
<length_byte><base85_encoded_file_content>\n
...

where <length_byte> is encoded as a single ASCII character in A-Z or a-z. 'A' means that the line length is 1 and 'z' means 52. If you look at an ASCII table, the characters between 'Z' and 'a' ('[', '\', ']', '^', '_', '`') are skipped. It also seems to me that the first block of this structure is the binary file before the changes are applied and the second block is the file after they are applied. Do you know whether that is correct?
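
The length-character mapping described above can be written down as a small sketch. The function names are made up for illustration, and the only facts used are the ones observed above ('A' is 1, 'z' is 52, and the punctuation between 'Z' and 'a' is skipped):

```python
def decode_length(ch: str) -> int:
    """Map a length character to the line length it stands for.

    'A'..'Z' -> 1..26 and 'a'..'z' -> 27..52; anything else is rejected.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    if "a" <= ch <= "z":
        return ord(ch) - ord("a") + 27
    raise ValueError(f"not a valid length character: {ch!r}")


def encode_length(n: int) -> str:
    """Inverse of decode_length for lengths 1..52."""
    if not 1 <= n <= 52:
        raise ValueError("length must be between 1 and 52")
    if n <= 26:
        return chr(ord("A") + n - 1)
    return chr(ord("a") + n - 27)
```

For example, decode_length("Z") gives 26 and decode_length("a") gives 27, so the six skipped ASCII characters never appear as a length.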

I thought that <X> (the number after "delta") might be a kind of checksum because it is a number. I summed all the length bytes, and the difference between <X> and my result was less than 0.5%. I'm wondering whether this is just a coincidence or whether I made a mistake.

I read your explanation of the DeltaEncoder algorithm and I'm wondering whether we are talking about the same delta ;) or whether I'm going in a totally wrong direction...

My problem is that I want to add "generate binary patch" functionality to Eclipse, and I don't know whether JGit already supports it in some way or whether everything has to be implemented from scratch.

Best Regards,
Marek Chodorowski





From:        Shawn Pearce <spearce@xxxxxxxxxxx>
To:        Marek Chodorowski/Poland/Contr/IBM@IBMPL
Cc:        jgit-dev@xxxxxxxxxxx
Date:        2012-03-24 01:31
Subject:        Re: [jgit-dev] jgit binary patches




On Fri, Mar 23, 2012 at 09:23, Marek Chodorowski
<marek.chodorowski@xxxxxxxxxx> wrote:
> currently, when creating a patch with binary file, the patch contains only
> "Binary files differ" line
> (https://bugs.eclipse.org/bugs/show_bug.cgi?id=371725). I was looking for
> Git documentation to find out how it is implemented there. All I found is
> user manual how to use Git from end user perspective. Do you know maybe if
> such documentation exists? or if I am allowed to use Git source code to
> check how particular functionality is implemented, for example which
> character set is used in base85 encoding?

Basically I forgot to implement this at some point.

To encode a binary patch you need to use JGit's DeltaEncoder to make
the delta information. You have to run it twice, once in the forward
direction, and again in the reverse direction as a Git binary patch
contains both deltas.

It looks like it's a custom base85 implementation. 0-9, A-Z, a-z, and
then the following symbols:

 ! # $ % & ( ) * + -
 ; < = > ? @ ^ _ ` {
 | } ~

To encode base85, the input byte[] is converted to a stream of big
endian 32 bit integers, each of which is output as 5 consecutive
base85 characters, most significant digit first. Whee.  (The reader
will note that 85^5 is larger than 2^32 and thus this fits with a bit
of "wasted" space in the leading base85 digit.)
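
A minimal sketch of that encoding step, in Python for brevity. The alphabet is the one listed above; the function name is made up, and two points are assumptions of this sketch to check against the C Git source: the five digits of each group are written most significant first, and a trailing partial group is padded with zero bytes.

```python
import string

# 0-9, A-Z, a-z, then the 23 symbols listed in the message: 85 characters.
ALPHABET = (
    string.digits
    + string.ascii_uppercase
    + string.ascii_lowercase
    + "!#$%&()*+-;<=>?@^_`{|}~"
)


def encode_85(data: bytes) -> str:
    """Encode bytes as base85, five characters per 4-byte group."""
    out = []
    for i in range(0, len(data), 4):
        # Assumption: a short final group is padded with zero bytes.
        chunk = data[i:i + 4].ljust(4, b"\0")
        acc = int.from_bytes(chunk, "big")  # big endian 32 bit integer
        digits = []
        for _ in range(5):
            acc, val = divmod(acc, 85)  # least significant digit first
            digits.append(ALPHABET[val])
        # Assumption: digits are written most significant first.
        out.append("".join(reversed(digits)))
    return "".join(out)
```

Since 85^5 = 4437053125 and 2^32 = 4294967296, five digits are always enough for one group. Python's own base64.b85encode is documented as using the git-style alphabet, so it can serve as a cross-check for inputs that are a multiple of 4 bytes long.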

The binary diff is output as two deltas:

 old -> new
 new -> old

The delta is created with the DeltaEncoder algorithm, and the output
of that is deflated and then base85 encoded. If the full version of
the file deflated is smaller than the delta deflated, the full version
is used instead as the "literal" format.
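
The choice between the two formats can be sketched as follows, again in Python for brevity. The function name and the "delta"/"literal" return labels are illustrative only, and the delta bytes are assumed to come from whatever delta encoder is in use (DeltaEncoder in JGit's case); nothing here calls JGit itself.

```python
import zlib
from typing import Tuple


def choose_payload(delta: bytes, full: bytes) -> Tuple[str, bytes]:
    """Pick the smaller of the deflated delta and the deflated full file.

    Returns the chosen kind and the deflated bytes, which would then be
    base85 encoded and written out line by line.
    """
    deflated_delta = zlib.compress(delta)
    deflated_full = zlib.compress(full)
    # The full file wins only when it deflates to strictly fewer bytes.
    if len(deflated_full) < len(deflated_delta):
        return "literal", deflated_full
    return "delta", deflated_delta
```

For a binary patch this decision would be made twice, once for old -> new and once for new -> old, so the two halves of a patch need not use the same format.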

Clear as mud?