Re: [jgit-dev] jgit binary patches

On Fri, Mar 23, 2012 at 09:23, Marek Chodorowski
<marek.chodorowski@xxxxxxxxxx> wrote:
> currently, when creating a patch with binary file, the patch contains only
> "Binary files differ" line
> (https://bugs.eclipse.org/bugs/show_bug.cgi?id=371725). I was looking for
> Git documentation to find out how it is implemented there. All I found is
> user manual how to use Git from end user perspective. Do you know maybe if
> such documentation exists? or if I am allowed to use Git source code to
> check how particular functionality is implemented, for example which
> character set is used in base85 encoding?

Basically I forgot to implement this at some point.

To encode a binary patch you need to use JGit's DeltaEncoder to make
the delta information. You have to run it twice, once in the forward
direction, and again in the reverse direction as a Git binary patch
contains both deltas.

It looks like it's a custom base85 implementation. The character set
is 0-9, A-Z, a-z, and then the following 23 symbols:

  ! # $ % & ( ) * + -
  ; < = > ? @ ^ _ ` {
  | } ~

To encode base85, the input byte[] is converted to a stream of big
endian 32 bit integers, each of which is output as 5 consecutive
base85 characters, most significant digit first. Whee.  (The reader
will note that 85^5 is larger than 2^32 and thus this fits with a bit
of "wasted" space in the leading base85 digit.)
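
A minimal sketch of that encoding step in Java, in case it helps. The
class and method names are made up for illustration and are not JGit
API. It assumes a trailing group shorter than 4 bytes is padded with
zero bytes, and it writes each group most significant digit first,
which is how I read C Git's base85.c.

```java
class Base85Sketch {
    // Alphabet as listed above: 0-9, A-Z, a-z, then the 23 symbols.
    static final char[] ALPHABET = (
        "0123456789"
        + "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        + "abcdefghijklmnopqrstuvwxyz"
        + "!#$%&()*+-;<=>?@^_`{|}~").toCharArray();

    // Hypothetical helper: encodes data in groups of 4 bytes -> 5 chars.
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int pos = 0; pos < data.length; pos += 4) {
            // Build a big endian 32 bit value; missing bytes count as zero
            // (an assumption about how a short final group is handled).
            long acc = 0;
            for (int i = 0; i < 4; i++) {
                int b = pos + i < data.length ? data[pos + i] & 0xff : 0;
                acc = (acc << 8) | b;
            }
            // Peel off base85 digits from the low end, filling the group
            // from the back so the most significant digit comes first.
            char[] group = new char[5];
            for (int i = 4; i >= 0; i--) {
                group[i] = ALPHABET[(int) (acc % 85)];
                acc /= 85;
            }
            out.append(group);
        }
        return out.toString();
    }
}
```

The accumulator is a long so the unsigned 32 bit value never goes
negative during the division.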

The binary diff is output as two deltas:

  old -> new
  new -> old

Each delta is created with the DeltaEncoder algorithm, and the output
of that is deflated and then base85 encoded. If the deflated full
version of the file is smaller than the deflated delta, the full
version is used instead, in the "literal" format.
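
The delta-versus-literal choice could be sketched roughly like this.
The names are hypothetical and not JGit API. The delta bytes are
assumed to come from DeltaEncoder already, and java.util.zip.Deflater
is used at its default compression level, which is my assumption
rather than something checked against C Git.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

class BinaryHunkSketch {
    // Deflate a byte[] with the default compression level (an assumption).
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hypothetical helper: true when the "literal" format should be used,
    // i.e. the deflated full file is strictly smaller than the deflated delta.
    static boolean useLiteral(byte[] delta, byte[] fullTarget) {
        return deflate(fullTarget).length < deflate(delta).length;
    }
}
```

This would be called once per direction: with the old -> new delta and
the new content, then with the new -> old delta and the old content.
Whichever deflated form wins is what gets base85 encoded.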

Clear as mud?

