Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[jgit-dev] delta generation during packing

I just pushed my 'delta' series, which creates deltas on the fly
while packing.  This brings us the functionality needed to perform
`git repack`, or at least the first half of `git gc`.

Because this implementation was rebuilt from scratch based on my own
memory of how the packing algorithm has evolved over the years in
C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly
the same rules everywhere, and that leads JGit to produce different
(but logically equivalent) pack files:
    
  Repository | Pack Size (bytes)                | Packing Time
             | JGit     - CGit     = Difference | JGit / CGit
  -----------+----------------------------------+-----------------
   git       | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
   jgit      |  5669515 -  5709046 = - 39531    |  6.654s /  6.806s
   linux-2.6 |     389M -     386M = +3M        | 20m02s  / 18m01s

For the above tests pack.threads was set to 1, window size=10,
delta depth=50, and delta and object reuse was disabled for both
implementations.  Both implementations were reading from an already
fully packed repository on local disk.  The running time reported
is after 1 warm-up run of the tested implementation.

PackWriter is writing 771 KiB more data on git.git, 3M more on
linux-2.6, but is actually 39.5 KiB smaller on jgit.git.  Being
larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an
extra 2 minutes to pack.  On the running time side, JGit is at a
major disadvantage because linux-2.6 doesn't fit into the default
WindowCache of 20M, while C Git is able to mmap the entire pack and
have it available instantly in physical memory (assuming hot cache).

The really critical patches are:

 http://egit.eclipse.org/r/1111 : the delta encoder
 http://egit.eclipse.org/r/1113 : the delta search

 http://egit.eclipse.org/r/1114 : caching deltas
 http://egit.eclipse.org/r/1115 : threaded search
 http://egit.eclipse.org/r/1116 : capping memory usage

-- 
Shawn.


Back to the top