[jgit-dev] delta generation during packing
I just pushed my 'delta' series, which creates deltas on the fly
while packing. This brings us the functionality needed to perform
`git repack`, or at least the first half of `git gc`.
Because this implementation was rebuilt from scratch based on my own
memory of how the packing algorithm has evolved over the years in
C Git, PackWriter, DeltaWindow, and DeltaEncoder don't use exactly
the same rules as C Git everywhere, and that leads JGit to produce
different (but logically equivalent) pack files:
Repository | Pack Size (bytes)                | Packing Time
           | JGit     - CGit     = Difference | JGit    / CGit
-----------+----------------------------------+------------------
git        | 25094348 - 24322890 = +771458    | 59.434s / 59.133s
jgit       |  5669515 -  5709046 =  -39531    |  6.654s /  6.806s
linux-2.6  |     389M -     386M =     +3M    | 20m02s  / 18m01s
For the above tests pack.threads was set to 1, window size to 10,
and delta depth to 50, and delta and object reuse were disabled for
both implementations. Both implementations were reading from an
already fully packed repository on local disk. The running time
reported was measured after 1 warm-up run of the tested implementation.
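
To make the window size and delta depth settings concrete, here is a
toy sketch of a sliding-window delta search. It is not the code in
DeltaWindow or in C Git: the class name, the Entry type, and the
prefix/suffix size estimate are all made up for illustration, and a
real encoder computes actual copy/insert deltas. The sketch only shows
how "window" limits which earlier objects are tried as a base and how
"depth" limits the length of a delta chain.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy sliding-window delta search; hypothetical, not JGit's DeltaWindow. */
final class WindowSearchSketch {
    /** One object to pack, plus the base chosen for it (null = stored whole). */
    static final class Entry {
        final String name;
        final byte[] data;
        Entry base; // chosen delta base, or null if stored whole
        int depth;  // length of the delta chain ending at this entry

        Entry(String name, byte[] data) {
            this.name = name;
            this.data = data;
        }
    }

    /** Crude stand-in for a delta size: target bytes outside a shared prefix/suffix. */
    static int estimateDeltaSize(byte[] base, byte[] target) {
        int max = Math.min(base.length, target.length);
        int prefix = 0;
        while (prefix < max && base[prefix] == target[prefix]) {
            prefix++;
        }
        int suffix = 0;
        while (suffix < max - prefix
                && base[base.length - 1 - suffix] == target[target.length - 1 - suffix]) {
            suffix++;
        }
        return target.length - prefix - suffix;
    }

    /** Picks a base for each object from at most {@code window} preceding objects. */
    static void search(Entry[] objects, int window, int maxDepth) {
        Deque<Entry> recent = new ArrayDeque<>();
        for (Entry target : objects) {
            int best = target.data.length; // a delta must beat storing the whole object
            for (Entry candidate : recent) {
                if (candidate.depth >= maxDepth) {
                    continue; // using this base would make the chain too long
                }
                int size = estimateDeltaSize(candidate.data, target.data);
                if (size < best) {
                    best = size;
                    target.base = candidate;
                    target.depth = candidate.depth + 1;
                }
            }
            recent.addFirst(target); // newest candidates are tried first
            if (recent.size() > window) {
                recent.removeLast(); // slide the window forward
            }
        }
    }
}
```

With the settings used in the tests this would be called as
search(objects, 10, 50). A larger window tries more candidate bases
per object (slower, usually smaller packs), while a larger depth
allows longer chains (smaller packs, slower reads).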
PackWriter is writing 771 KB more data on git.git and 3M more on
linux-2.6, but its pack is actually 39.5 KB smaller on jgit.git. Being
larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an
extra 2 minutes to pack. On the running time side, JGit is at a
major disadvantage because linux-2.6 doesn't fit into the default
WindowCache of 20M, while C Git is able to mmap the entire pack and
have it available instantly in physical memory (assuming hot cache).
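
For anyone who wants to redo the size comparison, the relative
difference can be computed from the byte counts in the table. A small
sketch (the class and method names are only for illustration) using
the git.git and jgit.git rows; linux-2.6 is left out because the table
only gives its sizes rounded to the nearest M:

```java
/** Relative pack size difference, computed from the byte counts in the table. */
final class PackSizeOverhead {
    /** Size of JGit's pack relative to C Git's, as a signed percentage. */
    static double overheadPercent(long jgitBytes, long cgitBytes) {
        return 100.0 * (jgitBytes - cgitBytes) / cgitBytes;
    }

    public static void main(String[] args) {
        // Byte counts copied from the git and jgit rows of the table.
        System.out.printf("git:  %+.2f%%%n", overheadPercent(25094348L, 24322890L));
        System.out.printf("jgit: %+.2f%%%n", overheadPercent(5669515L, 5709046L));
    }
}
```

A positive result means the JGit pack is larger than the C Git pack,
a negative one means it is smaller.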
The really critical patches are:
http://egit.eclipse.org/r/1111 : the delta encoder
http://egit.eclipse.org/r/1113 : the delta search
http://egit.eclipse.org/r/1114 : caching deltas
http://egit.eclipse.org/r/1115 : threaded search
http://egit.eclipse.org/r/1116 : capping memory usage
--
Shawn.