[jgit-dev] delta generation during packing
I just pushed my 'delta' series, which creates deltas on the fly while packing. This gives us the functionality needed to perform `git repack`, or at least the first half of `git gc`.

Because this implementation was rebuilt from scratch based on my own memory of how the packing algorithm has evolved in C Git over the years, PackWriter, DeltaWindow, and DeltaEncoder don't follow exactly the same rules as C Git everywhere. As a result, JGit produces different (but logically equivalent) pack files:

    Repository | Pack Size (bytes)             | Packing Time
               | JGit - CGit = Difference      | JGit / CGit
    -----------+-------------------------------+------------------
    git        | 25094348 - 24322890 = +771458 | 59.434s / 59.133s
    jgit       |  5669515 -  5709046 =  -39531 |  6.654s /  6.806s
    linux-2.6  |     389M -     386M =     +3M |  20m02s /  18m01s

For these tests both implementations used pack.threads = 1, a delta window size of 10, and a maximum delta depth of 50, with delta and object reuse disabled. Both were reading from an already fully packed repository on local disk. Each reported running time was measured after one warm-up run of the implementation being tested.

PackWriter writes 771,458 bytes (about 753 KiB) more than C Git on git.git and 3M more on linux-2.6, but its pack is actually 39,531 bytes (about 38.6 KiB) smaller on jgit.git. Being larger by less than 0.7% on linux-2.6 isn't bad, nor is taking an extra 2 minutes to pack.

On running time, JGit is at a major disadvantage on linux-2.6 because that pack doesn't fit into the default 20M WindowCache, while C Git can mmap the entire pack and have it available instantly in physical memory (assuming a hot cache).

The really critical patches are:

- http://egit.eclipse.org/r/1111 : the delta encoder
- http://egit.eclipse.org/r/1113 : the delta search
- http://egit.eclipse.org/r/1114 : caching deltas
- http://egit.eclipse.org/r/1115 : threaded search
- http://egit.eclipse.org/r/1116 : capping memory usage

--
Shawn.
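For readers unfamiliar with what a delta encoder like the one in r/1111 has to emit: Git's pack delta format is a header holding the base and result sizes as 7-bit varints, followed by "copy" instructions (high bit set; bits 0-3 flag which offset bytes follow, bits 4-6 which size bytes follow) and "insert" instructions (a length of 1 to 127 followed by literal bytes). The following is a minimal sketch of that format, assuming a naive index of 16-byte blocks of the base to find matches. `DeltaSketch`, `encode`, and `apply` are hypothetical names; this is not JGit's DeltaEncoder API, and real encoders use a much better matching index.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Git's pack delta instruction format; not JGit's DeltaEncoder.
public class DeltaSketch {
    static final int BLOCK = 16;         // assumed match granularity for the naive index
    static final int MAX_COPY = 0x10000; // cap each copy instruction at 64 KiB

    // Size header varint: 7 bits per byte, low-order group first, MSB set = more bytes follow.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        for (; value >= 0x80; value >>>= 7) out.write((value & 0x7f) | 0x80);
        out.write(value);
    }

    static int readVarint(byte[] buf, int[] pos) {
        int value = 0, shift = 0, b;
        do {
            b = buf[pos[0]++] & 0xff;
            value |= (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return value;
    }

    // Insert instruction: one length byte (1..127) followed by that many literal bytes.
    static void writeInsert(ByteArrayOutputStream out, byte[] data, int start, int end) {
        while (start < end) {
            int n = Math.min(127, end - start);
            out.write(n);
            out.write(data, start, n);
            start += n;
        }
    }

    // Copy instruction: MSB set; bit i (0-3) flags offset byte i, bit 4+i flags size byte i.
    static void writeCopy(ByteArrayOutputStream out, int offset, int size) {
        ByteArrayOutputStream args = new ByteArrayOutputStream();
        int cmd = 0x80;
        for (int i = 0; i < 7; i++) {
            int b = i < 4 ? (offset >>> (8 * i)) & 0xff : (size >>> (8 * (i - 4))) & 0xff;
            if (b != 0) { cmd |= 1 << i; args.write(b); } // zero bytes are omitted
        }
        out.write(cmd);
        out.write(args.toByteArray(), 0, args.size());
    }

    // ISO-8859-1 maps each byte to one char, so equal keys mean equal bytes.
    static String key(byte[] data, int off) {
        return new String(data, off, BLOCK, StandardCharsets.ISO_8859_1);
    }

    static byte[] encode(byte[] base, byte[] target) {
        Map<String, Integer> index = new HashMap<>(); // block content -> first offset in base
        for (int off = 0; off + BLOCK <= base.length; off += BLOCK) index.putIfAbsent(key(base, off), off);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, base.length);
        writeVarint(out, target.length);
        int pos = 0, literalStart = 0;
        while (pos < target.length) {
            Integer match = pos + BLOCK <= target.length ? index.get(key(target, pos)) : null;
            if (match == null) { pos++; continue; }
            int len = BLOCK; // extend the verified block match forward while bytes agree
            while (len < MAX_COPY && match + len < base.length && pos + len < target.length
                    && base[match + len] == target[pos + len]) len++;
            writeInsert(out, target, literalStart, pos); // pending unmatched bytes
            writeCopy(out, match, len);
            pos += len;
            literalStart = pos;
        }
        writeInsert(out, target, literalStart, target.length);
        return out.toByteArray();
    }

    static byte[] apply(byte[] base, byte[] delta) {
        int[] p = {0};
        if (readVarint(delta, p) != base.length) throw new IllegalArgumentException("base size mismatch");
        byte[] result = new byte[readVarint(delta, p)];
        int r = 0;
        while (p[0] < delta.length) {
            int cmd = delta[p[0]++] & 0xff;
            if ((cmd & 0x80) != 0) {
                int off = 0, size = 0;
                for (int i = 0; i < 7; i++) {
                    if ((cmd & (1 << i)) == 0) continue;
                    int b = delta[p[0]++] & 0xff;
                    if (i < 4) off |= b << (8 * i); else size |= b << (8 * (i - 4));
                }
                if (size == 0) size = 0x10000; // format rule: an omitted size means 64 KiB
                System.arraycopy(base, off, result, r, size);
                r += size;
            } else if (cmd != 0) {
                System.arraycopy(delta, p[0], result, r, cmd);
                p[0] += cmd;
                r += cmd;
            } else {
                throw new IllegalArgumentException("opcode 0 is reserved");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy dog. ".repeat(4);
        byte[] base = text.getBytes(StandardCharsets.UTF_8);
        byte[] target = ("Header! " + text + "Trailer.").getBytes(StandardCharsets.UTF_8);
        byte[] delta = encode(base, target);
        System.out.println("round trip ok: " + Arrays.equals(apply(base, delta), target)); // true
        System.out.println(delta.length + " delta bytes for a " + target.length + "-byte target"); // 24, 196
    }
}
```

Here the 196-byte target becomes a 24-byte delta: two size varints, one insert for "Header! ", a single copy of the whole 180-byte base, and one insert for "Trailer.".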
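The window size and delta depth settings used in the tests above come from the delta search (r/1113): each object is compared only against the objects just before it in the sorted list (the window, 10 in those tests), and a candidate base is skipped if deltifying against it would push the chain past the depth limit (50 in those tests). A minimal sketch of that sliding-window idea follows. It assumes the input list is already sorted so similar objects are adjacent, and it uses a crude shared-prefix/suffix estimate instead of actually encoding each delta. `WindowSearchSketch`, `search`, and `estimateDeltaSize` are hypothetical names, not JGit's DeltaWindow.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a sliding-window delta search with a depth limit; not JGit's DeltaWindow.
public class WindowSearchSketch {
    // Crude stand-in for a real delta size: target bytes not covered by a shared prefix or suffix.
    static int estimateDeltaSize(byte[] base, byte[] target) {
        int max = Math.min(base.length, target.length);
        int prefix = 0;
        while (prefix < max && base[prefix] == target[prefix]) prefix++;
        int suffix = 0;
        while (suffix < max - prefix
                && base[base.length - 1 - suffix] == target[target.length - 1 - suffix]) suffix++;
        return target.length - prefix - suffix;
    }

    // For each object, returns the index of its chosen delta base, or -1 to store it whole.
    static int[] search(List<byte[]> objs, int window, int maxDepth) {
        int n = objs.size();
        int[] base = new int[n];
        int[] depth = new int[n]; // delta chain length; 0 means stored whole
        Arrays.fill(base, -1);
        for (int i = 0; i < n; i++) {
            byte[] target = objs.get(i);
            int best = target.length; // a delta must beat storing the object whole
            for (int j = Math.max(0, i - window); j < i; j++) {
                if (depth[j] >= maxDepth) continue; // chaining through j would exceed the limit
                int cost = estimateDeltaSize(objs.get(j), target);
                if (cost < best) {
                    best = cost;
                    base[i] = j;
                    depth[i] = depth[j] + 1;
                }
            }
        }
        return base;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<byte[]> objs = List.of(
                bytes("hello world version 3"),
                bytes("hello world version 2"),
                bytes("completely different"),
                bytes("hello world version 1"));
        System.out.println(Arrays.toString(search(objs, 2, 50))); // [-1, 0, -1, 1]
        System.out.println(Arrays.toString(search(objs, 2, 1)));  // [-1, 0, -1, -1]
    }
}
```

With a window of 2, the last object finds "version 2" among its two predecessors and deltas against it; lowering the depth limit to 1 forces it to be stored whole, because "version 2" is already a delta and the only other candidate in the window is unrelated.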