[jgit-dev] Memory-mapped PackIndexV2

From: Marc Strapetz <marc.strapetz@xxxxxxxxxxx>
Date: Thu, 15 Oct 2020 00:57:04 +0200

I have been experimenting with a memory-mapped PackIndexV2 implementation, and so far the results look promising. For large index files and small operations, i.e. where the ratio of index data actually required to index data read is very small (e.g. parsing a single commit), a speedup by a factor of 100 or more is possible (see the experiments below). The current state of my patch is very much work in progress. It enables optional use of the memory-mapped pack index and is meant as a basis for discussion:


https://git.eclipse.org/r/c/jgit/jgit/+/170675
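To illustrate why a single lookup touches only a tiny fraction of the index, here is a minimal sketch (not the code from the change above) of finding an object's pack offset directly in a buffer holding a version-2 pack index. The fanout table narrows the binary search to the objects sharing the first byte, so only a few SHA-1 entries and one offset entry are read. MappedIdxLookup and findOffset are hypothetical names; the sketch uses int positions (so it assumes an index smaller than 2 GB) and does not validate the header or checksums.

```java
import java.nio.ByteBuffer;

/** Hypothetical lookup over a (memory-mapped) pack index v2 buffer. */
final class MappedIdxLookup {
    private static final int FANOUT = 8;              // after magic "\377tOc" + version
    private static final int NAMES = FANOUT + 256 * 4; // sorted 20-byte SHA-1s follow

    private MappedIdxLookup() {}

    /** Returns the pack offset of id, or -1 if id is not in this index. */
    static long findOffset(ByteBuffer idx, byte[] id) {
        int n = idx.getInt(FANOUT + 255 * 4);          // last fanout entry = object count
        int first = id[0] & 0xff;
        int lo = first == 0 ? 0 : idx.getInt(FANOUT + (first - 1) * 4);
        int hi = idx.getInt(FANOUT + first * 4);       // exclusive upper bound
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareId(idx, NAMES + mid * 20, id);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid;
            } else {
                // Skip the SHA-1 table (20 * n) and the CRC32 table (4 * n).
                int offsets32 = NAMES + n * 20 + n * 4;
                int off = idx.getInt(offsets32 + mid * 4);
                if ((off & 0x80000000) == 0) {
                    return off;                        // 31-bit offset stored inline
                }
                // MSB set: the low 31 bits index the 8-byte large-offset table.
                int offsets64 = offsets32 + n * 4;
                return idx.getLong(offsets64 + (off & 0x7fffffff) * 8);
            }
        }
        return -1;
    }

    // Unsigned lexicographic comparison of the 20-byte name at pos with id.
    private static int compareId(ByteBuffer idx, int pos, byte[] id) {
        for (int i = 0; i < 20; i++) {
            int c = (idx.get(pos + i) & 0xff) - (id[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

With an in-memory index the whole file is read up front, whereas with a mapped buffer the operating system only pages in the parts this search actually touches.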

To decide whether and how to continue this work, I would very much appreciate feedback on the following open questions:

(1) How to safely "unmap" the MappedByteBuffer once the PackIndex is closed?

As my colleague Alexandr has pointed out, sun.misc.Unsafe.invokeCleaner() can be used here. According to our experiments, this seems to work fine, and for us using this API is acceptable as long as there is no better "supported" API.
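A minimal sketch of the invokeCleaner() approach, assuming Java 9 or later, where sun.misc.Unsafe.invokeCleaner(ByteBuffer) exists in the jdk.unsupported module. Without it, a mapping is only released when the buffer is garbage-collected, which on Windows also keeps the file from being deleted. Unsafe is looked up reflectively so the class still loads on runtimes without that method; MappedBufferUnmapper is a hypothetical name, not part of the patch.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

/** Hypothetical helper that unmaps a MappedByteBuffer eagerly via sun.misc.Unsafe. */
final class MappedBufferUnmapper {
    private static final Object UNSAFE;
    private static final Method INVOKE_CLEANER;

    static {
        Object unsafe = null;
        Method invokeCleaner = null;
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            unsafe = theUnsafe.get(null);
            // Only present on Java 9+; on Java 8 this throws NoSuchMethodException.
            invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
        } catch (ReflectiveOperationException | RuntimeException e) {
            unsafe = null;
            invokeCleaner = null;
        }
        UNSAFE = unsafe;
        INVOKE_CLEANER = invokeCleaner;
    }

    private MappedBufferUnmapper() {}

    /**
     * Releases the mapping now instead of waiting for GC; returns false if
     * that is not supported. The caller must guarantee that the buffer is
     * never accessed again: touching an unmapped buffer can crash the JVM.
     */
    static boolean unmap(MappedByteBuffer buffer) {
        if (INVOKE_CLEANER == null) {
            return false;
        }
        try {
            INVOKE_CLEANER.invoke(UNSAFE, buffer);
            return true;
        } catch (IllegalAccessException | InvocationTargetException e) {
            // e.g. IllegalArgumentException (wrapped) for a slice or duplicate
            return false;
        }
    }
}
```

The "safely" part is the hard one: unmapping is only safe once no reader can still hold a reference to the buffer, which ties this question to (3).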


(2) Should we handle index files larger than the Integer.MAX_VALUE (2 GB) limit?

This will make the implementation of PackIndexV2m more complex, because a single MappedByteBuffer can map at most Integer.MAX_VALUE bytes. Currently I'm not aware of any (public) repositories with pack indexes close to 2 GB. On the other hand, real-world repositories like the Linux kernel [1] (250 MB) or Chromium [2] (500 MB) are already within the same order of magnitude. Hence, I would suggest supporting >2 GB index files from the very beginning.
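Since FileChannel.map() rejects sizes above Integer.MAX_VALUE, one common workaround is to map the file as several fixed-size chunks and translate a long file position into a (chunk, offset) pair. The following is a minimal sketch under that assumption; ChunkedMappedFile and chunkBits are hypothetical names, chunk sizes are assumed to be powers of two, and values that straddle a chunk boundary are assembled byte by byte.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical reader for files > 2 GB, mapped as 2^chunkBits-byte chunks. */
final class ChunkedMappedFile {
    private final int chunkBits;
    private final long chunkMask;
    private final long size;
    private final MappedByteBuffer[] chunks;

    ChunkedMappedFile(Path path, int chunkBits) throws IOException {
        if (chunkBits < 3 || chunkBits > 30) {
            throw new IllegalArgumentException("chunkBits must be in [3, 30]");
        }
        this.chunkBits = chunkBits;
        this.chunkMask = (1L << chunkBits) - 1;
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long fileSize = ch.size();
            long chunkSize = 1L << chunkBits;
            int count = (int) ((fileSize + chunkSize - 1) >>> chunkBits);
            MappedByteBuffer[] mapped = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = (long) i << chunkBits;
                mapped[i] = ch.map(FileChannel.MapMode.READ_ONLY, start,
                        Math.min(chunkSize, fileSize - start));
            }
            // Mappings stay valid after the channel is closed.
            this.size = fileSize;
            this.chunks = mapped;
        }
    }

    long size() {
        return size;
    }

    byte get(long pos) {
        return chunks[(int) (pos >>> chunkBits)].get((int) (pos & chunkMask));
    }

    /** Big-endian int at pos, falling back to byte-wise reads across a chunk boundary. */
    int getInt(long pos) {
        MappedByteBuffer b = chunks[(int) (pos >>> chunkBits)];
        int off = (int) (pos & chunkMask);
        if (off + 4 <= b.limit()) {
            return b.getInt(off);
        }
        int v = 0;
        for (int i = 0; i < 4; i++) {
            v = (v << 8) | (get(pos + i) & 0xff);
        }
        return v;
    }

    /** Big-endian long, e.g. for the 8-byte large-offset table. */
    long getLong(long pos) {
        return ((long) getInt(pos) << 32) | (getInt(pos + 4) & 0xffffffffL);
    }
}
```

With chunkBits = 30, a 3 GiB index would be mapped as three 1 GiB buffers. Because all fixed-size fields of a v2 index are 4- or 8-byte aligned, only the 20-byte SHA-1 entries could actually cross a chunk boundary.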


(3) Should we handle multi-threaded access to buffers?

The current patch asserts single-threaded access, which is sufficient for our Git client. I haven't checked in detail, but as far as I understand, this should be true for most of JGit's own code, too. For the few multi-threaded usages, the current in-memory PackIndexV2 could be used.
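The mail does not show how the patch performs this check. As a hedged illustration only, single-threaded access could be enforced by remembering the creating thread and failing fast on access from any other thread; SingleThreadedIndexBuffer is a hypothetical name.

```java
import java.nio.ByteBuffer;

/** Hypothetical wrapper that fails fast when a second thread touches the buffer. */
final class SingleThreadedIndexBuffer {
    private final ByteBuffer buf;
    private final Thread owner = Thread.currentThread(); // thread that created the wrapper

    SingleThreadedIndexBuffer(ByteBuffer buf) {
        this.buf = buf;
    }

    private void checkThread() {
        Thread current = Thread.currentThread();
        if (current != owner) {
            throw new IllegalStateException("accessed from " + current.getName()
                    + ", but owned by " + owner.getName());
        }
    }

    int getInt(int pos) {
        checkThread();
        return buf.getInt(pos);
    }
}
```

An explicit check like this is always active; a plain Java `assert` would only fire when assertions are enabled with -ea.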

It would be interesting to hear whether Gerrit, EGit and other projects use JGit's pack API in a single-threaded or multi-threaded way.

Implementing thread safety should be no big deal. We see the following options here:

(a) synchronizing all public methods of the new PackIndexV2m: a straightforward implementation, probably with more frequent synchronized executions; or

(b) having separate buffers per thread: a more complex implementation, probably with less frequent synchronized executions.

We ourselves haven't experienced problems with frequent synchronized executions, but I recall that JGit tries to avoid them where possible?
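A minimal sketch contrasting options (a) and (b), assuming a shared read-only index buffer; ConcurrentIndexBuffer and its methods are hypothetical names. The underlying problem is that ByteBuffer keeps a mutable position, so a relative bulk read such as fetching a 20-byte object id can be corrupted by a concurrent reader. Option (b) is shown with ThreadLocal duplicates: duplicate() shares the mapped content but carries its own position and limit.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of options (a) and (b) for a shared index buffer. */
final class ConcurrentIndexBuffer {
    private final ByteBuffer shared;                 // used by option (a) only
    private final ByteBuffer template;               // never read or moved, only duplicated
    private final ThreadLocal<ByteBuffer> perThread; // option (b)

    ConcurrentIndexBuffer(ByteBuffer buffer) {
        this.shared = buffer.duplicate();
        this.template = buffer.duplicate();
        // Each thread lazily gets its own view. ByteBuffer.duplicate() resets
        // the byte order to BIG_ENDIAN, which matches the pack index format.
        this.perThread = ThreadLocal.withInitial(template::duplicate);
    }

    /** Option (a): every access is serialized on one lock. */
    synchronized byte[] readIdSynchronized(int pos) {
        return readId(shared, pos);
    }

    /** Option (b): no lock; each thread moves only its own position. */
    byte[] readIdPerThread(int pos) {
        return readId(perThread.get(), pos);
    }

    // Relative bulk read: mutates the buffer's position.
    private static byte[] readId(ByteBuffer b, int pos) {
        byte[] id = new byte[20];
        b.position(pos);
        b.get(id);
        return id;
    }
}
```

In option (b) the per-thread duplicates are small objects; the mapped memory itself is not copied. Their cost is mainly the ThreadLocal lookup and the need to drop the duplicates when the index is closed.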


(4) Come up with a more reasonable design once the approaches for (1)-(3) are clarified.

Experiments
===========

I have uploaded my benchmarking code at:

https://git.eclipse.org/r/c/jgit/jgit/+/170797

It requires a recent clone of the Linux repository consisting of just a single pack file. The benchmarks compare the current in-memory PackIndexV2 with the proposed memory-mapped PackIndexV2m.

Benchmarks were performed on my Windows 8.1 machine (quad-core, 8 GB RAM, SSD). "Score" denotes the average execution time in ms; "useMmap=true" denotes the memory-mapped version of the benchmark.


Windows Results
---------------
PackIndexV2LoadCommitsBenchmark.testLoadRandomCommits:
(commitCount)  (useMmap)  Mode  Cnt    Score    Error  Units
            1      false    ss   20  164.271 ± 16.327  ms/op
            1       true    ss   20    1.779 ±  0.286  ms/op
           10      false    ss   20  165.841 ±  7.374  ms/op
           10       true    ss   20    3.057 ±  0.255  ms/op
          100      false    ss   20  164.650 ±  8.172  ms/op
          100       true    ss   20    8.830 ±  2.218  ms/op
         1000      false    ss   20  190.149 ±  8.033  ms/op
         1000       true    ss   20   49.824 ± 10.934  ms/op

PackIndexV2FindOffsetBenchmark.testFindSingleOffset:
(useMmap)  Mode  Cnt     Score    Error  Units
    false  avgt   20   157.933 ±  5.613  ms/op
     true  avgt   20     0.173 ±  0.053  ms/op

PackIndexV2FindOffsetBenchmark.testFindAllOffsets:
(useMmap)  Mode  Cnt     Score    Error  Units
    false  avgt   20   821.798 ± 16.820  ms/op
     true  avgt   20  1965.568 ± 11.618  ms/op

Linux Results (Ubuntu 18.04 VM, 4 cores, 4G RAM)
------------------------------------------------
PackIndexV2LoadCommitsBenchmark.testLoadRandomCommits:
(commitCount)  (useMmap)  Mode  Cnt     Score     Error  Units
            1      false    ss   20  1218.530 ±  13.141  ms/op
            1       true    ss   20     7.412 ±   1.619  ms/op
           10      false    ss   20   429.231 ± 353.918  ms/op
           10       true    ss   20    23.468 ±   9.429  ms/op
          100      false    ss   20   236.937 ±  76.485  ms/op
          100       true    ss   20     8.602 ±   5.417  ms/op
         1000      false    ss   20   208.833 ±  13.432  ms/op
         1000       true    ss   20    52.573 ±  30.591  ms/op

PackIndexV2FindOffsetBenchmark.testFindSingleOffset:
(useMmap)  Mode  Cnt     Score    Error  Units
    false  avgt   20   183.936 ± 10.847  ms/op
     true  avgt   20     0.046 ±  0.005  ms/op

PackIndexV2FindOffsetBenchmark.testFindAllOffsets:
(useMmap)  Mode  Cnt     Score    Error  Units
    false  avgt   20   890.854 ± 15.688  ms/op
     true  avgt   20  2047.881 ± 31.824  ms/op

Note that for the memory-mapped benchmarks, WindowCache is switched to memory-mapped mode, too (this corresponds to what the JGit config option "core.packedgitmmap" does). This only affects PackIndexV2LoadCommitsBenchmark.


[1] https://github.com/torvalds/linux.git
[2] https://chromium.googlesource.com/chromium/src

Thanks for your ideas!

-Marc