Re: [jgit-dev] insertUnpackedObject() perf regression: j.nio.file.Files.exists() 15x slower than j.io.File.exists()

I think a micro-benchmark that only compares the relative performance of Files.exists() and File.exists() should not need 1.2GB of reference data and a ramdisk to run.
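
As a rough illustration of how small such a comparison can be, here is a minimal sketch using only the JDK. The class name ExistsBench and its helper methods are made up for this example, and it is not a JMH harness: the timings are naive System.nanoTime() measurements with a single warm-up pass, so treat the numbers as indicative only. It times both an existing and a missing path, on the assumption that the two cases may behave differently.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of JGit or the BFG benchmark.
class ExistsBench {
    // Written to after each run so the JIT cannot discard the exists() calls.
    static volatile int sink;

    // Elapsed nanoseconds for `iterations` calls to java.io.File.exists().
    static long timeIoExists(File f, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (f.exists()) hits++;
        }
        long elapsed = System.nanoTime() - start;
        sink += hits;
        return elapsed;
    }

    // Elapsed nanoseconds for `iterations` calls to java.nio.file.Files.exists().
    static long timeNioExists(Path p, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (Files.exists(p)) hits++;
        }
        long elapsed = System.nanoTime() - start;
        sink += hits;
        return elapsed;
    }

    public static void main(String[] args) throws IOException {
        Path present = Files.createTempFile("exists-bench", ".tmp");
        Path missing = present.resolveSibling(present.getFileName() + ".missing");
        int n = 200_000;
        try {
            for (Path p : new Path[] { present, missing }) {
                File f = p.toFile();
                // Warm-up pass, results discarded.
                timeIoExists(f, n);
                timeNioExists(p, n);
                long io = timeIoExists(f, n);
                long nio = timeNioExists(p, n);
                System.out.printf("%s: File.exists %d ns/op, Files.exists %d ns/op%n",
                        p == present ? "present" : "missing", io / n, nio / n);
            }
        } finally {
            Files.delete(present);
        }
    }
}
```

Running it on the default temp directory and again on a ramdisk-backed one would show whether the gap depends on the underlying filesystem.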

On 28 Aug 2015, at 15:45, Roberto Tyley <roberto.tyley@xxxxxxxxx> wrote:

A quick packaging of my BFG benchmark tool for the JGit team can be found here:


Clone this repo, and then grab the 1.2GB reference data: https://github.com/rtyley/bfg-bench#reference-data

Note that to do a serious test, you'll need to set up a ramdisk: https://github.com/rtyley/bfg-bench#ram-disk



The BFG and its benchmark are built with SBT, which is unfamiliar to many people, so for your convenience I've provided pre-compiled jars (including versions before and after Matthias's fix):


All versions of the BFG jar are identical (at https://github.com/rtyley/bfg-repo-cleaner/tree/c54757-benchmark-version), apart from the version of JGit they were built with.


The irony of storing jars in a git repository (when I've spent the last 2 years trying to help people get big objects out) is not lost on me.



On 28 August 2015 at 12:18, Matthias Sohn <matthias.sohn@xxxxxxxxx> wrote:
On Fri, Aug 28, 2015 at 5:45 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
On Thu, Aug 27, 2015 at 11:20 AM, Robin Rosenberg
<robin.rosenberg@xxxxxxxxxx> wrote:
> ----- Original message -----
>> From: "Alex Blewitt" <alex.blewitt@xxxxxxxxx>
>> To: "Roberto Tyley" <roberto.tyley@xxxxxxxxx>
>> Cc: jgit-dev@xxxxxxxxxxx, "robin rosenberg" <robin.rosenberg@xxxxxxxxxx>, "matthias sohn" <matthias.sohn@xxxxxxx>,
>> "christian halstrick" <christian.halstrick@xxxxxxx>
>> Sent: Thursday, 27 Aug 2015 19:43:57
>> Subject: Re: [jgit-dev] insertUnpackedObject() perf regression: j.nio.file.Files.exists() 15x slower than
>> j.io.File.exists()
>>
>> If you have a gist or a JMH harness for performing tests I'm happy to report
>> back timings for Windows and OSX.
>>
>> > On 27 Aug 2015, at 18:38, Roberto Tyley <roberto.tyley@xxxxxxxxx> wrote:
>> >
>> > Obviously, my preference would be to remove use of the NIO call. I'm
>> > inclined to think it could be removed everywhere, from both FS_POSIX &
>> > FS_Win32, but at the very least from
>> > ObjectDirectory.insertUnpackedObject().
>
> Maybe we should make exceptions for (some) internal paths. The current
> behavior probably only (mostly?) makes sense in the working directory, so we'd need
> two exists() methods in FS.

The loose object exists test should be using java.io.File and not FS.
ObjectDirectory uses FS.resolve() to traverse symlinks to objects but
then once inside objects all 256 shard directories should be real
directories, and the object files should be real files, not dangling
symlinks. java.io.File.exists() is sufficient here, and faster.

Sounds like we could just change ObjectDirectory to use File.exists()
once it has computed the File handle.

This does mean JGit cannot run ObjectDirectory code on an abstract
virtual filesystem plugged into NIO2, but I think that is fine. If you
really want to run JGit on an esoteric non-standard filesystem like
"in memory" you should look at the DFS storage backend, which has
fewer abstraction points to deal with. Or write your own from scratch.
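
The proposal quoted above can be sketched roughly as follows. This is an illustration only, not JGit's actual ObjectDirectory code: the class LooseObjectCheck and both method names are hypothetical. It relies on the standard loose-object layout, where the first two hex digits of the object id name one of the 256 shard directories and the remaining 38 name the file, and it checks existence with plain java.io.File.exists() instead of going through FS or NIO.

```java
import java.io.File;

// Hypothetical sketch; names are not taken from JGit.
class LooseObjectCheck {
    // Maps a 40-character hex object id to <objectsDir>/xx/yyyy...,
    // where xx is the shard directory and yyyy... the remaining 38 digits.
    static File looseObjectFile(File objectsDir, String hexId) {
        if (hexId.length() != 40) {
            throw new IllegalArgumentException("expected a 40-character hex id");
        }
        File shard = new File(objectsDir, hexId.substring(0, 2));
        return new File(shard, hexId.substring(2));
    }

    // Plain java.io.File.exists() check, assuming shard directories and
    // object files are real files rather than dangling symlinks.
    static boolean hasLooseObject(File objectsDir, String hexId) {
        return looseObjectFile(objectsDir, hexId).exists();
    }
}
```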

I pushed
implementing this proposal.

Roberto:
could you provide the benchmark you were running on Ubuntu? Then we could also run it on
Mac and Windows to check the impact on other platforms.

-Matthias 
