[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] insertUnpackedObject() perf regression: j.nio.file.Files.exists() 15x slower than j.io.File.exists()


could you please create a bug report from your post so that we can continue there? Also it would be interesting if you could contribute the patch which we can play with.


On Thursday 27 August 2015 18:38 Roberto Tyley wrote:
> While updating the BFG Repo-Cleaner to JGit 4.0, I was surprised to see a
> consistent performance regression on clean-large-repo benchmarks (/dev/shm
> on Ubuntu 15.04, OpenJDK 1.8.0_45). Total run time had increased by ~10%.
> After some fairly chunky automated git-bisecting, I traced the regression
> to this change:
> "Merge bundle org.eclipse.jgit.java7 into org.eclipse.jgit" -
> https://git.eclipse.org/r/43768
> Change-Id: Ib5da61b0886ddbdea65298f1e8c6d65c9879ced1
> ...and then *specifically* to the exists() method in FS_POSIX, now
> overriding the default implementation in FS:
> https://github.com/eclipse/jgit/blob/197e339/org.eclipse.jgit/src/org/eclipse/jgit/util/FS_POSIX.java#L359-L361
> The default implementation of FS.exists() uses java.io.File.exists(), while
> the new implementation in FS_POSIX uses java.nio.file.Files.exists() - by
> simply removing the override in FS_POSIX, performance was restored.
> Profiling the BFG benchmark, it became clear that in my environment at
> least, j.nio.file.Files.exists() is substantially slower than
> j.io.File.exists(), to the point where the exists() call *doubles* the
> average cost of a call to ObjectDirectory.insertUnpackedObject() - which
> the BFG uses a lot, because it's rewriting history. Average times are:
> j.io.File.exists() - 4 microseconds
> j.nio.file.Files.exists() - 60 microseconds
> If you look at the implementation of j.nio.file.Files.exists() it's not
> hard to believe it's slower, j.io.File.exists() drops almost immediately to
> a native method, while the NIO method has multiple if statements & layers
> of indirection (profiling says the real cost comes quite far down, in calls
> to 'readAttributes' methods).
> I had a look at the change that originally introduced use of
> j.nio.file.Files.exists() (for org.eclipse.jgit.java7 only at that point):
> "Extend the FS class for Java7" - https://git.eclipse.org/r/9378
> Change-Id: I834b06d0447f84379612b8c9190fa77093617595
> The commit message on this change mentions that "there are claims that
> Files.exists is faster the File.exists" (ie that NIO is faster, contrary to
> what I've seen). I think that claim might have come from here:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=353771#c8
> I can't explain the conflicting results for the performance of NIO
> exists(), though obviously there are many different OSs and Java versions
> out there, which may behave differently. I've only tested on Ubuntu so far,
> but can grab a Mac. Getting a Windows box is more difficult for me, let me
> know if you'd like to volunteer for benchmarking duty.
> Aside from performance, I'm not sure if there was any other strong
> motivation for using j.nio.file.Files.exists(). It's called with
> LinkOption.NOFOLLOW_LINKS, which means that when checking a symbolic link,
> it will return true so long as the symbolic link is there, regardless of
> whether the link points to an existing file. This differs from
> j.io.File.exists(), which *does* follow the link, and returns false if the
> underlying file is not present. Is the NIO behavior useful?
> Obviously, my preference would be to remove use of the NIO call. I'm
> inclined to think it could be removed everywhere, from both FS_POSIX
> & FS_Win32, but at the very least from
> ObjectDirectory.insertUnpackedObject().
> Roberto
> PS I have to confess the difference in total execution time run is just 45s
> rather than 41s, working on a 1.1GB repository - no BFG users will really
> care. But, having spent so long trying to create a fast tool, a 10%
> regression is pretty hard for me to swallow!

Kind regards,