Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Refs creation slowness with Jgit

On Wed, Sep 30, 2020 at 12:07 AM Matthias Sohn <matthias.sohn@xxxxxxxxx> wrote:
On Tue, Sep 29, 2020 at 9:36 PM Martin Fick <mfick@xxxxxxxxxxxxxx> wrote:
> When investigating slow Gerrit NoteDB migration times, we noticed a
> slowness with jgit creating refs
...
> I have a small sample program[1] which illustrates the slowness. It
> creates two refs 'refs/heads/test_simple' and 'refs/heads/test/foo'. On
> executing the program, I see output:
>
> test/foo: 325 ms
> test_simple: 4 ms
...
> The slowness seems to be stemming from the Files.getFileStore(dir) call in
> FS.FileStoreAttributes.getFileStoreAttributes(Path dir).
...

On Friday, September 25, 2020 12:09:47 PM MDT kaushikl@xxxxxxxxxxxxxx wrote:
> I ran the custom program shared in the original post on few more of our
> machines. The results vary quite a bit for me.
>
> Machine-1
> test/foo: 11 ms
> test_simple: 3 ms
...
> Machine-4
> test/foo: 115 ms
> test_simple: 4 ms
...
> Machine-6
> test/foo: 1320 ms
> test_simple: 5 ms
...

Yikes, that shows results across the board within 2 orders of magnitude. I
think that regardless of why, it can be problematic to have
theFiles.getFileStore(dir) call in the fast path of any ref operation (read,
write, delete...). Should we consider reverting the jgit code that does this?

this call is used to decide when we can trust a file timestamp to not be racy
 
Alternatively, would it make sense to make this call only once at the top
level of the .git/refs directory and consider it the same for all directories
below that? It would likely not be safe for users to create links inside the
.git/refs directories anyway as a ref-packing operation could delete it at any
time?

I was thinking about making this call only once per repository at the top level of the .git
repository assuming each repository resides in a single volume and cache the result
per repository.  FileAttributes are used by FileSnapshot and LockFile. Though some
of the classes using FileSnapshot don't have a reference to the corresponding
repository. Need a find a way to introduce such caching without changing a lot of APIs.

I pushed https://git.eclipse.org/r/c/jgit/jgit/+/170138 for review which caches
FileStore per repository assuming each repository is stored in a single FileStore.

-Matthias

Back to the top