Re: [jgit-dev] Performance of commit preparation

On Wed, Oct 20, 2010 at 8:15 AM, Baumgart, Jens <jens.baumgart@xxxxxxx> wrote:
> The tree walk over repo, workdir and index takes 23 seconds (with empty
> while loop) for our test repo (linux kernel).
> Git commit from native git takes 7 seconds. Is this a realistic relation
> from C to Java or is there improvement left in the tree walk that calculates
> the changes? Implementation looks as follows:

Like Robin said in his message, can you trace and find out what we are
doing stats on, and what we are opening and reading content from?

One of the disadvantages we have in Java is we cannot get access to
the type hint that comes back as part of readdir().  C Git uses this
to avoid stats on directories, JGit needs to stat a directory to
determine if it is a directory (needs to recurse into it) or a file
(needs to process that file).  I know C Git benefits from this in a
tree like the kernel where there are many directories and it can avoid
doing stats on those.

Another disadvantage is Java probably needs to perform *3* stats for
each file path, while C Git only does one.  JGit needs to do:

  * stat to see if it is a directory
  * stat to obtain the length
  * stat to obtain the last modification time

This is due to the java.io.File API only offering us each of these
values as separate method calls, with no way to cache the results,
even though both POSIX and Win32 make these available through a single
stat type API to the kernel.  :-(
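To make the three lookups concrete, here is a minimal sketch using only java.io.File. The class name FileStatSketch and the Entry holder are made up for illustration and are not JGit code; whether each call really turns into its own stat depends on the JVM and platform.

```java
import java.io.File;

class FileStatSketch {
    /** Hypothetical holder for the three values a tree walk needs per path. */
    static final class Entry {
        final boolean directory;
        final long length;
        final long lastModified;

        Entry(boolean directory, long length, long lastModified) {
            this.directory = directory;
            this.length = length;
            this.lastModified = lastModified;
        }
    }

    /** Gathers type, length and mtime; java.io.File needs one call per value. */
    static Entry describe(File f) {
        boolean directory = f.isDirectory();  // query 1: is it a directory?
        long length = f.length();             // query 2: size in bytes (0 if missing)
        long lastModified = f.lastModified(); // query 3: mtime in millis (0 if missing)
        return new Entry(directory, length, lastModified);
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : ".");
        Entry e = describe(f);
        System.out.println(f + " dir=" + e.directory
                + " length=" + e.length + " mtime=" + e.lastModified);
    }
}
```

Nothing in the File object remembers the answer from one call to the next, so a walk over N paths issues up to 3N queries.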

NIO2 in Java 7 might solve this problem.  But right now that isn't
available to us.  I keep thinking about doing an optional tiny JNI
layer for JGit that just offers us a handful of helper routines.
Exposing the basics (type, length, last modified time) of POSIX and
Win32 stat system calls is one of those.
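For comparison, a sketch of what the NIO2 route could look like once a Java 7 or later runtime is available. Files.readAttributes and BasicFileAttributes are part of java.nio.file; the class name NioStatSketch is made up, UncheckedIOException needs Java 8, and the assumption that the JVM serves all three values from a single stat is mine, not something measured here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class NioStatSketch {
    /** Reads type, size and mtime in one bulk attribute request. */
    static BasicFileAttributes describe(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class);
        } catch (IOException e) {
            // Wrapped so callers in this sketch need no checked exception.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = Paths.get(args.length > 0 ? args[0] : ".");
        BasicFileAttributes a = describe(p);
        // All three values come from the same attributes object.
        System.out.println(p + " dir=" + a.isDirectory()
                + " length=" + a.size()
                + " mtime=" + a.lastModifiedTime().toMillis());
    }
}
```

Unlike java.io.File, a missing path raises an exception here instead of returning zeros, so a tree walk would have to handle that case explicitly.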

> Remark 1: the body of the while loop only takes minor part of execution time
> (~1s)

This is very odd.  The body of the while loop should be the most time
consuming part.  Can you better isolate with manually inserted timers
where the rest of the time is going?
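One way to insert such timers, sketched against a plain java.util.Iterator rather than the real tree walk: charge the time spent advancing the walk and the time spent in the loop body to separate counters. WalkTimerSketch and its fields are hypothetical names, and the sketch assumes Java 8 for Consumer.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;

class WalkTimerSketch {
    long advanceNanos; // time spent in hasNext()/next(), i.e. moving the walk
    long bodyNanos;    // time spent in the loop body
    long entries;      // number of entries visited

    /** Drives the walk to the end, timing the advance and the body separately. */
    <T> void run(Iterator<T> walk, Consumer<T> body) {
        while (true) {
            long t0 = System.nanoTime();
            if (!walk.hasNext()) {
                advanceNanos += System.nanoTime() - t0;
                break;
            }
            T entry = walk.next();
            long t1 = System.nanoTime();
            advanceNanos += t1 - t0;

            body.accept(entry);
            bodyNanos += System.nanoTime() - t1;
            entries++;
        }
    }

    public static void main(String[] args) {
        WalkTimerSketch t = new WalkTimerSketch();
        t.run(Arrays.asList("a", "b", "c").iterator(), s -> { });
        System.out.println("entries=" + t.entries
                + " advanceMs=" + t.advanceNanos / 1_000_000
                + " bodyMs=" + t.bodyNanos / 1_000_000);
    }
}
```

If the advance counter dominates, the cost is in the iterators (stats, directory listings, index reads) rather than in the code inside the loop.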

> Remark 2: we tried a FileTreeIterator instead of
> AdaptableFileTreeIterator for the workdir but this made things slower.

I think that you removed the Eclipse resource cache when you did that.
EGit uses AdaptableFileTreeIterator to try and make use of the cached
resource stats that the workbench has for the files in the workspace,
so that we can try to avoid making 3 stat calls per path to the
operating system.  Forcing us to use FileTreeIterator meant we
bypassed that cache and had to ask the operating system about each
file.

-- 
Shawn.

