Re: [jgit-dev] Performance of commit preparation
On Wed, Oct 20, 2010 at 8:15 AM, Baumgart, Jens <jens.baumgart@xxxxxxx> wrote:
> The tree walk over repo, workdir and index takes 23 seconds (with empty
> while loop) for our test repo (linux kernel).
> Git commit from native git takes 7 seconds. Is this a realistic relation
> from C to Java or is there improvement left in the tree walk that calculates
> the changes? Implementation looks as follows:

Like Robin said in his message, can you trace and find out what we are
doing stats on, and what we are opening and reading content from?

One of the disadvantages we have in Java is that we cannot get access to
the type hint that comes back as part of readdir(). C Git uses this to
avoid stats on directories; JGit needs to stat a path to determine if it
is a directory (needs to recurse into it) or a file (needs to process
that file). I know C Git benefits from this in a tree like the kernel,
where there are many directories and it can avoid doing stats on those.

Another disadvantage is that Java probably needs to perform *3* stats
for each file path, while C Git only does one. JGit needs to do:

* stat to see if it is a directory
* stat to obtain the length
* stat to obtain the last modification time

This is due to the java.io.File API only offering us each of these
values as separate method calls, with no way to cache the results, even
though both POSIX and Win32 make these available through a single
stat-type API to the kernel. :-(

NIO2 in Java 7 might solve this problem. But right now that isn't
available to us.

I keep thinking about doing an optional tiny JNI layer for JGit that
just offers us a handful of helper routines. Exposing the basics (type,
length, last modified time) of the POSIX and Win32 stat system calls is
one of those.

> Remark 1: the body of the while loop only takes minor part of execution time
> (~1s)

This is very odd. The body of the while loop should be the most
time-consuming part. Can you better isolate, with manually inserted
timers, where the rest of the time is going?

> Remark 2: we tried a FileTreeIterator instead of
> AdaptableFileTreeIterator for the workdir but this made things slower.

I think that you removed the Eclipse resource cache when you did that.
EGit uses AdaptableFileTreeIterator to try to make use of the cached
resource stats that the workbench has for the files in the workspace, so
that we can avoid making 3 stat calls per path to the operating system.
Forcing us to use FileTreeIterator meant we bypassed that cache and had
to ask the operating system about each file.

-- 
Shawn.
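
For illustration only, here is a rough sketch of the "3 stats per path"
point made above, next to what the NIO2 API in Java 7 offers. It is not
JGit code: the class StatSketch, the holder Entry, and both method names
are made up for this example. Whether Files.readAttributes() really
results in a single stat-type call depends on the JVM and the platform,
so treat that as an assumption to confirm with a trace. The main method
also shows the kind of manually inserted timer mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class StatSketch {
    /** Hypothetical holder for the three values discussed in the message. */
    static final class Entry {
        final boolean directory;
        final long length;
        final long lastModifiedMillis;

        Entry(boolean directory, long length, long lastModifiedMillis) {
            this.directory = directory;
            this.length = length;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** java.io.File: three separate method calls, each may ask the OS again. */
    static Entry viaJavaIoFile(File f) {
        boolean dir = f.isDirectory();   // call 1: is it a directory?
        long len = f.length();           // call 2: length
        long mtime = f.lastModified();   // call 3: last modification time
        return new Entry(dir, len, mtime);
    }

    /** NIO2 (Java 7+): one bulk read returning all basic attributes together. */
    static Entry viaNio2(Path p) throws IOException {
        BasicFileAttributes a = Files.readAttributes(p, BasicFileAttributes.class);
        return new Entry(a.isDirectory(), a.size(), a.lastModifiedTime().toMillis());
    }

    public static void main(String[] args) throws IOException {
        // Scratch file so the sketch does not depend on any existing path.
        Path p = Files.createTempFile("stat-sketch", ".bin");
        Files.write(p, new byte[] {1, 2, 3});

        // Manually inserted timers around each approach.
        long t0 = System.nanoTime();
        Entry a = viaJavaIoFile(p.toFile());
        long t1 = System.nanoTime();
        Entry b = viaNio2(p);
        long t2 = System.nanoTime();

        System.out.println("java.io.File: dir=" + a.directory + " len=" + a.length
                + " took " + (t1 - t0) + " ns");
        System.out.println("NIO2:         dir=" + b.directory + " len=" + b.length
                + " took " + (t2 - t1) + " ns");

        Files.delete(p);
    }
}
```

A single call on one file says little about performance. To learn
anything useful, the timers would need to wrap the walk over a whole
working tree, alongside a system call trace.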