|Re: [jgit-dev] Avoiding stat calls (again)|
On Thu, Nov 18, 2010 at 11:56 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote: > On Tue, Nov 16, 2010 at 2:57 PM, Robin Rosenberg > <robin.rosenberg@xxxxxxxxxx> wrote: >> See the Gerrit http://egit.eclipse.org/r/#change,1905. The main thing I wanted, was to make the Commit >> dialog pop up faster. Now, what I found is that it actually opens and read every file, No wonder it >> takes forever. Git is supposed to be fast. >> >> Daemon Thread [ModalContext] (Suspended) >> ContainerTreeIterator(WorkingTreeIterator).idBuffer() line: 213 >> ContainerTreeIterator(AbstractTreeIterator).idEqual(AbstractTreeIterator) line: 368 >> TreeWalk.idEqual(int, int) line: 696 >> TreeFilter$2.include(TreeWalk) line: 133 > > Ouch. Someone is using the default ANY_DIFF with an iterator that > cannot provide fast access to the ObjectId of the current entry (aka a > WorkingTreeIterator). No wonder its taking ages, we have to SHA-1 > checksum the entire working directory. That's a bad use of the > ANY_DIFF filter. ... > Idea 2) Add knowledge of the TreeWalk and the position of the > DirCacheIterator to the WorkingTreeIterator. That is, maybe we add a > method like: > > public void setDirCacheIterator(TreeWalk tw, int pos); This is proposed in http://egit.eclipse.org/r/1927 But, I'm yet not convinced we should commit this. If we rely on only this change, ANY_DIFF will still call idBuffer() for working files that are modified, because they don't match the DirCacheIterator. If the files are sufficiently large, JGit will still crawl because idBuffer() is spending a lot of time to compute a SHA-1 checksum that doesn't matter. I never meant for TreeFilter.ANY_DIFF to be used with a WorkingTreeIterator. It can be used... but its performance results are disastrous because it relies solely on idBuffer(), and idBuffer() is an inherently slow operation for working tree files. I intended to use something like the WorkingTreeIterator's isModified() method, through a different TreeFilter implementation. The earlier rewrite of the resource decorator in EGit was the first such example (and predates the isModified method by a lot). It may simply be the case that the offending application code should stop using ANY_DIFF when there is a WorkingTreeIterator involved, and instead use a different TreeFilter implementation that can more effectively take advantage of the isModified() method. This filter probably should be defined as a stock filter in JGit, and take the various relevant indexes as constructor parameters. An alternative approach is to add some sort of "compare" method to AbstractTreeIterator that TreeWalk can use, so that instead of idEqual(int,int) we have entryEqual(int, int) that delegates down into that compare routine. Then WorkingTreeIterator and DirCacheIterator can override the compare routine to try and special case a compare of those two iterator types, relying on isModified() instead of idBuffer(). This does get ugly implementation wise, as both iterator types need to override the method with identical delegation to a common compare routine. If we ever grew additional iterator types, the permutations spread out through the various implementations quite quickly, making for a very bug-prone system. -- Shawn.
Back to the top