Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Avoiding stat calls (again)

On Thu, Nov 18, 2010 at 11:56 AM, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Tue, Nov 16, 2010 at 2:57 PM, Robin Rosenberg
> <robin.rosenberg@xxxxxxxxxx> wrote:
>> See the Gerrit http://egit.eclipse.org/r/#change,1905. The main thing I wanted, was to make the Commit
>> dialog pop up faster. Now, what I found is that it actually opens and read every file, No wonder it
>> takes forever. Git is supposed to be fast.
>>
>> Daemon Thread [ModalContext] (Suspended)
>>        ContainerTreeIterator(WorkingTreeIterator).idBuffer() line: 213
>>        ContainerTreeIterator(AbstractTreeIterator).idEqual(AbstractTreeIterator) line: 368
>>        TreeWalk.idEqual(int, int) line: 696
>>        TreeFilter$2.include(TreeWalk) line: 133
>
> Ouch.  Someone is using the default ANY_DIFF with an iterator that
> cannot provide fast access to the ObjectId of the current entry (aka a
> WorkingTreeIterator).  No wonder its taking ages, we have to SHA-1
> checksum the entire working directory.  That's a bad use of the
> ANY_DIFF filter.
...
> Idea 2)  Add knowledge of the TreeWalk and the position of the
> DirCacheIterator to the WorkingTreeIterator.  That is, maybe we add a
> method like:
>
>  public void setDirCacheIterator(TreeWalk tw, int pos);

This is proposed in http://egit.eclipse.org/r/1927

But, I'm yet not convinced we should commit this.

If we rely on only this change, ANY_DIFF will still call idBuffer()
for working files that are modified, because they don't match the
DirCacheIterator.  If the files are sufficiently large, JGit will
still crawl because idBuffer() is spending a lot of time to compute a
SHA-1 checksum that doesn't matter.

I never meant for TreeFilter.ANY_DIFF to be used with a
WorkingTreeIterator.  It can be used... but its performance results
are disastrous because it relies solely on idBuffer(), and idBuffer()
is an inherently slow operation for working tree files.  I intended to
use something like the WorkingTreeIterator's isModified() method,
through a different TreeFilter implementation.  The earlier rewrite of
the resource decorator in EGit was the first such example (and
predates the isModified method by a lot).

It may simply be the case that the offending application code should
stop using ANY_DIFF when there is a WorkingTreeIterator involved, and
instead use a different TreeFilter implementation that can more
effectively take advantage of the isModified() method.  This filter
probably should be defined as a stock filter in JGit, and take the
various relevant indexes as constructor parameters.

An alternative approach is to add some sort of "compare" method to
AbstractTreeIterator that TreeWalk can use, so that instead of
idEqual(int,int) we have entryEqual(int, int) that delegates down into
that compare routine.  Then WorkingTreeIterator and DirCacheIterator
can override the compare routine to try and special case a compare of
those two iterator types, relying on isModified() instead of
idBuffer().  This does get ugly implementation wise, as both iterator
types need to override the method with identical delegation to a
common compare routine.  If we ever grew additional iterator types,
the permutations spread out through the various implementations quite
quickly, making for a very bug-prone system.

-- 
Shawn.


Back to the top