Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Writing directory list method using JGit

On Thu, Aug 11, 2011 at 05:11, ilya.ivanov@xxxxxxxxxxx
<ilya.ivanov@xxxxxxxxxxx> wrote:
> Sorry, if it's not a correct place for the question.
>
> I'm writing method that should list children of one tree in git repo. For
> each entry (file/dir) it must get info about last changed version, author,
> modification date (like github repo browser shows).

This is *not* easy to do with Git. If you notice the GitHub repository
browser shows spinning icons next to each file while it computes this
data. They have huge cache servers that hold this information cached
once computed, but until its been requested and computed (which does
take a long time) they show those spinning icons to amuse the user.

You have to do roughly what you describe... and this is O(N) in terms
of history. If you want to find P paths, its O(P * N) in time. This is
not cheap when P and N are large, its quadratic.

> My implementation does the following:
>
> 1. Gets list of directory children (using TreeWalk). I will call this list
> <interesting paths>
>
> 2. Creating RevWalk. Traversing through commits and searching where each
> file was last changed.
> For every commit returned by RevWalk I create a TreeWalk for two trees: from
> current commit and from previous commit. This allows to find paths changed
> in current commit.
> If I find that one of <interesting paths> is changed between two commits I
> take commit info for that path.

There isn't really a better way to do this.

> Could you suggest some optimization for that? What can be the bottleneck in
> this implementation? This algorithm must be as fast as possible.
> I think comparing paths in tree traversal takes lot of time, but have no
> idea how to improve it.

Yes, the compare of paths is expensive, but if you have a
PathFilterGroup its about as good as it can be. The PathFilterGroup
sorts the paths and looks at them in order. IIRC it might not be
completely optimal, early paths like "A" are still examined when the
TreeWalk is sitting on "S". Unfortunately the PathFilterGroup doesn't
participate in the merge-join loop that the TreeWalk performs over the
trees.

-- 
Shawn.


Back to the top