
Re: [jgit-dev] File insert performance related to DirCacheTree

Hi Shawn,

The trees are not actually written; that check works fine. The bottleneck for me is recalculating the ObjectIds of all the trees. I have lots of them, and updates (usually of only one object) are rather frequent in my app.

Do you think it would be safe to record the changed PathEdits in the DirCacheEditor, and to modify DirCache#replace (keeping the existing DirCacheTree instead of nulling it out) and DirCacheTree#writeTree so that only the trees affected by the edits are recalculated?
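A minimal, self-contained sketch of that idea, assuming a simplified in-memory model rather than JGit's actual DirCache/DirCacheTree classes: each directory node caches its tree id, an edit clears the cached id only on the nodes along the edited path, and writing the tree rehashes only those cleared nodes. The class `IncrementalTree` and its methods are hypothetical. Entries use Git's tree encoding (`<mode> <name>\0<20-byte id>`, hashed with SHA-1 behind a `tree <len>\0` header), and names are assumed to be ASCII so that Java string order matches Git's byte order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not JGit code: directory nodes that cache their tree id
// and are invalidated only along the path of an edit.
final class IncrementalTree {
    static int treesHashed = 0; // number of tree ids computed, for illustration

    private final Map<String, IncrementalTree> dirs = new TreeMap<>();
    private final Map<String, byte[]> files = new TreeMap<>(); // name -> blob id
    private byte[] cachedId; // null means this tree must be recomputed

    // Records a blob id for a slash-separated path and clears the cached id
    // of every tree on that path (and only those).
    void put(String path, byte[] blobId) {
        cachedId = null;
        int slash = path.indexOf('/');
        if (slash < 0) {
            files.put(path, blobId);
        } else {
            dirs.computeIfAbsent(path.substring(0, slash), k -> new IncrementalTree())
                .put(path.substring(slash + 1), blobId);
        }
    }

    // Returns this tree's id; subtrees with a cached id are reused, not rehashed.
    byte[] writeTree() {
        if (cachedId != null) {
            return cachedId;
        }
        // Git orders entries as if directory names ended in '/'; ASCII names
        // are assumed so that Java string order matches Git's byte order.
        TreeMap<String, byte[]> ordered = new TreeMap<>();
        for (Map.Entry<String, IncrementalTree> d : dirs.entrySet()) {
            ordered.put(d.getKey() + "/", entry("40000", d.getKey(), d.getValue().writeTree()));
        }
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
            ordered.put(f.getKey(), entry("100644", f.getKey(), f.getValue()));
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (byte[] e : ordered.values()) {
            body.write(e, 0, e.length);
        }
        treesHashed++;
        cachedId = objectId("tree", body.toByteArray());
        return cachedId;
    }

    // One tree entry: "<mode> <name>\0" followed by the raw 20-byte id.
    private static byte[] entry(String mode, String name, byte[] id) {
        byte[] head = (mode + " " + name + "\0").getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[head.length + id.length];
        System.arraycopy(head, 0, out, 0, head.length);
        System.arraycopy(id, 0, out, head.length, id.length);
        return out;
    }

    // Git object id: SHA-1 over "<type> <length>\0" followed by the body.
    static byte[] objectId(String type, byte[] body) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
            return sha1.digest(body);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With files under `a/`, `b/` and `c/d/`, the first `writeTree()` hashes five trees; after changing only `c/d/z.txt`, the next call hashes three (`c/d`, `c` and the root) while `a` and `b` keep their cached ids, and the result matches a tree built from scratch. The sketch only handles adds and updates; any real change along these lines would have to guarantee that every kind of edit, including deletions, invalidates all of its ancestor trees.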

Thanks,
Philipp

2015-04-15 5:58 GMT+02:00 Shawn Pearce <spearce@xxxxxxxxxxx>:
On Tue, Apr 14, 2015 at 4:12 AM, Philipp Marx <smigfu@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> I have a question around inserting new files with JGit. I insert files into
> the repository via a DirCacheEditor (similar to this snippet):
>
> final ObjectId objectId = objectInserter.insert(OBJ_BLOB, object);
>
> dirCacheEditor.add(new PathEdit(path.toString())...
>
> dirCacheEditor.finish();
>
> dirCache.writeTree(objectInserter);
>
>
> The DirCache I retrieve the editor from is already fully initialised via a
> builder. What I can see now in my ObjectInserter is that the DirCacheTree
> belonging to the DirCache calls itself recursively for all trees contained
> in the directory (regardless of whether they have changed) and calls the
> insert method on the ObjectInserter (delegated through the TreeFormatter).
> This has a huge performance impact on my application, since I have many
> entries and in the common case only one of these entries is modified. I can
> see that the existing CacheTree (which already contains the ids of the
> trees) is removed once I call dirCacheEditor.finish().

Yes. I think this is expected behavior. The CacheTree code is
incomplete in JGit, so it just gets invalidated during most updates
to the DirCache.

> So my question is whether there is another approach that would scale better
> for my use case, like reusing the tree ids that are already known and only
> inserting those that have changed. Or am I doing something completely
> wrong anyway :-)

Is the ObjectInserter actually writing the tree object? Usually a
tree gets formatted with a TreeFormatter to generate an ObjectId;
the inserter looks for that id, finds it's already present in the
repo, and skips writing the object.

Or is the bottleneck just the TreeFormatter trying to build the tree
in a temporary byte array and making the ObjectId hash from it?
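To make the two possible bottlenecks concrete, here is a hedged sketch in plain Java (no JGit) of what formatting, hashing and deduplicating a tree involves. The names `TreeHashSketch`, `Entry` and `DedupStore` are hypothetical; `DedupStore` only stands in for an ObjectInserter that skips writes for ids it already has. It assumes ASCII entry names and Git's tree encoding (`<mode> <name>\0<20-byte id>`, SHA-1 over a `tree <len>\0` header plus the body).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not JGit's API: what formatting and inserting one tree costs.
final class TreeHashSketch {

    // One tree entry: Git mode ("100644" file, "40000" tree), name, raw 20-byte id.
    static final class Entry {
        final String mode;
        final String name;
        final byte[] id;

        Entry(String mode, String name, byte[] id) {
            this.mode = mode;
            this.name = name;
            this.id = id;
        }

        // Git orders tree entries as if directory names ended in '/'.
        // Assumes ASCII names, so Java string order matches Git's byte order.
        String sortKey() {
            return mode.equals("40000") ? name + "/" : name;
        }
    }

    // Git object id: SHA-1 over "<type> <length>\0" followed by the body.
    static byte[] objectId(String type, byte[] body) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
            return sha1.digest(body);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialises sorted entries into a temporary byte array ("<mode> <name>\0<id>" each).
    static byte[] formatTree(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing(Entry::sortKey));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Entry e : sorted) {
            byte[] head = (e.mode + " " + e.name + "\0").getBytes(StandardCharsets.UTF_8);
            out.write(head, 0, head.length);
            out.write(e.id, 0, e.id.length);
        }
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Stand-in for an ObjectInserter: always hashes, but only stores unseen ids.
    static final class DedupStore {
        final Map<String, byte[]> objects = new HashMap<>();
        int writes = 0;

        byte[] insert(String type, byte[] body) {
            byte[] id = objectId(type, body); // hashing happens on every call
            if (objects.putIfAbsent(hex(id), body) == null) {
                writes++; // new object: this is where a real inserter would write
            }
            return id;
        }
    }
}
```

Inserting the same tree body twice leaves `writes` at 1, yet both calls still serialise and hash the tree. That per-tree formatting and hashing is the work that remains for every unchanged tree once the cached tree ids have been discarded.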

