Re: [jgit-dev] File insert performance related to DirCacheTree

From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Tue, 14 Apr 2015 20:58:28 -0700

On Tue, Apr 14, 2015 at 4:12 AM, Philipp Marx <smigfu@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> I have a question around inserting new files with JGit. I insert files into
> the repository via a DirCacheEditor (similar to this snippet):
>
> final ObjectId objectId = objectInserter.insert(OBJ_BLOB, object);
>
> dirCacheEditor.add(new PathEdit(path.toString())...
>
> dirCacheEditor.finish();
>
> dirCache.writeTree(objectInserter);
>
>
> The DirCache I retrieve the editor from is already fully initialised via a
> builder. What I can see now in my ObjectInserter is that the DirCacheTree
> belonging to the DirCache is calling itself recursively for all trees
> contained in the directory (regardless of whether they have changed) and
> calling the insert method on the ObjectInserter (delegated through the
> TreeFormatter). This has a huge performance impact on my application, since
> I have many entries and the default use case is that only one of these
> entries is modified. I can see that the existing CacheTree (which already
> contains the ids of the trees) is removed once I call
> dirCacheEditor.finish().

Yes. I think this is expected behavior. The CacheTree code is
incomplete in JGit, so it just gets invalidated during most updates
to the DirCache.
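
To make the cost difference concrete, here is a minimal sketch in plain Java, not JGit code. CacheTreeSketch and all of its methods are hypothetical names made up for illustration. It compares forgetting every cached tree id with forgetting only the directories on the edited path, which is one way a more complete cache tree could behave.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cache-tree node for illustration; this is NOT JGit's DirCacheTree.
class CacheTreeSketch {
    final Map<String, CacheTreeSketch> children = new LinkedHashMap<>();
    boolean valid = false; // true while this subtree's id is still known

    // Get or create the node for a directory path such as "a/b".
    CacheTreeSketch dir(String path) {
        CacheTreeSketch node = this;
        for (String name : path.split("/")) {
            node = node.children.computeIfAbsent(name, k -> new CacheTreeSketch());
        }
        return node;
    }

    // Mark every node valid, as if the whole tree had just been written.
    void markAllValid() {
        valid = true;
        for (CacheTreeSketch c : children.values()) c.markAllValid();
    }

    // Coarse strategy: forget every cached id.
    void invalidateAll() {
        valid = false;
        for (CacheTreeSketch c : children.values()) c.invalidateAll();
    }

    // Fine-grained strategy: forget only the directories that contain filePath.
    void invalidatePath(String filePath) {
        CacheTreeSketch node = this;
        node.valid = false;
        String[] parts = filePath.split("/");
        // The last component is the file name, so stop one short of it.
        for (int i = 0; i < parts.length - 1 && node != null; i++) {
            node = node.children.get(parts[i]);
            if (node != null) node.valid = false;
        }
    }

    // Number of trees that would have to be formatted and hashed again.
    int countInvalid() {
        int n = valid ? 0 : 1;
        for (CacheTreeSketch c : children.values()) n += c.countInvalid();
        return n;
    }
}
```

With directories a/b, c and d/e under the root, editing a/b/file.txt leaves three trees to rebuild under the fine-grained strategy (root, a, a/b) and all six under the coarse one.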

> So my question is whether there is another approach which would scale better
> for my use case? Like reusing the tree ids which are already known and
> inserting only those which have changed. Or is there something I am doing
> completely wrong anyway :-)

Is the ObjectInserter actually writing the tree object? Usually a
tree gets formatted with a TreeFormatter, which generates an ObjectId;
the inserter looks for that id, finds it's already present in the
repo, and skips writing the object.

Or is the bottleneck just the TreeFormatter trying to build the tree
in a temporary byte array and making the ObjectId hash from it?
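
The two costs in question can be separated in a small sketch. This is plain Java using only the JDK, not the JGit API. TreeIdSketch, its in-memory store and its insert method are hypothetical stand-ins for an object database. The only Git-specific fact it relies on is that an object id is the SHA-1 of the header "<type> <length>\0" followed by the object body.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an object store; this is NOT JGit's ObjectInserter.
class TreeIdSketch {
    // Object id (hex) -> raw body; stands in for the repository's object database.
    static final Map<String, byte[]> store = new HashMap<>();
    // Counts real writes so that skipped ones are visible.
    static int writes = 0;

    // Git hashes the header "<type> <length>\0" followed by the body with SHA-1.
    static String objectId(String type, byte[] body) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            md.update((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
            md.update(body);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Always pays for the hash; only pays for the write when the id is new.
    static String insert(String type, byte[] body) {
        String id = objectId(type, body);
        if (!store.containsKey(id)) {
            store.put(id, body.clone());
            writes++;
        }
        return id;
    }

    public static void main(String[] args) {
        byte[] emptyTree = new byte[0];
        String first = insert("tree", emptyTree);
        String second = insert("tree", emptyTree); // same bytes, same id, no second write
        System.out.println(first.equals(second) + " writes=" + writes);
    }
}
```

In this sketch an unchanged tree still costs one formatting pass and one hash, but no write. If profiling shows the time going into the hashing rather than the writes, skipping the write does not help and the known ids would need to be reused instead.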
