Re: [jgit-dev] Problem when trying to do diff on large files stored in LargePackedDeltaObject

From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Thu, 13 Feb 2014 07:03:50 -0800
Delivered-to: jgit-dev@xxxxxxxxxxx

On Thu, Feb 13, 2014 at 6:56 AM, Carlsson, Johannes wrote:
> 1) If I understand correctly, the TeeInputStream is created so that a loose
> object is created that can later be used instead of reading the pack file
> again. But in this case, where only the size is wanted, it instead creates
> a huge amount of overhead. Can't this be handled as a special case, e.g. by
> setting up the TeeInputStream only when a read is first performed?

Oops. This was clearly a mistake. If all the caller wanted was the
size, we shouldn't have set up the TeeInputStream at all.
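
Something along the lines of what you suggest could look roughly like
the sketch below. This is not JGit code: LazyInputStream and its opener
argument are hypothetical names used only for illustration. The idea is
that the expensive stream is built on the first read, so a caller that
only asks for the size never pays for it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

/** Hypothetical wrapper (not part of JGit): opens the real stream on first read. */
class LazyInputStream extends InputStream {
    private final Supplier<InputStream> opener; // builds the expensive stream
    private InputStream in;                     // stays null until the first read

    LazyInputStream(Supplier<InputStream> opener) {
        this.opener = opener;
    }

    /** True once a read has forced the underlying stream to be created. */
    boolean isOpened() {
        return in != null;
    }

    private InputStream delegate() {
        if (in == null) {
            in = opener.get();
        }
        return in;
    }

    @Override
    public int read() throws IOException {
        return delegate().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return delegate().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        // Nothing to release if no read ever happened.
        if (in != null) {
            in.close();
        }
    }
}
```

In this sketch, closing a stream that was never read from does no work,
which is the case a size-only caller would hit.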

> 2) It takes a crazy long time to create this loose object. You can see an
> objects/noz2787230184080961842.tmp that grows very slowly; the largest file
> I have there is 86M and that took > 12h on a decent machine. Can't this be
> improved?

I don't know how to do this better. The problem is the delta is
seeking randomly around in the base. The base needs to be created in
order to support seeking. Even if the base is a loose object it is
compressed and needs to be inflated in order to find the relevant
section. If the delta skips backwards again the inflater is closed and
reopened and uncompresses forward until the relevant spot is found.
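
To make the cost concrete, here is a small standalone sketch using only
java.util.zip. It is not the JGit implementation; ReinflatingReader and
its members are made-up names. It shows that a deflated stream can only
move forward, so any read at an offset before the current position
forces a reopen and a re-inflate from byte 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical illustration (not JGit code): random access over deflated data. */
class ReinflatingReader {
    private final byte[] compressed;
    private InputStream in;
    private long pos;   // offset of the next inflated byte the stream will return
    long inflatedTotal; // bytes inflated so far, including ones skipped over

    ReinflatingReader(byte[] compressed) throws IOException {
        this.compressed = compressed;
        reopen();
    }

    /** Helper to build test input: deflates a raw byte array. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buf)) {
            out.write(raw);
        }
        return buf.toByteArray();
    }

    private void reopen() throws IOException {
        if (in != null) {
            in.close();
        }
        in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        pos = 0;
    }

    /** Returns the byte at the given inflated offset, or -1 past the end. */
    int readAt(long offset) throws IOException {
        if (offset < pos) {
            reopen(); // an inflater cannot rewind, so start over from byte 0
        }
        while (pos < offset) {
            long n = in.skip(offset - pos); // skipping still inflates the data
            if (n <= 0) {
                return -1;
            }
            pos += n;
            inflatedTotal += n;
        }
        int b = in.read();
        if (b >= 0) {
            pos++;
            inflatedTotal++;
        }
        return b;
    }
}
```

With a delta that jumps backwards often, the work in this sketch grows
with the sum of all target offsets rather than with the size of the
base, which is the behavior you are seeing.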

Setting the streamFileThreshold to a larger size allows the base to be
fully in memory as a contiguous byte array which is randomly
accessible in constant time. This matches what git-core does. It uses
a lot of memory, but is faster.
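
As a rough example, the threshold can be raised in the repository (or
user) config. The 256m value here is only an illustration, not a
recommendation; pick something larger than your biggest base object and
make sure the JVM heap can actually hold it.

```
[core]
	streamFileThreshold = 256m
```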
