Re: [jgit-dev] Problem when trying to do diff on large files stored in LargePackedDeltaObject

On Thu, Feb 13, 2014 at 6:56 AM, Carlsson, Johannes wrote:
> 1) If I understand correctly, the TeeInputStream is created so that a loose
> object is created that can later be used instead of reading the pack file
> again. But in this case, where only the size is wanted, it instead creates
> a huge amount of overhead. Can't this be handled as a special case, e.g. the
> TeeInputStream set up only when a read is first performed?

Oops. This was clearly a mistake. If all the caller wanted was the
size we shouldn't have done this.
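
A minimal sketch of the deferral suggested above: the expensive stream is
only set up on the first read, so a caller that just asks for the size never
triggers it. LazyInputStream and Opener are hypothetical names for
illustration, not JGit classes, and this is not the actual fix.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: defers an expensive stream setup until the first read. */
final class LazyInputStream extends InputStream {
    /** Hypothetical factory for the expensive stream (e.g. one that tees to a temp file). */
    interface Opener {
        InputStream open() throws IOException;
    }

    private final long size;
    private final Opener opener;
    private InputStream in; // stays null until the first read

    LazyInputStream(long size, Opener opener) {
        this.size = size;
        this.opener = opener;
    }

    /** Answering the size query does not open the underlying stream. */
    long getSize() {
        return size;
    }

    boolean isOpened() {
        return in != null;
    }

    private InputStream in() throws IOException {
        if (in == null) {
            in = opener.open(); // the expensive setup happens here, at most once
        }
        return in;
    }

    @Override
    public int read() throws IOException {
        return in().read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in().read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        if (in != null) {
            in.close();
        }
    }
}
```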

> 2) It takes a crazy long time to create this loose object. You can see an
> objects/noz2787230184080961842.tmp file that grows very slowly; the largest
> file I have there is 86M and that took > 12h on a decent machine. Can't this
> be improved?

I don't know how to do this better. The problem is that the delta
seeks randomly around in the base, so the base has to be materialized
in a form that supports seeking. Even if the base is a loose object it
is compressed and needs to be inflated in order to find the relevant
section. If the delta skips backwards again, the inflater is closed and
reopened, and it inflates forward from the start until the relevant
spot is found.
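
To illustrate the cost, here is a small sketch of that access pattern using
only java.util.zip. RestartingInflateReader is a hypothetical class, not
JGit code. It counts how many bytes get inflated when a reader that can only
move forward is asked to go backwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical reader over deflated data that restarts on any backward seek. */
final class RestartingInflateReader {
    private final byte[] compressed;
    private InputStream in;
    private long pos;      // offset of the next inflated byte
    long inflatedTotal;    // bytes inflated so far, including skipped ones
    int restarts;          // number of times the inflater was reopened

    RestartingInflateReader(byte[] compressed) {
        this.compressed = compressed;
        reopen();
    }

    private void reopen() {
        in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        pos = 0;
    }

    /** Returns the inflated byte at the given offset, or -1 at end of data. */
    int readAt(long offset) throws IOException {
        if (offset < pos) {
            // A compressed stream cannot step back: start over from offset 0.
            reopen();
            restarts++;
        }
        while (pos < offset) {
            // Everything before the target must be inflated and thrown away.
            if (in.read() < 0) {
                throw new EOFException("offset past end of data");
            }
            pos++;
            inflatedTotal++;
        }
        int b = in.read();
        if (b >= 0) {
            pos++;
            inflatedTotal++;
        }
        return b;
    }

    /** Helper that deflates a byte array so the sketch has something to read. */
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream d = new DeflaterOutputStream(out)) {
            d.write(raw);
        }
        return out.toByteArray();
    }
}
```

Reading offset 900 and then offset 10 inflates 901 bytes for the first read
and, after a restart, another 11 for the second. With a delta that jumps
around in a large base, that repeated work adds up quickly.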

Setting the streamFileThreshold to a larger size allows the base to be
fully in memory as a contiguous byte array which is randomly
accessible in constant time. This matches what git-core does. It uses
a lot of memory, but is faster.
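
As a rough sketch, the threshold can be raised in the repository's config
file under core.streamFileThreshold. The value below is only an illustrative
assumption; it needs to be larger than the biggest base object you expect
and still fit comfortably in the JVM heap.

```
[core]
	streamFileThreshold = 200m
```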