Re: [jgit-dev] Problem when trying to do diff on large files stored in LargePackedDeltaObject

On Thu, Feb 13, 2014 at 4:19 PM, Carlsson, Johannes <Johannes.Carlsson.x@xxxxxxxxxxxxxx> wrote:

>>1) If I understand correctly, the TeeInputStream is created so that a
>> loose object is written that can later be used instead of reading the
>> pack file again. But in this case, where only the size is wanted, it
>> instead creates a huge amount of overhead. Can't this be handled as a
>> special case, e.g. by setting up the TeeInputStream only when a read is first performed?
>Oops. This was clearly a mistake. If all the caller wanted was the size we
>shouldn't have done this.
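
For illustration only, here is a minimal sketch of the "set up the tee on first read" idea from 1). It is not JGit code: LazyTeeInputStream, sinkFactory and sinkOpened() are hypothetical names, and the real fix may look quite different. The point is just that the copy target is not created until a byte is actually read, so a caller that only asks for the size never pays for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.function.Supplier;

// Hypothetical sketch, not JGit's implementation.
class LazyTeeInputStream extends InputStream {
    private final InputStream src;
    private final Supplier<OutputStream> sinkFactory;
    private OutputStream sink; // stays null until data is actually read

    LazyTeeInputStream(InputStream src, Supplier<OutputStream> sinkFactory) {
        this.src = src;
        this.sinkFactory = sinkFactory;
    }

    // True once the copy target has been created.
    boolean sinkOpened() {
        return sink != null;
    }

    private OutputStream sink() {
        if (sink == null) {
            sink = sinkFactory.get(); // e.g. the temp file for a loose object
        }
        return sink;
    }

    @Override
    public int read() throws IOException {
        int b = src.read();
        if (b >= 0) {
            sink().write(b);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = src.read(buf, off, len);
        if (n > 0) {
            sink().write(buf, off, n); // mirror exactly what the caller received
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        try {
            if (sink != null) {
                sink.close();
            }
        } finally {
            src.close();
        }
    }
}
```

With something like this, constructing the stream and closing it again without reading leaves sinkOpened() false, so no temporary file would be written for a size-only query.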

I am looking forward to a fix :). Should I open a bug for this?

Yes, please file a bug.

>> 2) It takes a crazy long time to create this loose object. You can see an
>> objects/noz2787230184080961842.tmp file that grows very slowly; the largest
>> file I have there is 86M, and that took > 12h on a decent machine.
>> Can't this be improved?

>I don't know how to do this better. The problem is the delta is seeking
>randomly around in the base. The base needs to be created in order to
>support seeking. Even if the base is a loose object it is compressed
>and needs to be inflated in order to find the relevant section. If the
>delta skips backwards again the inflater is closed and reopened and
>uncompresses forward until the relevant spot is found.
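
To make the cost of a backward seek concrete, here is a small self-contained sketch using only java.util.zip. It is an illustration under simplifying assumptions, not JGit's code: SeekableInflatedBase, readAt and inflatedTotal are hypothetical names, and the "base" is just a deflated byte array. Any read at an offset before the current position closes the inflater, reopens it, and inflates forward from byte 0 again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch of seeking inside a compressed base object.
class SeekableInflatedBase {
    private final byte[] compressed;
    private InputStream in;
    private long pos;          // offset of the next inflated byte
    long inflatedTotal;        // every byte inflated so far, skipped ones included

    SeekableInflatedBase(byte[] compressed) throws IOException {
        this.compressed = compressed;
        reopen();
    }

    private void reopen() throws IOException {
        if (in != null) {
            in.close(); // releases the old inflater
        }
        in = new InflaterInputStream(new ByteArrayInputStream(compressed));
        pos = 0;
    }

    // Returns the inflated byte at the given offset (0..255).
    int readAt(long offset) throws IOException {
        if (offset < pos) {
            reopen(); // going backwards means starting over from the beginning
        }
        while (pos < offset) { // inflate and discard up to the target
            if (in.read() < 0) {
                throw new EOFException("offset past end of base");
            }
            pos++;
            inflatedTotal++;
        }
        int b = in.read();
        if (b < 0) {
            throw new EOFException("offset past end of base");
        }
        pos++;
        inflatedTotal++;
        return b;
    }

    // Helper to build test data: deflate a raw byte array.
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream d = new DeflaterOutputStream(out)) {
            d.write(raw);
        }
        return out.toByteArray();
    }
}
```

In this model, reading offset 900 and then offset 100 of a 1000-byte base inflates 901 + 101 bytes to return two. A delta that jumps around a large base repeats that pattern for every backward copy, which is where the time goes.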

>Setting the streamFileThreshold to a larger size allows the base to be
>fully in memory as a contiguous byte array which is randomly accessible
>in constant time. This matches what git-core does. It uses a lot of
>memory, but is faster.

I was afraid you would say this :)

If there is a fix for 1) and my application doesn't touch big files (other
than looking at the size), I assume that I can still get away with a
low "streamFileThreshold".
