Re: [jgit-dev] Problem when trying to do diff on large files stored in LargePackedDeltaObject
On Thu, Feb 13, 2014 at 6:56 AM, Carlsson, Johannes wrote:
> 1) If I understand correct the TeeInputStream is created so that a loose
> object is created that can later be used instead of reading the pack file
> again. But in this case where only the size is wanted it instead creates
> huge amount of overhead. Can't this be handled in a special case, e.g. the
> TeeInputStream set up first when a read is performed?

Oops. This was clearly a mistake. If all the caller wanted was the size we shouldn't have done this.

> 2) It takes crazy long time to create this loose object. You can see a
> objects/noz2787230184080961842.tmp that grows very slowly, the largest file
> I have there is 86M and that took > 12h on a decent machine. Can't this be
> improved?

I don't know how to do this better. The problem is the delta is seeking randomly around in the base. The base needs to be created in order to support seeking. Even if the base is a loose object it is compressed and needs to be inflated in order to find the relevant section. If the delta skips backwards again the inflater is closed and reopened and uncompresses forward until the relevant spot is found.

Setting the streamFileThreshold to a larger size allows the base to be fully in memory as a contiguous byte array which is randomly accessible in constant time. This matches what git-core does. It uses a lot of memory, but is faster.
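
To make the cost concrete, here is a minimal, self-contained sketch of the effect described above. It is not JGit code: the class SeekCostSketch, the CompressedBase reader, and the copy offsets are all hypothetical, and only java.util.zip is used. It replays a few delta-style copies against a zlib-compressed base that can only be read forward, reopening the inflater whenever an offset lies behind the current position, and counts how many bytes had to be inflated. The same bytes are then read from a plain byte array for comparison.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class SeekCostSketch {
    /** Hypothetical reader over a zlib-compressed base that can only move forward. */
    static final class CompressedBase {
        private final byte[] compressed;
        private Inflater inflater;
        private long position;   // offset of the next uncompressed byte
        long bytesInflated;      // total inflation work, kept for comparison

        CompressedBase(byte[] compressed) {
            this.compressed = compressed;
            reopen();
        }

        /** Throw away the inflater state and start again at offset 0. */
        private void reopen() {
            if (inflater != null) inflater.end();
            inflater = new Inflater();
            inflater.setInput(compressed);
            position = 0;
        }

        /** Copy len bytes starting at offset into dst; a backward seek restarts inflation. */
        void read(long offset, byte[] dst, int len) {
            try {
                if (offset < position) reopen();
                byte[] skip = new byte[8192];
                while (position < offset) {   // inflate and discard up to the target
                    int want = (int) Math.min(skip.length, offset - position);
                    int n = inflater.inflate(skip, 0, want);
                    if (n == 0) throw new IllegalStateException("offset past end of base");
                    position += n;
                    bytesInflated += n;
                }
                int done = 0;
                while (done < len) {          // inflate the bytes actually wanted
                    int n = inflater.inflate(dst, done, len - done);
                    if (n == 0) throw new IllegalStateException("read past end of base");
                    done += n;
                    position += n;
                    bytesInflated += n;
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            }
        }

        void close() {
            inflater.end();
        }
    }

    /** Compress raw with default zlib settings. */
    static byte[] deflate(byte[] raw) {
        Deflater d = new Deflater();
        d.setInput(raw);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    /** Replays hypothetical copy offsets against the base; returns bytes inflated. */
    static long replay(byte[] base, int[] offsets, int len) {
        CompressedBase streamed = new CompressedBase(deflate(base));
        byte[] chunk = new byte[len];
        for (int off : offsets) {
            streamed.read(off, chunk, len);
            // In-memory base: the same bytes come from a plain array copy.
            if (!Arrays.equals(chunk, Arrays.copyOfRange(base, off, off + len))) {
                throw new IllegalStateException("mismatch at offset " + off);
            }
        }
        streamed.close();
        return streamed.bytesInflated;
    }

    public static void main(String[] args) {
        byte[] base = new byte[1 << 20];
        for (int i = 0; i < base.length; i++) base[i] = (byte) (i * 31 + (i >>> 8));
        int[] offsets = {900_000, 100_000, 800_000, 200_000}; // jumps back and forth
        long inflated = replay(base, offsets, 64);
        System.out.println("base size: " + base.length + ", bytes inflated: " + inflated);
    }
}
```

With only four 64-byte copies the streamed reader already inflates more bytes than the whole base contains, because each backward jump restarts from offset 0. A real delta can contain thousands of such copies, which is why holding the base as a byte array (by raising streamFileThreshold) trades memory for time.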