[jgit-dev] Problem when trying to do diff on large files stored in LargePackedDeltaObject
I have a huge Git repository (15G) with a lot of pack files (71, the largest being 9.8GB). When I try to get a diff for some commits using JGit it takes a very long time (I have not yet seen it finish, but it has been running > 12h). I am using the latest master (the same problem occurs with earlier releases).
The file it “hangs” on is 353M and is located in the 9.8GB pack file. Changing “streamFileThreshold” to 512M seems to solve the problem, but that seems a bit aggressive.
When running the test in the debugger it seems that the problem begins like this:
DiffFormatter.open(DiffEntry$Side, DiffEntry) ->
Here a TeeInputStream is set up, along with a new ObjectStream.Filter that overrides close().
The code then continues in ObjectLoader.getCachedBytes(), which calls in.getSize(). That returns 369284420, which causes a LargeObjectException.ExceedsLimit to be thrown, and the finally block then executes the close() that was overridden above. This in turn calls close() on the TeeInputStream, which reads the rest of its input and writes it to the DeflaterOutputStream until it is done.
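To illustrate the effect, here is a minimal stand-alone sketch using plain java.io, not the actual JGit classes. The names DrainOnCloseDemo and DrainingTee are hypothetical, and the ByteArrayOutputStream merely stands in for the DeflaterOutputStream; the only assumption is that close() drains the remaining input into the sink, as observed in the debugger.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class DrainOnCloseDemo {
    /** Hypothetical tee stream: every byte read is also written to a sink. */
    static class DrainingTee extends FilterInputStream {
        private final OutputStream sink;
        long copied; // number of bytes written to the sink so far

        DrainingTee(InputStream src, OutputStream sink) {
            super(src);
            this.sink = sink;
        }

        @Override
        public int read() throws IOException {
            int b = in.read();
            if (b >= 0) {
                sink.write(b);
                copied++;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) {
                sink.write(buf, off, n);
                copied += n;
            }
            return n;
        }

        /** Drains whatever is left in the source into the sink before closing. */
        @Override
        public void close() throws IOException {
            byte[] buf = new byte[8192];
            while (read(buf, 0, buf.length) > 0) {
                // keep copying until the source is exhausted
            }
            sink.close();
            in.close();
        }
    }

    /** Opens a tee over 'size' bytes, never reads from it, and closes it. */
    static long bytesCopiedWhenClosedUnread(int size) {
        try {
            DrainingTee tee = new DrainingTee(
                    new ByteArrayInputStream(new byte[size]),
                    new ByteArrayOutputStream());
            tee.close(); // the caller only wanted the size, but close() copies everything
            return tee.copied;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesCopiedWhenClosedUnread(1_000_000)); // prints 1000000
    }
}
```

Even though the caller never issues a read, the whole object passes through the sink when the stream is closed.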
I have two concerns about this:
1) If I understand correctly, the TeeInputStream is created so that a loose object is written that can later be used instead of reading the pack file again. But in this case, where only the size is wanted, it instead creates a huge amount of overhead. Can’t this be handled as a special case, e.g. by setting up the TeeInputStream only when a read is first performed?
2) It takes an extremely long time to create this loose object. You can watch a file such as objects/noz2787230184080961842.tmp grow very slowly; the largest file I have there is 86M, and that took > 12h on a decent machine. Can’t this be improved?
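For concern 1), the idea of setting up the tee only on the first read could look roughly like the sketch below. This is not JGit code: LazyStreamDemo and LazyInputStream are hypothetical names, the size is assumed to be known up front, and the Supplier stands in for whatever would create the TeeInputStream and the temporary loose-object file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

class LazyStreamDemo {
    /** Hypothetical wrapper: the expensive stream is created on the first read only. */
    static class LazyInputStream extends InputStream {
        private final long size;
        private final Supplier<InputStream> opener;
        private InputStream delegate; // stays null until something is actually read

        LazyInputStream(long size, Supplier<InputStream> opener) {
            this.size = size;
            this.opener = opener;
        }

        /** Answers from metadata without opening the underlying stream. */
        long getSize() {
            return size;
        }

        boolean isOpened() {
            return delegate != null;
        }

        private InputStream open() {
            if (delegate == null) {
                delegate = opener.get();
            }
            return delegate;
        }

        @Override
        public int read() throws IOException {
            return open().read();
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            return open().read(buf, off, len);
        }

        /** Nothing to drain or close if no read ever happened. */
        @Override
        public void close() throws IOException {
            if (delegate != null) {
                delegate.close();
            }
        }
    }

    private static LazyInputStream newStream() {
        return new LazyInputStream(369284420L,
                () -> new ByteArrayInputStream(new byte[16]));
    }

    /** Only the size is queried, then the stream is closed. */
    static boolean openedAfterSizeCheckOnly() {
        try {
            LazyInputStream in = newStream();
            in.getSize();
            in.close();
            return in.isOpened();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A single byte is read before closing. */
    static boolean openedAfterRead() {
        try {
            LazyInputStream in = newStream();
            in.read();
            in.close();
            return in.isOpened();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(openedAfterSizeCheckOnly()); // prints false
        System.out.println(openedAfterRead());          // prints true
    }
}
```

With this shape, the size check followed by close() never creates the tee, so nothing is copied in the ExceedsLimit case; whether this fits the real DiffFormatter code path is something I have not verified.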