Re: [jgit-dev] large object support patches

From: Matthias Sohn <matthias.sohn@xxxxxxxxxxxxxx>
Date: Fri, 2 Jul 2010 17:01:50 +0200

2010/7/2 Shawn Pearce <spearce@xxxxxxxxxxx>

I think I have finished implementing the read side of large object
support. IndexPack is still being very bad and allocating the entire
object as a byte[], which means cloning, fetching or receiving large
objects bigger than your JVM heap still fails with spectacular
results. I plan to rework all of that as I try to abstract IndexPack
away from the local filesystem and turn it into a slightly more
generic pack stream parser. So it will be fixed soon.
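
To illustrate the difference between allocating a whole object as a byte[] and streaming it, here is a minimal sketch using only java.util.zip. It is not JGit's IndexPack code; the class and method names (StreamingInflate, inflateTo, roundTrip) and the 8 KiB buffer size are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical example class, not part of JGit.
public final class StreamingInflate {
    // Assumed buffer size; heap use stays at this size regardless of object size.
    private static final int BUFFER_SIZE = 8192;

    /** Inflates zlib data from in to out through a fixed buffer; returns the inflated length. */
    public static long inflateTo(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long total = 0;
        try (InflaterInputStream inflater = new InflaterInputStream(in)) {
            int n;
            while ((n = inflater.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    /** Demo helper: deflates size zero bytes, then streams them back out and returns the count. */
    public static long roundTrip(int size) {
        try {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (DeflaterOutputStream deflater = new DeflaterOutputStream(compressed)) {
                deflater.write(new byte[size]);
            }
            // In real use the sink would be a file or hash, not another in-memory buffer.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            return inflateTo(new ByteArrayInputStream(compressed.toByteArray()), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("inflated bytes: " + roundTrip(100_000));
    }
}
```

The point of the sketch is only that the working set is the fixed buffer, so the inflated size of the object never has to fit in the heap at once.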

The large object patches are:

http://egit.eclipse.org/r/1032 -- loose objects
http://egit.eclipse.org/r/1034 -- whole packed objects
http://egit.eclipse.org/r/1035 -- delta packed objects

The last one, delta packed objects, is a nightmare of a patch. Doing
streaming deltas efficiently is virtually impossible. I've tried to
get close. I think the challenge now is to find the proper size of
LARGE_OBJECT such that the majority of reasonably sized files stored
in Git can be processed without going through that nasty code path.
Right now it is hard-coded to 1 MiB, but I suspect we might want to
consider something more like 10 MiB. Keep in mind that we need over
2*LARGE_OBJECT in memory at once, as both the delta instruction stream
and the base object it applies onto can each be that size. So using
the fast path for files < 10 MiB requires up to around 20 MiB of JVM
heap.
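
The heap arithmetic above can be sketched as follows. This is a rough illustration, not the code from the patches: the class name LargeObjectBudget, its method names, and the strict "less than" comparison against the threshold are assumptions; only the 1 MiB and 10 MiB values and the 2*LARGE_OBJECT estimate come from the message.

```java
// Hypothetical example class, not part of JGit.
public final class LargeObjectBudget {
    public static final long MIB = 1024L * 1024L;

    /**
     * Approximate bytes held at once on the in-memory delta path:
     * the delta instruction stream plus the base object, each of
     * which may be as large as the threshold.
     */
    public static long peakInMemoryBytes(long largeObjectThreshold) {
        return Math.multiplyExact(2L, largeObjectThreshold);
    }

    /** Assumed rule: objects strictly smaller than the threshold stay on the fast path. */
    public static boolean usesFastPath(long inflatedSize, long largeObjectThreshold) {
        return inflatedSize < largeObjectThreshold;
    }

    public static void main(String[] args) {
        // Compare the current hard-coded value with the suggested one.
        for (long threshold : new long[] {1 * MIB, 10 * MIB}) {
            System.out.printf("LARGE_OBJECT=%d MiB -> about %d MiB of heap for delta + base%n",
                    threshold / MIB, peakInMemoryBytes(threshold) / MIB);
        }
    }
}
```

Raising the threshold keeps more files off the streaming delta path, at the cost of a proportionally larger worst-case heap requirement per object being read.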

Throughout the series I have tried to avoid a negative performance
impact for small files, so anything under LARGE_OBJECT should run just
as fast as it did before this series goes in. The loose object code
path is now slower to execute, but it's perhaps easier to follow than a
more optimized version would be. I tried having a custom
implementation for objects whose compressed form was under 16 KiB and
whose header said their inflated size was under LARGE_OBJECT, but that
made the code a lot more complex due to so many redundant branches, so
I punted on that optimization for now.

Thoughts? I know SAP was interested in this sort of support being
added to JGit during our very first phone conversation. I'm happy
that it's now working. :-)

Great, that's awesome. I will look into these changes soon.

--
Matthias
