Re: [jgit-dev] Question on large object streams

From: Shawn Pearce <spearce@xxxxxxxxxxx>
Date: Mon, 4 Oct 2010 11:13:03 -0700

On Mon, Oct 4, 2010 at 10:17 AM, Dmitry Neverov
<dmitry.neverov@xxxxxxxxx> wrote:
> Why is reading from large object streams so slow?

It depends on how the large object is stored.  :-)

If it's in a pack file and is stored as a delta to another object, it's
so slow that it's unusable.  If it's stored whole in the pack (not as a
delta), or is a loose object, its performance is acceptable.  It's
slower than the fast path, but it's still something that a user won't
mind waiting for.
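To make the "stored as a delta" vs. "stored whole" distinction concrete: in the Git pack format each entry starts with a small header holding a 3-bit type and a variable-length size, and types 6 (OFS_DELTA) and 7 (REF_DELTA) mark deltas. The Java below is a minimal sketch of that header encoding using only the JDK, not JGit's own parser; the class and method names are made up for illustration. For a delta entry the recorded size is the size of the delta data, not of the reconstructed object.

```java
import java.io.ByteArrayOutputStream;

public class PackHeaderSketch {
    static final int OBJ_COMMIT = 1, OBJ_TREE = 2, OBJ_BLOB = 3, OBJ_TAG = 4;
    static final int OBJ_OFS_DELTA = 6, OBJ_REF_DELTA = 7;

    /** Encodes a pack entry header: 3-bit type, then the size in 4 + 7n bits. */
    static byte[] writeHeader(int type, long size) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int c = (type << 4) | (int) (size & 0x0f);
        size >>>= 4;
        while (size != 0) {
            out.write(c | 0x80); // high bit set: more size bytes follow
            c = (int) (size & 0x7f);
            size >>>= 7;
        }
        out.write(c);
        return out.toByteArray();
    }

    /** Decodes {type, size}; for delta types the size is that of the delta itself. */
    static long[] readHeader(byte[] buf) {
        int i = 0;
        int c = buf[i++] & 0xff;
        long type = (c >> 4) & 7;
        long size = c & 0x0f;
        int shift = 4;
        while ((c & 0x80) != 0) {
            c = buf[i++] & 0xff;
            size |= (long) (c & 0x7f) << shift;
            shift += 7;
        }
        return new long[] {type, size};
    }

    /** True for the two delta encodings; everything else is stored whole. */
    static boolean isDelta(long type) {
        return type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA;
    }

    public static void main(String[] args) {
        long[] h = readHeader(writeHeader(OBJ_BLOB, 10_000_000L));
        // prints: type=3 size=10000000 delta=false
        System.out.println("type=" + h[0] + " size=" + h[1] + " delta=" + isDelta(h[0]));
    }
}
```

A reader only learns from this header whether it must go hunting for a base object; a whole blob can be inflated straight through.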

> We have a file of size ~10 MB and reading its content from ObjectStream
> takes forever.
> I see a file 'noz5208794269214828797.tmp' in the .git dir; its size grows
> slowly (~2 MB per hour).
> And whenever I pause execution I see this in the stack trace:
>
> main@1, prio=5, in group 'main', status: 'runnable'
>   java.lang.Thread.State: RUNNABLE
>      locked <0xb05> (a java.util.zip.Inflater)
>      locked <0x94b> (a java.io.BufferedInputStream)
>      locked <0x603> (a jetbrains.buildServer.vcs.patches.PatchBuilderImpl)
>       at java.util.zip.Inflater.inflateBytes(Inflater.java:-1)
>       at java.util.zip.Inflater.inflate(Inflater.java:215)
>       at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:128)
>       at org.eclipse.jgit.storage.pack.DeltaStream.fill(DeltaStream.java:263)

Right, this object is a delta in a pack file.  What's happening here
is that you are inflating the base object into a temporary file, and then
doing random seeks on that temporary file in order to apply the delta.
If the delta is at the end of a delta chain that is, say, 15 objects
long, you need to do this 15 times before you can get to the data for
the requested object.  That can be a lot of work.
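The slow path described above can be modelled in a few lines. This is a simplified sketch under stated assumptions, not JGit's DeltaStream: it decodes Git's binary delta format (base size, result size, then copy/insert opcodes), spills each intermediate base to a temporary file, and seeks into that file for every copy opcode. resolveChain, applyDelta and the "delta-base" temp-file prefix are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeltaChainSketch {

    /** Reads a delta header size: 7 bits per byte, least significant group first. */
    private static long readSize(byte[] d, int[] p) {
        long size = 0;
        int shift = 0;
        int b;
        do {
            b = d[p[0]++] & 0xff;
            size |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return size;
    }

    /** Applies one Git-format delta to a base held in a file, seeking for each copy. */
    static byte[] applyDelta(RandomAccessFile base, byte[] delta) throws IOException {
        int[] p = {0};
        if (readSize(delta, p) != base.length()) throw new IOException("base size mismatch");
        long resultSize = readSize(delta, p);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (p[0] < delta.length) {
            int cmd = delta[p[0]++] & 0xff;
            if ((cmd & 0x80) != 0) {
                // Copy: bits 0-3 flag offset bytes, bits 4-6 flag size bytes (little-endian).
                long off = 0;
                int len = 0;
                for (int i = 0; i < 4; i++) {
                    if ((cmd & (1 << i)) != 0) off |= (long) (delta[p[0]++] & 0xff) << (8 * i);
                }
                for (int i = 0; i < 3; i++) {
                    if ((cmd & (0x10 << i)) != 0) len |= (delta[p[0]++] & 0xff) << (8 * i);
                }
                if (len == 0) len = 0x10000; // an encoded size of zero means 64 KiB
                byte[] buf = new byte[len];
                base.seek(off); // random access into the base: the expensive part
                base.readFully(buf);
                out.write(buf, 0, len);
            } else if (cmd != 0) {
                // Insert: the next cmd bytes of the delta are literal output.
                out.write(delta, p[0], cmd);
                p[0] += cmd;
            } else {
                throw new IOException("reserved delta opcode 0");
            }
        }
        if (out.size() != resultSize) throw new IOException("result size mismatch");
        return out.toByteArray();
    }

    /** Resolves a chain: each step writes the previous result to a temp file, then applies the next delta. */
    static byte[] resolveChain(byte[] base, byte[]... deltas) {
        byte[] current = base;
        try {
            for (byte[] delta : deltas) {
                Path tmp = Files.createTempFile("delta-base", ".tmp");
                try {
                    Files.write(tmp, current);
                    try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "r")) {
                        current = applyDelta(raf, delta);
                    }
                } finally {
                    Files.deleteIfExists(tmp);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return current;
    }

    public static void main(String[] args) {
        byte[] base = "hello world".getBytes(StandardCharsets.US_ASCII);
        // 11 -> 12 bytes: copy base[6..11), insert ", ", copy base[0..5)
        byte[] d1 = {0x0b, 0x0c, (byte) 0x91, 0x06, 0x05, 0x02, ',', ' ', (byte) 0x90, 0x05};
        // 12 -> 13 bytes: copy all 12 bytes, insert "!"
        byte[] d2 = {0x0c, 0x0d, (byte) 0x90, 0x0c, 0x01, '!'};
        System.out.println(new String(resolveChain(base, d1, d2), StandardCharsets.US_ASCII));
    }
}
```

Each extra link in the chain costs another spill to disk plus one seek per copy opcode, which is why a 15-deep chain hurts far more than a whole object that can simply be inflated front to back.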

What version of JGit is this?  Tip of master should be inflating these
objects into loose objects in the loose objects directory, so that
subsequent access is faster because it's just streaming from the loose
object rather than the packed form.  But it's still slow for the
initial read.  :-(
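For reference, a loose object is the header "blob <size>" plus a NUL byte, followed by the content, zlib-compressed and stored under objects/ at a path derived from the SHA-1 of those bytes. Reading it back is one sequential inflate with no delta to apply. The sketch below builds that encoding with only the JDK; it illustrates the on-disk format and is not the code JGit uses to cache inflated deltas (the helper names are hypothetical).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Deflater;

public class LooseObjectSketch {

    /** Canonical blob encoding: "blob <decimal size>\0" followed by the raw content. */
    static byte[] canonical(byte[] content) {
        byte[] header = ("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII);
        byte[] all = new byte[header.length + content.length];
        System.arraycopy(header, 0, all, 0, header.length);
        System.arraycopy(content, 0, all, header.length, content.length);
        return all;
    }

    /** Object id: SHA-1 over header plus content, as lowercase hex. */
    static String objectId(byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-1").digest(canonical(content));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /** Path relative to .git: first two hex digits name the directory. */
    static String loosePath(String id) {
        return "objects/" + id.substring(0, 2) + "/" + id.substring(2);
    }

    /** zlib-compresses header plus content; these are the bytes written to the loose file. */
    static byte[] deflate(byte[] content) {
        Deflater deflater = new Deflater(); // default constructor emits the zlib wrapper
        deflater.setInput(canonical(content));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = "hello\n".getBytes(StandardCharsets.US_ASCII);
        String id = objectId(content);
        System.out.println(loosePath(id)); // objects/ce/013625030ba8dba906f756967f9e9ca394464a
        System.out.println(deflate(content).length + " compressed bytes");
    }
}
```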

One thing we should do is teach IndexPack about this and have it cache
the large delta object as a loose object immediately during
fetch/clone so that during checkout we have fast access to that
content.  But I hadn't thought about doing that until just now, so
whatever.

> How can we speed it up?

Increase core.streamFileThreshold in your WindowCacheConfig to a
value larger than the default, which is currently 5 MiB.  I thought I
had patches queued on Gerrit to increase this to 50 MiB.
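A little arithmetic shows why raising the threshold helps the ~10 MB file from the question: 5 MiB is 5,242,880 bytes, below the file's size, while 50 MiB is 52,428,800 bytes, above it, so presumably the object would then be handled whole in memory instead of through the streaming path. The sketch below assumes git-config style size suffixes (k/m/g, 1024-based); parseSize and usesStreamingPath are hypothetical helpers, and the exact boundary comparison is an assumption. In JGit itself the value is, to our understanding, set with WindowCacheConfig.setStreamFileThreshold(int); check that against the Javadoc for your version.

```java
import java.util.Locale;

public class StreamThresholdSketch {

    /** Parses a git-config style size such as "5m": optional k/m/g suffix, 1024-based. */
    static long parseSize(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        long unit = 1;
        switch (v.charAt(v.length() - 1)) {
            case 'k': unit = 1024L; break;
            case 'm': unit = 1024L * 1024; break;
            case 'g': unit = 1024L * 1024 * 1024; break;
            default: break;
        }
        if (unit != 1) v = v.substring(0, v.length() - 1).trim();
        return Long.parseLong(v) * unit;
    }

    /**
     * Hypothetical decision: objects below the threshold are inflated (and any
     * delta applied) fully in memory; at or above it they take the streaming
     * path. The boundary handling here is an assumption.
     */
    static boolean usesStreamingPath(long objectSize, long threshold) {
        return objectSize >= threshold;
    }

    public static void main(String[] args) {
        long fileSize = 10_000_000L; // roughly the ~10 MB file from the question
        for (String setting : new String[] {"5m", "50m"}) {
            long threshold = parseSize(setting);
            System.out.println(setting + " (" + threshold + " bytes): streaming="
                    + usesStreamingPath(fileSize, threshold));
        }
    }
}
```

The trade-off is memory: anything under the threshold is held as a full byte array while it is read.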

You can also use the -delta gitattribute when you pack the repository
to try and keep these "large" files from being delta compressed.  The
resulting pack will be bigger, but JGit will perform better when
accessing it because the bigger objects can be directly streamed.
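A .gitattributes entry along these lines is one way to mark such files. The patterns are only examples; substitute whatever matches your own large files.

```
# Do not attempt delta compression for these paths when packing
*.jar -delta
*.zip -delta
```

Deltas that already exist in a pack may be reused when repacking, so forcing them to be recomputed (for example `git repack -a -d -f` in C Git) is the safer way to make the attribute take effect.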

-- 
Shawn.
