Re: [jgit-dev] Question on large object streams
On Mon, Oct 4, 2010 at 10:17 AM, Dmitry Neverov <dmitry.neverov@xxxxxxxxx> wrote:
> Why reading from large object streams is so slow?

It depends on how the large object is stored. :-)

If it's in a pack file and is stored as a delta against another object, it's so slow that it's unusable. If it's stored whole in the pack (not as a delta), or is a loose object, performance is acceptable. It's slower than the fast path, but it's still something that a user won't mind waiting for.

> We have a file of size ~ 10Mb and reading its content from ObjectStream
> takes forever.
> I see a file 'noz5208794269214828797.tmp' in .git dir, it's size grows
> slowly (~2Mb per hour).
> And whenever I pause execution I saw this in stack trace:
>
> main@1, prio=5, in group 'main', status: 'runnable'
>   java.lang.Thread.State: RUNNABLE
>   locked <0xb05> (a java.util.zip.Inflater)
>   locked <0x94b> (a java.io.BufferedInputStream)
>   locked <0x603> (a jetbrains.buildServer.vcs.patches.PatchBuilderImpl)
>   at java.util.zip.Inflater.inflateBytes(Inflater.java:-1)
>   at java.util.zip.Inflater.inflate(Inflater.java:215)
>   at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:128)
>   at org.eclipse.jgit.storage.pack.DeltaStream.fill(DeltaStream.java:263)

Right, this object is a delta in a pack file.

What's happening here is that you are inflating the base object into a temporary file, and then doing random seeks on that temporary file in order to apply the delta. If the delta is at the end of a delta chain that is, say, 15 objects long, you need to do this 15 times before you can get to the data for the requested object. That can be a lot of work.

What version of JGit is this? Tip of master should be inflating these objects into loose objects in the loose objects directory, so that subsequent access is faster because it's just streaming from the loose object rather than from the packed form. But it's still slow for the initial read. :-(

One thing we should do is teach IndexPack about this and have it cache the large delta object as a loose object immediately during fetch/clone, so that during checkout we have fast access to that content. But I hadn't thought about doing that until just now, so whatever.

> How can we speed it up?

Increase the core.streamFileThreshold in your WindowCacheConfig to a value larger than the default. Right now the default is 5 MiB, but I thought I had patches queued on Gerrit to increase this to 50 MiB.

You can also use the -delta gitattribute when you pack the repository to try to keep these "large" files from being delta compressed. The resulting pack will be bigger, but JGit will perform better when accessing it because the bigger objects can be streamed directly.

-- Shawn.
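
To illustrate why a long delta chain is costly, here is a minimal sketch of chain resolution. It is not JGit's code: the class DeltaChainSketch, the Op type, and the methods apply and resolve are hypothetical names, the instruction model is a simplification of the pack delta format, and everything is held in byte arrays, whereas the streaming path described above writes the base to a temporary file and seeks within it. The point is only that each link in the chain needs its whole base rebuilt before the next delta can be applied.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of delta application; not JGit's implementation.
public class DeltaChainSketch {
    /** One delta instruction: copy a range out of the base, or insert literal bytes. */
    static final class Op {
        final int baseOffset;
        final int length;
        final byte[] literal; // null for a copy instruction

        private Op(int baseOffset, int length, byte[] literal) {
            this.baseOffset = baseOffset;
            this.length = length;
            this.literal = literal;
        }

        static Op copy(int baseOffset, int length) {
            return new Op(baseOffset, length, null);
        }

        static Op insert(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
            return new Op(0, bytes.length, bytes);
        }
    }

    /** Applies one delta to a fully materialized base. */
    static byte[] apply(byte[] base, List<Op> delta) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Op op : delta) {
            if (op.literal != null) {
                out.write(op.literal, 0, op.literal.length);
            } else {
                // Copies may reference any offset, so the base must be randomly accessible.
                out.write(base, op.baseOffset, op.length);
            }
        }
        return out.toByteArray();
    }

    /** Walks the chain from the whole (non-delta) object to the requested object. */
    static byte[] resolve(byte[] root, List<List<Op>> chain) {
        byte[] current = root;
        for (List<Op> delta : chain) {
            current = apply(current, delta); // one full rebuild per link in the chain
        }
        return current;
    }

    public static void main(String[] args) {
        byte[] root = "hello world".getBytes(StandardCharsets.UTF_8);
        List<Op> first = Arrays.asList(Op.copy(6, 5), Op.insert(", "), Op.copy(0, 5));
        List<Op> second = Arrays.asList(Op.copy(0, 5), Op.insert("!"));
        byte[] result = resolve(root, Arrays.asList(first, second));
        System.out.println(new String(result, StandardCharsets.UTF_8)); // prints "world!"
    }
}
```

With a chain of 15 deltas, resolve performs 15 full rebuilds, and for a file of about 10 MB each rebuild goes through disk rather than memory. This is why storing the object whole, or raising the streaming threshold so it is handled in memory, avoids most of the cost.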