Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[jgit-dev] Re: jgit clone performance problems

Constantine Plotnikov <constantine.plotnikov@xxxxxxxxx> wrote:
> I was recently profiling why cloning with jgit is so slow. One of
> major offenders (on Windows) is creation of file input stream through
> the following method in ObjectDirectory.
> 
> 	@Override
> 	protected ObjectLoader openObject2(final WindowCursor curs,
> 			final String objectName, final AnyObjectId objectId)
> 			throws IOException {
> 		try {
> 			return new UnpackedObjectLoader(fileFor(objectName), objectId);
> 		} catch (FileNotFoundException noFile) {
> 			return null;
> 		}
> 	}
> 
> During clone with the pack based transport the exception always
> thrown.

I don't get it.  openObject2 should only be invoked if the object
was not found in a pack file during openObject1.

In the case of packing to service a client, we shouldn't be trying
to open a loose object unless we can't find it in any pack file.
So we never should be getting a FileNotFoundException here.

> I'm planning to implement the following fix for this problem.
> 
> Introduce the following class:
> 
> class UpackedObjectLoaderPolicy {
>     static final UpackedObjectLoaderPolicy  DEFAULT = new
> UpackedObjectLoaderPolicy() {
>         protected abstract ObjectLoader loadUnpackedObject(File file,
> objectId) {
> 		try {
> 			return new UnpackedObjectLoader(file, objectId);
> 		} catch (FileNotFoundException noFile) {
> 			return null;
> 		}
>         }
>     }
> 
>     protected abstract ObjectLoader loadUnpackedObject(File file, objectId);
> }
> 
> There will be property of this type on the ObjectDirectory. The
> default value will be default. The property will be used in
> implementation of openObject2 instead of the current implementation.
> 
> Create class UpackedObjectLoaderPolicyStable that does the following:
> 1. If directory for the requested file was never tried before, it is
> checked for existence and result is remembered along with set of files
> in the directory.
> 2. If the file exists in the directory, the behavior equivalent to the
> behavior of the default policy
> 3. If directory of file does not exists, the null is immediately returned.

How does this UpackedObjectLoaderPolicyStable adjust to the updated
objects directory when an external process writes new loose objects
into the directory?


> During indexing, the previous value of UpackedObjectLoaderPolicy
> property is remembered, the new instance the "stable" policy is used.
> After indexing finishes, the previous value is restored. Rationale for
> such policy is that new unpacked objects will unlikely be created
> while pack indexing is in progress.

Again, I think something else is wrong, we shouldn't be hitting this
code path unless we're incredibly certain that the object must be
a loose object.

 
> Possible alternatives for "stable" policy:
> a. Storing only directories. It would work well for the clone case
> (there will be no directories for unpacked objects, but will not work
> well for fetch with working tree, since objects will be likely present
> in the directory
> b. Using Bloom Filter to store file set (to get a 1/16 false positive
> rate we need about 2 ^ ceil(log2(number of files)+3) bits per file,
> for 10000 files it will be about 13 bits for per file, but git
> repository will be almost unusable with such number of unpacked
> objects per directory anyway).

-- 
Shawn.


Back to the top