
Re: [jgit-dev] listener for porcelain/plumbing commands?

On Wed, Jul 24, 2013 at 4:02 PM, Christian Trutz
<christian.trutz@xxxxxxxxx> wrote:
> in the while loop of the AddCommand#call() method I see that every file is
> read twice from the hard disk:
>
> 1) long contentSize = f.getEntryContentLength();
>
> 2) InputStream in = f.openEntryStream();
>    try {
>        entry.setObjectId(inserter.insert(
>                Constants.OBJ_BLOB, contentSize, in));
>    } finally {
>        in.close();
>    }
>
> f.getEntryContentLength() not only returns the content length but also reads
> the file into a 64K array. Is the contentSize really needed in the
> inserter.insert(...) method?

Yes. The content length must be known in advance in order to compute
the correct SHA-1 that Git uses internally for storage: the hash is
computed over a "blob <length>\0" header followed by the content, so
the size has to be written out before the first content byte. The
reason getEntryContentLength() reads the file is to handle CRLF
conversions. If there are no conversions to apply (the typical
default is none), then the obvious fix is to have
getEntryContentLength() just return the java.io.File.length() result
here.
Unfortunately that is not how this is implemented right now. *sigh*
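
Sketched out, the short-circuit would look something like this;
needsCleanFilter(), getEntryFile() and filteredLengthByReading() are
made-up placeholder names, not the real iterator internals:

    public long getEntryContentLength() throws IOException {
        if (!needsCleanFilter()) {
            // No CRLF conversion applies, so the blob size equals the
            // size on disk and java.io.File.length() already has it.
            return getEntryFile().length();
        }
        // A conversion can change the byte count, so the file must be
        // read through the filter once just to count the output.
        return filteredLengthByReading();
    }

In the common no-conversion case that removes the first read entirely.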

Another bottleneck for adding many files at once is the fsync() that
happens on every file add. JGit disables this by default, but you can
enable the fsync for even slower performance by setting
core.fsyncobjectfiles to true. I'm guessing you didn't do this.
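
For reference, the switch can also be flipped from JGit code;
something like this, with a hypothetical repository path:

    import java.io.File;
    import org.eclipse.jgit.lib.Repository;
    import org.eclipse.jgit.lib.StoredConfig;
    import org.eclipse.jgit.storage.file.FileRepositoryBuilder;

    public class EnableFsync {
        public static void main(String[] args) throws Exception {
            Repository repo = new FileRepositoryBuilder()
                    .setGitDir(new File("/path/to/repo/.git")) // hypothetical
                    .build();
            StoredConfig config = repo.getConfig();
            // Same effect as `git config core.fsyncobjectfiles true`.
            // Leave it false (the default) if you care about add speed.
            config.setBoolean("core", null, "fsyncobjectfiles", true);
            config.save();
            repo.close();
        }
    }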

When adding more than 1000 files it's probably better to use a pack.
Unfortunately I don't think JGit has a good API for batch adding lots
of objects into a single pack stream. This should be hidden inside of
ObjectInserter as an implementation detail... and it's not even
available. In the normal command-line Git world we recommend using
git fast-import to write the blobs out in bulk.
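
Driving fast-import from Java works by shelling out. A rough sketch;
the paths and file list are made up and error handling is trimmed:

    import java.io.*;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.*;

    public class BulkBlobs {
        public static void main(String[] args) throws Exception {
            Path repo = Paths.get("/path/to/repo"); // hypothetical
            Path[] files = { Paths.get("a.txt"), Paths.get("b.txt") };

            // fast-import reads a command stream on stdin and writes a
            // single pack; --export-marks records the SHA-1 assigned
            // to each mark.
            Process git = new ProcessBuilder(
                    "git", "fast-import", "--export-marks=marks.txt")
                    .directory(repo.toFile())
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();

            try (OutputStream out =
                    new BufferedOutputStream(git.getOutputStream())) {
                int mark = 1;
                for (Path f : files) {
                    byte[] content = Files.readAllBytes(f);
                    // One "blob" command per file: header line, mark,
                    // exact byte count, then the raw bytes.
                    out.write(("blob\nmark :" + mark++ + "\ndata "
                            + content.length + "\n")
                            .getBytes(StandardCharsets.US_ASCII));
                    out.write(content);
                    out.write('\n');
                }
            } // closing stdin signals end of stream

            if (git.waitFor() != 0)
                throw new IOException("git fast-import failed");
        }
    }

When fast-import exits, marks.txt maps :1, :2, ... back to the blob
SHA-1s so the caller can build trees and commits on top of them.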

