Re: [jgit-dev] Performance of indexing blob metadata

On Sun, Mar 11, 2012 at 09:02,  <james.moger@xxxxxxxxxxx> wrote:
> I wonder if the core Git team has ever considered freshening the repository
> format to track a little more info on commits and blobs so that metadata
> reconstruction is not so painful?  Such a change would probably be
> non-backwards compatible.

Yes, and no, it won't happen.

The repository format stores the minimal amount of information
required to accurately record the project history and contents, and to
do so in a reasonably secure way, so that tampering or data corruption
is detectable. Everything else can be derived from the repository format,
and the general advice to application developers is to build your own
caching infrastructure around the repository format. You know a SHA-1
is immutable and so is its history, so caching results with commit
SHA-1s as keys tends to work well. Each application wants different
data, or wants it in different formats, so it's better if the
application author chooses how to store their caches and what
information to store.
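
As a rough illustration of the caching idea, here is a minimal sketch in
Python using the standard sqlite3 module. The table name, the JSON
payload, and the helper names open_cache and cached_metadata are all
hypothetical; a real application would pick its own storage and format.
The only property it relies on is that whatever is derived from a given
commit SHA-1 never changes, so an entry never needs to be invalidated.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a small cache keyed by commit SHA-1 (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS commit_meta ("
        "sha1 TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return db


def cached_metadata(db, sha1, compute):
    """Return the metadata for a commit, computing it at most once.

    `compute` is whatever expensive derivation the application needs
    (for example, walking the commit's tree). Because the commit is
    immutable, a stored result stays valid forever.
    """
    row = db.execute(
        "SELECT payload FROM commit_meta WHERE sha1 = ?", (sha1,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])

    value = compute(sha1)
    db.execute(
        "INSERT INTO commit_meta (sha1, payload) VALUES (?, ?)",
        (sha1, json.dumps(value)),
    )
    db.commit()
    return value
```

The second lookup for the same SHA-1 is answered from the table and
never calls compute again.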

Heck, even Git itself uses brute-force methods in some places. The
wire protocol only sends the object data... no SHA-1s. The client (or
server) receiving the data has to recompute the SHA-1s itself. For a
large project with a lot of objects, this can take several minutes.
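
To make the recomputation concrete, here is a small Python sketch of how
an object name is derived from the object's type and content. It is not
a pack parser: it assumes the object data has already been inflated, and
the function name git_object_id is made up for illustration. The name is
the SHA-1 of a header ("<type> <size>" followed by a NUL byte) plus the
raw content, which is why the receiver can, and must, rebuild every name
from the data alone.

```python
import hashlib


def git_object_id(obj_type, content):
    """Compute the SHA-1 name of a Git object from its type and raw bytes."""
    # Header is "<type> <size in bytes>\0", followed by the content itself.
    header = obj_type.encode("ascii") + b" " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match: printf 'hello world\n' | git hash-object --stdin
    print(git_object_id("blob", b"hello world\n"))
```

Doing this once per object is cheap, but a large pack repeats it for
every object received.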

