Re: [jgit-dev] Problems with bitmaps and cloning and DFS back end

On Wed, Jun 19, 2013 at 12:02 PM, Alex Blewitt <alex.blewitt@xxxxxxxxx> wrote:
> On 19 Jun 2013, at 17:03, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
>> But my original idea for DFS backend was to encode the DfsPackDescription
>> data into the filename, and parse it back out when listing the packs
>> of the repository. Unfortunately this might not be possible if you
>> cannot rename a file, as some of the data arrives too late.
>
> Yes, I found that out too :-)
>
> However, I found that when commitPack is called the info is there, so I write out a surrogate file with the data encoded in it.

Yes, our implementation does the same thing. The description data is
frozen just before commitPack() is invoked, so this is the point to
save it.
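
As a rough illustration of the surrogate-file idea, the sketch below
round-trips a few description fields through a small text format. It
does not use JGit at all: PackMeta, its field set, and the key=value
layout are hypothetical stand-ins, and a real backend would write the
encoded text to its own storage at the point commitPack() runs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the pack description fields worth persisting.
 * Not a JGit class; the field set and text layout are assumptions.
 */
final class PackMeta {
    final long objectCount;
    final long fileSize;
    final long lastModified;

    PackMeta(long objectCount, long fileSize, long lastModified) {
        this.objectCount = objectCount;
        this.fileSize = fileSize;
        this.lastModified = lastModified;
    }

    /** Produce the text a backend could store in its surrogate file. */
    String encode() {
        return "objectCount=" + objectCount + "\n"
             + "fileSize=" + fileSize + "\n"
             + "lastModified=" + lastModified + "\n";
    }

    /** Rebuild the fields when listing packs; missing keys default to 0. */
    static PackMeta decode(String text) {
        Map<String, Long> values = new HashMap<>();
        for (String line : text.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                String key = line.substring(0, eq).trim();
                values.put(key, Long.parseLong(line.substring(eq + 1).trim()));
            }
        }
        return new PackMeta(
            values.getOrDefault("objectCount", 0L),
            values.getOrDefault("fileSize", 0L),
            values.getOrDefault("lastModified", 0L));
    }
}
```

Defaulting absent keys to 0 means a surrogate file written with fewer
fields can still be read back.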

> The only real question is how much is needed; this is the first time I've needed anything, and currently it works with only objectCount set.

Colby pointed out to me this morning that the object count is
necessary, and can be obtained from the PackIndex.

Getting it from there is not trivial. The DfsPackDescription coming
out of listPacks() needs the count set. The only way to get the object
count from the index is to load the entire index and call
getObjectCount(). Unfortunately the PackIndex you get from this
process won't be cached properly for the DfsPackFile to use it, so now
you have the index being loaded twice. Yuck.


I think a reasonable fix in JGit is to modify DfsCachedPack so that it
uses the DfsPackFile to get the object count, rather than the
DfsPackDescription. The count is only used once we have chosen to
reuse the pack, and reuse required us to look at the bitmap index,
which required us to load the PackIndex. So DfsPackFile can safely
delegate to the cached PackIndex and get a fast answer from memory.
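
The shape of that delegation might look roughly like the sketch below.
It is not the JGit code: Index, PackFileSketch and CachedPackSketch are
hypothetical stand-ins for PackIndex, DfsPackFile and DfsCachedPack,
and the loads counter exists only to show that the index is read once.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a pack index. */
interface Index {
    long getObjectCount();
}

/** Hypothetical stand-in for the pack file, which owns the cached index. */
final class PackFileSketch {
    private final Supplier<Index> loader;
    private Index cached;
    int loads; // how many times the (expensive) load actually ran

    PackFileSketch(Supplier<Index> loader) {
        this.loader = loader;
    }

    /** Load the index on first use and keep it for later callers. */
    synchronized Index getIndex() {
        if (cached == null) {
            cached = loader.get();
            loads++;
        }
        return cached;
    }

    /** Answer from the cached index instead of from the description. */
    long getObjectCount() {
        return getIndex().getObjectCount();
    }
}

/** Hypothetical stand-in for the cached pack that is reused on clone. */
final class CachedPackSketch {
    private final PackFileSketch pack;

    CachedPackSketch(PackFileSketch pack) {
        this.pack = pack;
    }

    /** Delegate to the pack file; no count is needed in the description. */
    long getObjectCount() {
        return pack.getObjectCount();
    }
}
```

With this arrangement the description returned from listPacks() would
no longer need the count, and repeated calls hit the same in-memory
index.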

> I imagine that fileSize and lastModified would be useful to enable the correct priority sorting to work

Yes, but this is overridable in your DfsPackDescription subclass by
replacing compareTo(DfsPackDescription).
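
A minimal sketch of such an override, using stand-in classes rather
than the real DfsPackDescription: PackDescSketch, NewestFirstDesc and
the "newest first, then smallest" ordering are all assumptions chosen
for illustration, not the ordering JGit uses.

```java
/** Hypothetical stand-in for a base description with some default order. */
class PackDescSketch implements Comparable<PackDescSketch> {
    final String name;
    final long lastModified;
    final long fileSize;

    PackDescSketch(String name, long lastModified, long fileSize) {
        this.name = name;
        this.lastModified = lastModified;
        this.fileSize = fileSize;
    }

    /** Placeholder default order: by name. */
    @Override
    public int compareTo(PackDescSketch other) {
        return name.compareTo(other.name);
    }
}

/** Subclass that replaces the order with one the backend prefers. */
final class NewestFirstDesc extends PackDescSketch {
    NewestFirstDesc(String name, long lastModified, long fileSize) {
        super(name, lastModified, fileSize);
    }

    /** Newer packs sort first; ties go to the smaller pack. */
    @Override
    public int compareTo(PackDescSketch other) {
        int c = Long.compare(other.lastModified, lastModified);
        if (c != 0) {
            return c;
        }
        return Long.compare(fileSize, other.fileSize);
    }
}
```

A backend that takes this route does not have to persist fileSize or
lastModified just to get a usable search order; it can sort on whatever
it does store.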

> but I'm not sure whether the statistics are relevant outside of a DfsGarbageCollect result. I'm not sure about deltaCount.

The deltaCount can be 0. It's only used in a stats line shown to a
client when they do a clone and the bitmap file triggers reuse of
the entire pack. Leaving it at 0 is safe; it just yields a possibly
confusing line.

> The question is what subset is strictly necessary? Perhaps unnecessary fields (if any) could be annotated with "transient".

Documenting this is a fantastic idea.

