Re: [jgit-dev] Implementing shallow clones

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [jgit-dev] Implementing shallow clones

From: Matt Fischer <mattfischer84@xxxxxxxxx>
Date: Thu, 22 Jul 2010 21:41:29 -0500
Delivered-to: jgit-dev@xxxxxxxxxxx
Domainkey-signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=QYBb3BQAh7b4PJvTgfzvQyqEJcFAsHKh/0LVTupKGCXNniReUdD/VepntvxPwjGnt8 iZ1z0RCXh5HF96PRAkFHPbuDt1C4PgGfWBC4jkG+jbS2IzsSarM7KS792WU37S+gZj3I w/gqbza7Qa6kCZxITRBUQXSClGTNRRW45ejUM=
List-archive: <https://dev.eclipse.org/mailman/private/jgit-dev>
List-help: <mailto:jgit-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/jgit-dev>, <mailto:jgit-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/jgit-dev>, <mailto:jgit-dev-request@eclipse.org?subject=unsubscribe>

On Mon, Jul 19, 2010 at 9:55 PM, Shawn O. Pearce <spearce@xxxxxxxxxxx> wrote:
> Matt Fischer <mattfischer84@xxxxxxxxx> wrote:
>>
>> If I can get to the point of producing the set of commits at the
>> edges, the rest is pretty easy (I can just feed the parents of those
>> edge commits into the PackWriter as uninteresting commits to get it to
>> exclude the proper things.)
>
> That probably won't work.  If you tag the parents of the edges as
> uninteresting we'll exclude any blobs that the parent contains.
> If the commit we are actually sending in response to the deepen
> request also uses that blob, but the client doesn't yet have that
> blob, we'll send an incomplete pack.
>
> You probably need to do something more sophisticated, like force
> the edges to be parsed and then whack out their parent array so its
> an empty list.  Then the traversal won't follow into the parents,
> and thus stops at the boundary.  You also would need to mark the
> blobs we know the client already has (from its common have lines)
> as uninteresting.
>
How about adding a new flag to RevWalk called SHALLOW?  Then we could
add things like RevWalk.markShallow(), in a similar way to how
uninteresting works now.  The semantics of SHALLOW would be that
implies a boundary, but unlike UNINTERESTING it does not imply that
the remote side has the commit.  Therefore, the ObjectWalk would never
mark the blobs and trees as uninteresting, so they would make it into
the pack.  I don't know whether that's too special-case a thing to put
into RevWalk, but it seems like if/when jgit gets support for the
client side of shallow checkouts (manipulating .git/shallow and all
that), it might need this flag anyway.

>> Looking through the code, it looks to me
>> like generating this set would take logic that looks sort of like a
>> combination of the Topo sort and the Boundary generator, along with
>> the ability to tag each commit with an integer for its depth as we go
>> along.
>
> Probably true.  Only we really don't want to fatten out the RevCommit
> structure by default to add the depth tag.
>
> But you could create a custom subclass of ObjectWalk that overrides
> createCommit to be a subclass which does have the depth field.
> Then use this new subclass only for shallow clone enumeration
> support, and set the depth as the commits spool out of the next()
> method.  Since you extended the ObjectWalk you might be able to wedge
> something into the StartGenerator that sets up a new filter generator
> to assign the depths as they spool out of the TopoSortGenerator.
> It should be pretty easy, the depth is the shortest path to a commit
> so each commit just has to set its depth + 1 into its parents.
>
Wrapper class for RevCommit is fine.  I'm wondering whether it would
be ok to just add the logic to create them directly into RevWalk,
though.  It could be set up so that if the "we need depth info" flag
is set, it just creates those wrapper objects.  Then it can also use
that flag to enforce that it's using Topo sort (because the depth
tagging won't work correctly if they aren't in topo order.)  If that
gets put in place, then I can just implement a new filter that picks
out only commits of a specified depth, and that gives me my shallow
list.

>> If I were to add this logic, would it be correct to say that I
>> should be making a new generator which only returns commits of a
>> certain depth,
>
> I think so.  The TopoSortGenerator is incremental, so if you put
> the numbering generator into the pipeline you can also decide
> when to abort.  Though aborting here is a bit tricky, if you just
> want to stop a given depth you need to remove the parents from
> the underlying PendingGenerator's queue once you realize that the
> parents are impossibly deep.  You can't stop until the queue is
> empty, because you might be visting down one branch, need to cut
> it off, then go visit another branch that is still in the queue.
>
>> and somehow add parameters to the RevWalk to tell it
>> that I want to instantiate this generator?  If so, what is the proper
>> way to make this generator get the commits in Topo order, and do the
>> necessary tagging to compute the depth on each node?
>
> Maybe just do a "walker instanceof MySpecialWalker" inside of the
> StartGenerator right after the TopoSortGenerator is created?
>
> Like I was saying above, doing this probably requires having both the
> TopoSortGenerator here and having access to the PendingGenerator's
> queue so you can fully kill a branch when its gone to the needed
> depth.
>
> Actually, you can kill it by marking it UNINTERESTING.  That'll make
> it drop out of the queue because PendingGenerator skips over commits
> that are uninteresting.
>
Aborting the scan sounds kind of tricky--is it really necessary to do
that?  Walking the entire commit history is certainly inefficient
during a shallow checkout, but it's no more inefficient than it would
be if you were fetching the whole history, and it seems like that cost
is probably negligible compared to the time spent sending the data
across the wire.  If I just implement a filter to pump through the
whole tree and only return commits of a specified depth, would that be
that bad?

--Matt

Follow-Ups:
- Re: [jgit-dev] Implementing shallow clones
  - From: Matt Fischer

References:
- [jgit-dev] Implementing shallow clones
  - From: Matt Fischer
- Re: [jgit-dev] Implementing shallow clones
  - From: Shawn O. Pearce

Prev by Date: [jgit-dev] Merging my refactoring and delta series
Next by Date: Re: [jgit-dev] Pack files are locked
Previous by thread: Re: [jgit-dev] Implementing shallow clones
Next by thread: Re: [jgit-dev] Implementing shallow clones
Index(es):
- Date
- Thread

Breadcrumbs