Re: [jgit-dev] RevSort#COMMIT_TIME_DESC confusion

On Fri, Aug 12, 2011 at 01:55, Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
> I was relying on RevSort#COMMIT_TIME_DESC to report RevCommits always in
> descending order, however this is not the case for certain repositories,
> i.e. the commit time of a parent may be more recent than the entry's
> commit time itself (does anyone know why that can happen?). IMHO this
> behavior should be documented in the javadocs.

The sorting is not an absolute result. Giving you an absolute sorted
result means you have to wait up to a full minute on the linux-2.6
repository before the first commit can be returned, as the entire
project history must be loaded into memory and sorted. This is simply
too expensive to perform most of the time. So COMMIT_TIME_DESC
approximates by running everything through a priority queue sorted by
commit time, descending. But when there is clock skew across commits,
yes, they can arrive out of order in the result.

If you really want all commits sorted by time, you need to run the
RevWalk until you have buffered all results in your own data
structure, then sort that. It's the only way to eliminate the effect
of clock skew on the ordering. But it's so expensive to perform that
nobody does this.
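
To make the difference concrete, here is a minimal sketch in plain
Java. It does not use JGit; ToyCommit and ClockSkewWalk are
hypothetical names, and the toy history contains only the two commits
quoted in this thread. streamingWalk() mimics the priority-queue
approach described above, and bufferedSort() mimics "buffer
everything, then sort".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Toy stand-in for a commit (NOT JGit's RevCommit): an id, a committer
// time in seconds since the epoch, and the parent commits.
final class ToyCommit {
    final String id;
    final long commitTime;
    final List<ToyCommit> parents;

    ToyCommit(String id, long commitTime, ToyCommit... parents) {
        this.id = id;
        this.commitTime = commitTime;
        this.parents = List.of(parents);
    }
}

final class ClockSkewWalk {
    private static final Comparator<ToyCommit> NEWEST_FIRST =
            Comparator.comparingLong((ToyCommit c) -> c.commitTime).reversed();

    // Streaming walk: emit the newest pending commit, then queue its parents.
    // A parent only becomes visible after its child has been emitted, so a
    // parent with a newer timestamp comes out late.
    static List<ToyCommit> streamingWalk(ToyCommit start) {
        PriorityQueue<ToyCommit> pending = new PriorityQueue<>(NEWEST_FIRST);
        Set<ToyCommit> seen = new HashSet<>();
        List<ToyCommit> out = new ArrayList<>();
        pending.add(start);
        seen.add(start);
        while (!pending.isEmpty()) {
            ToyCommit c = pending.poll();
            out.add(c);
            for (ToyCommit p : c.parents) {
                if (seen.add(p)) {
                    pending.add(p);
                }
            }
        }
        return out;
    }

    // Exact ordering: finish the whole walk first, then sort the buffer.
    static List<ToyCommit> bufferedSort(ToyCommit start) {
        List<ToyCommit> all = new ArrayList<>(streamingWalk(start));
        all.sort(NEWEST_FIRST);
        return all;
    }

    public static void main(String[] args) {
        // Committer times taken from the two commits quoted in this thread.
        ToyCommit parent = new ToyCommit("676abb3", 1301056346L);
        ToyCommit child = new ToyCommit("d4f3d4c", 1301056269L, parent);

        for (ToyCommit c : streamingWalk(child)) {
            System.out.println("streaming: " + c.id + " " + c.commitTime);
        }
        for (ToyCommit c : bufferedSort(child)) {
            System.out.println("buffered:  " + c.id + " " + c.commitTime);
        }
    }
}
```

The streaming version can return its first result immediately, which
is the whole point of the approximation; the buffered version cannot
return anything until the walk has finished.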

On Fri, Aug 12, 2011 at 05:04, Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
>> Could it be a cross-timezone issue, if local time is used instead of UTC?
>
> Looks like you are right, though I don't understand the reasons of this
> effect. These are the offending commits (taken from the IDEA community
> repository):
>
> commit d4f3d4c655295e2b1cf1d90374f8b8e18fdc3dac
> tree 02d25f4ce8b2b45ef5e8af6fadbd1d328cb16f22
> parent 676abb30545bf63409ab061b2fdcd021736896be
> author Sergey Evdokimov <sergey.evdokimov@xxxxxxxxxxxxx> 1300978514 +0300
> committer Sergey Evdokimov <sergey.evdokimov@xxxxxxxxxxxxx> 1301056269 +0300
>
>    Add method 'toString()' to IntArrayList.
>
> commit 676abb30545bf63409ab061b2fdcd021736896be
> tree d93631c534c20a088cb2e4fb5a5c6c2dfea108ac
> parent bea282d766d21e636752d0f50d603f23e4f868f3
> author peter <peter@xxxxxxxxxxxxx> 1301056144 +0100
> committer peter <peter@xxxxxxxxxxxxx> 1301056346 +0100
>
>    once the first calculation is finished, don't move the lookup
>
> 1301056269 is slightly before 1301056346, whereas order according to
> local time would be correct. So does RevSort#COMMIT_TIME_DESC assert
> correct order on local time?

This is *not* a timezone problem. The times are stored in UTC. The
timezone next to them is advisory, so you can format the local time
of the committer and know if it was 3 AM in their timezone, or 3 PM
when they wrote that commit. Since the parent has a newer commit
timestamp, this is clock skew. The systems that created these commits
have very different settings on their system clocks. It's a common
problem in distributed systems.
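
As a rough illustration, the sketch below decodes the two committer
stamps quoted above with java.time. CommitTimeSketch and parse() are
hypothetical helpers written for this example, not part of JGit. The
number is an absolute instant in seconds since the epoch; the offset
only changes how that instant is displayed.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class CommitTimeSketch {
    // Parse the "<seconds> <offset>" pair found at the end of a
    // committer line, e.g. "1301056269 +0300".
    static OffsetDateTime parse(String secondsAndOffset) {
        String[] parts = secondsAndOffset.trim().split("\\s+");
        Instant utc = Instant.ofEpochSecond(Long.parseLong(parts[0]));
        // The offset is advisory: it picks the local view of the same instant.
        return utc.atOffset(ZoneOffset.of(parts[1]));
    }

    public static void main(String[] args) {
        OffsetDateTime child = parse("1301056269 +0300");   // d4f3d4c
        OffsetDateTime parent = parse("1301056346 +0100");  // 676abb3

        System.out.println("child  local time: " + child);
        System.out.println("parent local time: " + parent);

        // Ordering must be decided on the instants, not on the local clocks.
        System.out.println("child stamped before parent: "
                + child.toInstant().isBefore(parent.toInstant()));
        System.out.println("skew in seconds: "
                + (parent.toEpochSecond() - child.toEpochSecond()));
    }
}
```

The local wall-clock times happen to look "correctly" ordered here
only because the two committers are two hours apart; compared as
instants, the child is stamped 77 seconds before its own parent.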

There have been a number of discussions on the Git mailing list about
clock skew in commits. Clock skew happens. Git tries to deal with some
clock skew by having a slop bucket as it traverses the history, but
sometimes the clocks are just too far off and some optimizations do
break.

> What I'm looking for is a quick and reliable check whether a certain
> TARGET commit is reachable from another SRC commit. Currently I'm doing
> a RevWalk with RevSort#COMMIT_TIME_DESC starting at SRC and stopping
> once I either encounter TARGET or another commit X with commit-time(X) <
> commit-time(TARGET).

Don't do this. Instead use RevWalk.isMergedInto(TARGET, SRC). The
algorithm doesn't terminate until it finds the merge base between the
two branches, which means it works even when there is clock skew.
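
To see why the time cutoff fails, here is a small sketch in plain
Java, again without JGit. SketchCommit and ReachabilitySketch are
hypothetical names. reachableWithTimeCutoff() models the heuristic
from the question; reachableByAncestry() is a plain exhaustive search
over parent links, used here only as a timestamp-independent
reference. It is not the algorithm behind RevWalk.isMergedInto, which
is what you should call in real code.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

// Toy stand-in for a commit (NOT JGit's RevCommit).
final class SketchCommit {
    final String id;
    final long commitTime;
    final List<SketchCommit> parents;

    SketchCommit(String id, long commitTime, SketchCommit... parents) {
        this.id = id;
        this.commitTime = commitTime;
        this.parents = List.of(parents);
    }
}

final class ReachabilitySketch {
    // Heuristic from the question: walk newest-first from src and give up
    // as soon as a commit older than target shows up.
    static boolean reachableWithTimeCutoff(SketchCommit src, SketchCommit target) {
        PriorityQueue<SketchCommit> pending = new PriorityQueue<>(
                Comparator.comparingLong((SketchCommit c) -> c.commitTime).reversed());
        Set<SketchCommit> seen = new HashSet<>();
        pending.add(src);
        seen.add(src);
        while (!pending.isEmpty()) {
            SketchCommit c = pending.poll();
            if (c == target) {
                return true;
            }
            if (c.commitTime < target.commitTime) {
                return false; // wrong whenever a descendant has an older stamp
            }
            for (SketchCommit p : c.parents) {
                if (seen.add(p)) {
                    pending.add(p);
                }
            }
        }
        return false;
    }

    // Reference check: follow parent links only and ignore timestamps.
    static boolean reachableByAncestry(SketchCommit src, SketchCommit target) {
        Deque<SketchCommit> todo = new ArrayDeque<>();
        Set<SketchCommit> seen = new HashSet<>();
        todo.push(src);
        seen.add(src);
        while (!todo.isEmpty()) {
            SketchCommit c = todo.pop();
            if (c == target) {
                return true;
            }
            for (SketchCommit p : c.parents) {
                if (seen.add(p)) {
                    todo.push(p);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The two commits quoted earlier: TARGET is the direct parent of SRC.
        SketchCommit target = new SketchCommit("676abb3", 1301056346L);
        SketchCommit src = new SketchCommit("d4f3d4c", 1301056269L, target);
        System.out.println("time cutoff: " + reachableWithTimeCutoff(src, target));
        System.out.println("ancestry:    " + reachableByAncestry(src, target));
    }
}
```

With the commits from this thread the cutoff gives up on the very
first commit it sees, because SRC itself is stamped earlier than
TARGET, even though TARGET is SRC's direct parent.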

> Now, according to the example above, that doesn't work
> correctly. If it's about timezones, I could run until commit-time(X) <
> commit-time(TARGET - 24 hours). However, I'm wondering if the order of
> commit-times is reliable at all? Can they arbitrarily jump back and
> forth or are there some restrictions on the order of timestamps in Git
> repositories?

They aren't reliable. They can be any value. And no current
implementation of Git enforces a rule like "commit time of descendant
must be >= commit time of parent". We discussed doing this on the Git
mailing list a week or two ago, but it hasn't been coded yet for any
implementation.

-- 
Shawn.

