Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] Re: Understanding Racy Git Handling in JGit: persisting smudged entries

2010/7/14 Shawn O. Pearce <spearce@xxxxxxxxxxx>
Christian Halstrick <christian.halstrick@xxxxxxxxx> wrote:
> why does JGit smudge the modification time of index entries
> differently than native git? Native git truncates the cached length to
> 0, while JGit smudges the timestamp to a maximum value. Although we
> don't see any problem with that we are wondering why this has been
> done. We found the following comment in
> DirCacheEntry.smudgeRacilyClean() :
>
>               // We don't use the same approach as C Git to smudge the entry,
>               // as we cannot compare the working tree file to our SHA-1 and
>               // thus cannot use the "size to 0" trick without accidentally
>               // thinking a zero length file is clean.
>               //
>               // Instead we force the mtime to the largest possible value, so
>               // it is certainly after the index's own modification time and
>               // on a future read will cause mightBeRacilyClean to say "yes!".
>               // It is also unlikely to match with the working tree file.
>               //
>               // I'll see you again before Jan 19, 2038, 03:14:07 AM GMT.
>
> We don't understand fully that comment. Additionally: shouldn't it be
> very cheap to detect whether a zero-length file is clean or not? The
> SHA-1 for an empty file should be constant?

Sorry the comment wasn't clear enough.

The problem I ran into is, DirCache and DirCacheEntry don't
necessarily have access to the working tree in order to examine
the file itself.  If we set the size to 0 during smudge, we can't
verify that the file length is not 0 before we smudge.

But I think what you are trying to get to here is, can't we just
set the size to 0 to smudge the record and worry about it later?


If the file is non-empty and we smudge the length to 0, during a
future scan the length won't match, and we can recheck its contents
to repair the cached stat data.

If the file is empty, the cached stat data now matches, and we'll
consider it clean.  That leave us open to missing an update where a
user truncates a file in the same filesystem step as the index was
changed.  But the cached ObjectId won't be the empty blob ObjectId,
which is easy to check for zero length files.


OK, I think I get what you are saying.

- We change our smudge logic to set the length to 0.

- During a status check of a file, we have two special cases:

 if (dirCacheEntry.length == 0) {
   if (dirCacheEntry.objectId != emptyBlobObjectId) {
     // This is a racily clean smudged entry.  It can be repaired by
     // checking the file's contents on disk to see if they match the
     // dirCacheEntry.objectId.  If they do, we can correct length and
     // last modified to unsmudge it.
     //

   } else (file.length == 0) {
     // This might have been racily clean, we don't know, but
     // it doesn't actually matter.
     //
   }
 }

Right?

Yeah, that's what we thought could replace the current implementation.
But we weren't sure if we don't overlook some aspect. Now with your 
rephrased explanation of the idea it sounds quite straight forward to me.

This approach would be closer to the native implementation and also fix the 
problem which would come up in around 28 years with the current
smudge implementation ;-)
 
--
Matthias

Back to the top