Re: [jgit-dev] LockFile.waitForStatChange()

On Wed, May 23, 2018 at 4:17 AM Nasser Grainawi <nasser@xxxxxxxxxxxxxx> wrote:
Hey JGit friends,

I've found myself puzzled by this bit of code and I'm hoping someone can enlighten me. This method just shows up in LockFile when Shawn created RefDirectory in 01b5392cdbc12ce2e21fd1d1afbd61fdf97e1c38 and has no further explanation to it. I can't figure out why we're comparing the timestamps of two different files (for example, packed-refs and packed-refs.lock) and ever expecting them to have been equal. Does someone understand that?

This is about the finite resolution of file lastModified timestamps.

Git compares the lastModified timestamps it obtains from lstat in order to speed up comparison of file contents.
E.g. during git status, git needs to determine which files changed compared to the git index. The naive
implementation would compute the hash of the file content and compare it with the hash cached in the
index for the same path. This is slow.

Hence git first calls lstat (which is cheap) and compares the lastModified timestamp from lstat against the timestamp
cached in the git index. If the file's lastModified timestamp is larger than the one cached in the git index, we know
the file changed since the index was written. Because filesystem timestamp resolution is finite, for any file whose
lastModified timestamp in the filesystem matches the lastModified timestamp cached in the git index we don't know
whether the git index file or the file at hand was modified last.
On many *nix filesystems the resolution is 1 second, which increases the probability of hitting this so-called racy-git problem [1].
For these files we need to recompute the content hash to disambiguate between the two cases, which takes time
and increases IO traffic.
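
To make the ambiguity concrete, here is a minimal sketch of the decision described above. It is not the code Git or JGit actually runs (both compare more stat fields than just the timestamp); the class RacyTimestampSketch, the Verdict enum and both helper methods are hypothetical names made up for this illustration.

```java
// Hypothetical illustration only; not taken from Git or JGit.
final class RacyTimestampSketch {

    enum Verdict { CHANGED, AMBIGUOUS, UNCHANGED }

    // Round a millisecond timestamp down to the filesystem's resolution,
    // e.g. 1000 for a filesystem that only stores whole seconds.
    static long truncate(long millis, long resolutionMillis) {
        return millis - Math.floorMod(millis, resolutionMillis);
    }

    // Simplified decision: compare the file's timestamp with the index timestamp.
    static Verdict classify(long fileMtime, long indexMtime) {
        if (fileMtime > indexMtime) {
            return Verdict.CHANGED;   // modified after the index was written
        }
        if (fileMtime == indexMtime) {
            return Verdict.AMBIGUOUS; // racy: content must be re-hashed
        }
        return Verdict.UNCHANGED;     // older than the index, trust the cache
    }

    public static void main(String[] args) {
        long indexWritten = 1_527_063_420_250L; // index written at xx.250 s
        long fileWritten = 1_527_063_420_900L;  // file modified 650 ms later

        // With 1 s resolution both timestamps collapse to the same second.
        System.out.println(classify(truncate(fileWritten, 1000L),
                truncate(indexWritten, 1000L))); // prints AMBIGUOUS

        // With 1 ms resolution the later modification is visible.
        System.out.println(classify(truncate(fileWritten, 1L),
                truncate(indexWritten, 1L)));    // prints CHANGED
    }
}
```

The coarser the resolution, the more writes land in the AMBIGUOUS bucket and have to be re-hashed.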

If both files (in this example a file in the working tree and the git index file) are written in the same transaction,
the likelihood of identical filesystem timestamps is high, since the resolution of persisted timestamps is coarse-grained
and the two files are written to disk at almost the same time. Hence it may pay off to call LockFile.waitForStatChange()
to ensure that the lastModified timestamps of the two files differ by at least one unit of the filesystem timestamp
resolution.
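
A rough sketch of that idea using plain java.nio follows. It is not the JGit implementation: the class StatChangeSketch, the method waitForDistinctMtime and the 25 ms sleep interval are assumptions made for this example. It keeps touching the lock file until the filesystem reports a lastModified timestamp that differs from that of the file being replaced.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical illustration only; not the JGit LockFile code.
final class StatChangeSketch {

    // Blocks until 'lock' has a lastModified time different from 'ref'.
    // Returns how many times the lock file had to be touched.
    static int waitForDistinctMtime(Path ref, Path lock)
            throws IOException, InterruptedException {
        FileTime refTime = Files.getLastModifiedTime(ref);
        FileTime lockTime = Files.getLastModifiedTime(lock);
        int rounds = 0;
        while (refTime.equals(lockTime)) {
            Thread.sleep(25); // assumed polling interval
            // Touch the lock file, then re-read what the filesystem
            // actually stored (it may have truncated the value).
            Files.setLastModifiedTime(lock,
                    FileTime.fromMillis(System.currentTimeMillis()));
            lockTime = Files.getLastModifiedTime(lock);
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) throws Exception {
        Path ref = Files.createTempFile("packed-refs", "");
        Path lock = Files.createTempFile("packed-refs", ".lock");
        try {
            // Force the racy situation: both files carry the same timestamp.
            FileTime same = FileTime.fromMillis(1_000_000_000_000L);
            Files.setLastModifiedTime(ref, same);
            Files.setLastModifiedTime(lock, same);

            int rounds = waitForDistinctMtime(ref, lock);
            System.out.println("touched lock file " + rounds + " time(s)");
        } finally {
            Files.deleteIfExists(ref);
            Files.deleteIfExists(lock);
        }
    }
}
```

On a filesystem with 1 second resolution, a loop like this can block for up to a second when both files were just written. That is the one-time cost being traded against re-reading the content later.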

I guess RefDirectory.commitPackedRefs() is calling LockFile.waitForStatChange() to trade waiting once when writing
the lock file against having to reread the content of the locked file later to disambiguate identical lastModified timestamps.
Though I am not sure which scenarios this was introduced for.

[1] https://github.com/git/git/blob/master/Documentation/technical/racy-git.txt

-Matthias

For some context, I'm looking at improving/creating a FileSnapshot-like class that can have policy for not trusting timestamps due to NFS attribute caching. FileSnapshot is used in a lot of places and likely most of them will have issues when NFS caches the lastModified time (as discussed on an earlier thread). Fixing packed-refs is a top priority because we've actually been bitten by it recently in production. Loose objects and .exists() (which is also cached) will probably be next.

Thanks!
Nasser

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
