|Re: [jgit-dev] isOutdated() and lastModified()|
On Sun, Dec 12, 2010 at 3:17 AM, Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> wrote: > 12 dec 2010 kl. 04:50 skrev Shawn Pearce: >> >> I wonder if we can't use DirCache to help us with the config and >> packed-refs files, and their "racy git" update problems. If we create >> $GIT_DIR/config.index as a standard DirCache > > We cache a timestamp internally. We could add one to notice the last > time we checked, or use the current one (last modified) differently, That's not sufficient. The DirCache would work because we are writing back to the filesystem when we read it, that write back causes the filesystem to mark its current clock on a file we know about. Later when we check config's timestamp we know its racily clean if the config file timestamp matches the timestamp of the DirCache file we use to track it. If their timestamps are the same, we know we read within a filesystem clock unit and cannot be certain other updates didn't occur during that same window. So we read again. Caching the system clock (from System.currentTimeMilis) when we read the file last won't help us, because currentTimeMillis typically has more precision than File.lastModified returns on the same platform. I think on every target platform currentTimeMillis has a precision of at least 10-20 miliseconds, while we are talking about File.lastModified potentially being in the 2 second range. > Command line JGit will need to > read the refs anyway. But I think you are right that we should modify this. I > did not read Dmitry's message close enough. For the refs, JGit always has to read the refs the first time it sees them. But within an embedded application like Gerrit Code Review, EGit, or Smart Git its not uncommon for JGit to be able to use its in-memory cache of the refs on disk... but we do run the risk of missing an update to a ref if multiple updates are made within the file system timestamp granularity. >> Another idea is to use what you suggested before somewhere else (I'm >> sorry I can't recall where this is)... if the file modification time >> is less than 3 seconds different from the last time we read the file, >> consider it outdated and read the file again anyway. > > That was a different story. The idea was to avoid a stat call to check the > length of a file against the cached information in the index, but assuming > that a if the length changed, then the timestamp would too. The three seconds > come from the file system with the lowest timestamp resolution of two seconds. > For refs we don't check the length at all, so there is no extra stat to avoid. But it still seems like it could apply here. If the ref (or config or packed-refs, etc.) file has a timestamp that is newer than 3 seconds ago, we should always read its contents. That way on FAT with its 2 second granularity we are certain that we always have the current value of the file. On Linux/Mac OS X where the stat implementation is busted and isn't returning tv_mtimensec as part of the lastModified() result we are still covered and can detect updates within the past 1 second. It also has the advantage of avoiding writing to the repository during a read. Not every reader has write access to a repository its trying to read, so we would need to have a fallback approach in case we couldn't update the DirCache if we used the DirCache approach above. Actually, I think this is similar to what we do with the $GIT_DIR/objects/pack directory. We try to avoid doing a readdir() on it (because that's expensive), but we use some sort of timestamp fudge on the directory's modification time. I should look at that code again and see if its doing what I described above, because it sounds like it should work. -- Shawn.
Back to the top