Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [jgit-dev] isOutdated() and lastModified()

On Sun, Dec 12, 2010 at 3:17 AM, Robin Rosenberg
<robin.rosenberg@xxxxxxxxxx> wrote:
> 12 dec 2010 kl. 04:50 skrev Shawn Pearce:
>>
>> I wonder if we can't use DirCache to help us with the config and
>> packed-refs files, and their "racy git" update problems.  If we create
>> $GIT_DIR/config.index as a standard DirCache
>
> We cache a timestamp internally. We could add one to notice the last
> time we checked, or use the current one (last modified) differently,

That's not sufficient.  The DirCache would work because we are writing
back to the filesystem when we read it, that write back causes the
filesystem to mark its current clock on a file we know about.  Later
when we check config's timestamp we know its racily clean if the
config file timestamp matches the timestamp of the DirCache file we
use to track it.  If their timestamps are the same, we know we read
within a filesystem clock unit and cannot be certain other updates
didn't occur during that same window.  So we read again.

Caching the system clock (from System.currentTimeMilis) when we read
the file last won't help us, because currentTimeMillis typically has
more precision than File.lastModified returns on the same platform.  I
think on every target platform currentTimeMillis has a precision of at
least 10-20 miliseconds, while we are talking about File.lastModified
potentially being in the 2 second range.

> Command line JGit will need to
> read the refs anyway. But I think you are right that we should modify this. I
> did not read Dmitry's message close enough.

For the refs, JGit always has to read the refs the first time it sees
them.  But within an embedded application like Gerrit Code Review,
EGit, or Smart Git its not uncommon for JGit to be able to use its
in-memory cache of the refs on disk... but we do run the risk of
missing an update to a ref if multiple updates are made within the
file system timestamp granularity.

>> Another idea is to use what you suggested before somewhere else (I'm
>> sorry I can't recall where this is)... if the file modification time
>> is less than 3 seconds different from the last time we read the file,
>> consider it outdated and read the file again anyway.
>
> That was a different story. The idea was to avoid a stat call to check the
> length of a file against the cached information in the index, but assuming
> that a if the length changed, then the timestamp would too. The three seconds
> come from the file system with the lowest timestamp resolution of two seconds.
> For refs we don't check the length at all, so there is no extra stat to avoid.

But it still seems like it could apply here.  If the ref (or config or
packed-refs, etc.) file has a timestamp that is newer than 3 seconds
ago, we should always read its contents.  That way on FAT with its 2
second granularity we are certain that we always have the current
value of the file.  On Linux/Mac OS X where the stat implementation is
busted and isn't returning tv_mtimensec as part of the lastModified()
result we are still covered and can detect updates within the past 1
second.

It also has the advantage of avoiding writing to the repository during
a read.  Not every reader has write access to a repository its trying
to read, so we would need to have a fallback approach in case we
couldn't update the DirCache if we used the DirCache approach above.

Actually, I think this is similar to what we do with the
$GIT_DIR/objects/pack directory.  We try to avoid doing a readdir() on
it (because that's expensive), but we use some sort of timestamp fudge
on the directory's modification time.  I should look at that code
again and see if its doing what I described above, because it sounds
like it should work.

-- 
Shawn.


Back to the top