Re: [stellation-res] Recognizing changed files
On Thu, Aug 22, 2002 at 08:29:10AM +0000, Mark C. Chu-Carroll wrote:
> On Thursday 22 August 2002 11:26 am, Jonathan Gossage wrote:
> > This problem is at the root of all the queer behaviour I am currently
> > observing while trying to test MySQL. A clean solution will take a bit of
> > thought. One possible solution which would cover all scenarios on both
> > Windows and UNIX would be to compute a hash for each file when it is
> > checked out and store this hash in the project data. Then if the timestamp
> > has changed in any way, compute the hash again on the same file and see if
> > they are the same. If they are, the file can be treated as unchanged and
> > the timestamp adjusted appropriately. The major problem with this approach
> > is that it is somewhat resource intensive.
>
> In fact, that's *roughly* what we do.
>
> For a while, we were using hashcodes (we called them signatures) exclusively
> for detecting what changed. The problem was, this gets *incredibly* slow for
> detecting what changed in a large workspace. (If I recall correctly, in one
> test, it could take over an hour to scan over the Linux kernel sources
> looking for changes.) So, as an optimization, we started using timestamps as
> a method to determine whether or not we needed to recompute the signature.
As I recall, the problem with the kernel sources was the long time it took to check in
new files. We never pinned down the cause, though our guess was PostgreSQL/JDBC slowness.
I do recall that the time to detect changes, which is the part that involves signature
computation, was not that large.
>
> Here's my proposal for a fix: instead of checking if the timestamp is *newer*
> than what's recorded in the project, just check if it's *different*. If it's
> different, then do the signature comparison. That should cover this problem,
> right?
>
> -Mark
>
We used to check for 'newer' but have been checking only for 'different' for many months.
The current code makes the following assumptions:
-- If you modify a file, its modification time (as returned by java.io.File.lastModified())
will change.
-- If the file modification time has changed since the most recent checkin or checkout, then
the file *may* have been changed, and so the signature must be computed to determine if the
file has actually been changed.
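For readers who haven't looked at Project.getModified(...), here is a minimal Java sketch of
the two assumptions above. It is not the Stellation code: the ChangeDetector and FileRecord
names, the in-memory map standing in for the project data, and SHA-1 as the signature
function are all assumptions made for illustration. A timestamp that differs in either
direction only triggers a signature comparison; if the signatures match, the new timestamp
is recorded so the file is not rehashed on the next scan (one way to do the "timestamp
adjusted" step in Jonathan's proposal).

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the Stellation Project class.
final class ChangeDetector {
    // What was known about a file at its last checkin or checkout.
    private static final class FileRecord {
        long lastModified;
        final String signature;

        FileRecord(long lastModified, String signature) {
            this.lastModified = lastModified;
            this.signature = signature;
        }
    }

    // Stand-in for the project data: path -> record.
    private final Map<String, FileRecord> records = new HashMap<>();

    // Signature = SHA-1 of the file contents as a hex string (an assumed choice).
    static String signature(File f) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        try (InputStream in = Files.newInputStream(f.toPath())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Called at checkin/checkout: remember the timestamp and the signature.
    void record(File f) throws IOException {
        records.put(f.getPath(), new FileRecord(f.lastModified(), signature(f)));
    }

    // Returns true only if the file's contents differ from the recorded version.
    boolean isModified(File f) throws IOException {
        FileRecord r = records.get(f.getPath());
        if (r == null) {
            return true; // never recorded, so treat it as new
        }
        long now = f.lastModified();
        if (now == r.lastModified) {
            return false; // assumption 1: same timestamp means no modification
        }
        // assumption 2: a *different* (not just newer) timestamp means "maybe changed"
        if (signature(f).equals(r.signature)) {
            r.lastModified = now; // same contents: adopt the new time, skip rehashing next scan
            return false;
        }
        return true;
    }
}
```

The expensive hash runs only for files whose timestamp moved, which is the optimization Mark
describes; a file that was merely touched costs one hash and is then cheap again.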
The first assumption is really an assumption about the operating system.
We are assuming sane behavior, as found in Unix, but Jonathan's note indicates that Windows
is insane here, much as it is in its handling of case in file names, so we'll need to add
appropriate sanity (insanity?) checks. The relevant code is in my latest workspace changes,
in Project.getModified(...) and Project.update(...).
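As a purely hypothetical illustration of the kind of sanity check meant here (nothing in this
thread says Stellation does this), one cheap option is to record the file length alongside the
timestamp and treat a changed length as "may have changed" even when lastModified() reports
the same time. The SanityCheck class and mayHaveChanged method are invented names for this
sketch.

```java
import java.io.File;

// Hypothetical cheap pre-check; not taken from the Stellation code.
final class SanityCheck {
    // Returns true if the file may have changed since it was recorded.
    // Comparing the length as well as the timestamp catches an edit that left
    // lastModified() unchanged, provided the edit also changed the file size.
    static boolean mayHaveChanged(File f, long recordedTime, long recordedLength) {
        return f.lastModified() != recordedTime || f.length() != recordedLength;
    }
}
```

This narrows the gap rather than closing it: an edit that keeps both the size and the
timestamp unchanged would still be missed, and only a full signature comparison catches that.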
dave
soon-to-be-published
--
Dave Shields, IBM Research, shields@xxxxxxxxxxxxxx.