Re: [stellation-res] Recognizing changed files

On Thursday 22 August 2002 01:38 pm, Dave Shields wrote:
> On Thu, Aug 22, 2002 at 08:29:10AM +0000, Mark C. Chu-Carroll wrote:
> > On Thursday 22 August 2002 11:26 am, Jonathan Gossage wrote:
> > > This problem is at the root of all the queer behaviour I am currently
> > > observing while trying to test MySQL. A clean solution will take a bit
> > > of thought. One possible solution which would cover all scenarios on
> > > both Windows and UNIX would be to compute a hash for each file when it
> > > is checked out and store this hash in the project data. Then if the
> > > timestamp has changed in any way, compute the hash again on the same
> > > file and see if they are the same. If they are, the file can be treated
> > > as unchanged and the timestamp adjusted appropriately. The major
> > > problem with this approach is that it is somewhat resource intensive.
> >
> > In fact, that's *roughly* what we do.
> >
> > For a while, we were using hashcodes (we called them signatures)
> > exclusively for detecting what changed. The problem was, this gets
> > *incredibly* slow for detecting what changed in a large workspace. (If
> > I recall correctly, in one test, it could take over an hour to scan over
> > the Linux kernel sources looking for changes.) So, as an optimization, we
> > started using timestamps as a method to determine whether or not we
> > needed to recompute the signature.
>
> As I recall, the problem with the kernel source was the long time to
> checkin new files, and we never explicitly found the cause, though our
> guess was PostgreSQL/JDBC slowness. I do recall that time to detect a
> change was not that large, and that's the part that involves signature
> computation.

So my memory is bad.  I thought there was some test several months ago that 
took an hour, and dropped to something like half after changing signature 
checking. I'm probably mixing up a couple of different examples.

I do recall that there was a noticeable speedup when we switched to the
date-based mechanism: passing every byte of a project through Java IO is
inevitably time-consuming.
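
To make the cost concrete, here is a minimal sketch of a whole-file signature
computation. It is not Stellation's actual signature code: the class name
FileSignature, the choice of SHA-256, and the use of the java.nio.file API are
all assumptions made for illustration. The point is only that every byte of
every file has to be read and fed to the digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not taken from the Stellation sources.
final class FileSignature {

    // Reads the whole file and returns its SHA-256 digest.
    // The digest algorithm is an assumption; any content hash has the same cost shape.
    static byte[] compute(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException("SHA-256 not available", e);
        }
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // Every byte of the file passes through this loop.
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }
}
```

Calling FileSignature.compute(path) for each file in a workspace costs time
proportional to the total size of the workspace, which is why a cheap
timestamp check in front of it pays off.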

> > Here's my proposal for a fix: instead of checking if the timestamp is
> > *newer* than what's  recorded in the project, just check if it's
> > *different*. If it's different, then do the signature comparison. That
> > should cover this problem, right?
> >
> > 	-Mark
>
> We used to check for 'newer' but have been checking only for 'different'
> for many months. The current code makes the following assumptions:
>
> -- If you modify a file, the modification time (as returned by
> java.io.File.lastModified()) will change.
>
> -- If the file modification time has changed since the most recent checkin
> or checkout, then the file *may* have been changed, and so the signature
> must be computed to determine if the file has actually been changed.
>
> The first assumption is really an assumption about the operating system.
> We are assuming sane behavior, as found in Unix, while Jonathan's note
> indicates Windows, as in its handling of case in file names, is insane, so
> we'll need to make appropriate sanity (insanity?) checks. The relevant code
> can be found in my latest workspace changes, in Project.getModified(...)
> and Project.update(...).
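
The two assumptions above can be sketched roughly as follows. This is not the
code in Project.getModified(...) or Project.update(...); the class
ChangeDetector, the Recorded holder, and the SHA-256 signature are
hypothetical names and choices used only to show the shape of the check: a
cheap timestamp comparison first, and a signature comparison only when the
timestamp is different (not merely newer).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical sketch; names and structure are assumptions, not Stellation code.
final class ChangeDetector {

    // What gets remembered for a file at checkin or checkout time.
    static final class Recorded {
        final long lastModified;
        final byte[] signature;

        Recorded(long lastModified, byte[] signature) {
            this.lastModified = lastModified;
            this.signature = signature;
        }
    }

    // Captures the current timestamp and signature of a file.
    static Recorded record(File file) throws IOException {
        return new Recorded(file.lastModified(), signature(file));
    }

    // Returns true only if the file content differs from what was recorded.
    static boolean isModified(File file, Recorded recorded) throws IOException {
        // Assumption 1: an unchanged timestamp means the file was not modified.
        if (file.lastModified() == recorded.lastModified) {
            return false;
        }
        // Assumption 2: a different timestamp only means the file *may* have
        // changed, so the signature decides.
        return !Arrays.equals(signature(file), recorded.signature);
    }

    // Whole-file content hash; SHA-256 is an illustrative choice.
    static byte[] signature(File file) throws IOException {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(Files.readAllBytes(file.toPath()));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Note that the early return is exactly where the first assumption can bite: if
the operating system lets content change without changing the value reported
by lastModified(), this sketch would report the file as unmodified.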


Let's not make any changes like this until we're sure of what's going on... I
don't want to introduce lots of bizarre sanity checks, or make changes that
will create significant performance issues until we're absolutely sure that
they're necessary.

	-Mark


-- 
Mark Craig Chu-Carroll,  IBM T.J. Watson Research Center  
*** The Stellation project: Advanced SCM for Collaboration
***		http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx  ------- Personal Email: markcc@xxxxxxxxxxx



