
Re: [stellation-res] UNITTESTS DROP: support for nested directories

On Sunday 04 August 2002 03:33 pm, Florin Iucha wrote:
> On Sat, Aug 03, 2002 at 07:04:35PM -0400, Mark C. Chu-Carroll wrote:
> > > CPU SAMPLES BEGIN (total = 13266) Sat Aug  3 16:58:01 2002
> > > rank   self  accum   count trace method
> > >    1 73.96% 73.96%    9812   233 java.lang.UNIXProcessReaper.run
> > >    2 25.31% 99.27%    3357   156 java.net.SocketInputStream.socketRead0
> > >
> > > With IBM's J2RE 1.3.1 IBM build cxia32131-20020410 JDK, the stats are:
> > >
> > > CPU SAMPLES BEGIN (total = 7206) Sat Aug  3 17:08:01 2002
> > > rank   self  accum   count trace method
> > >    1 53.26% 53.26%    3838   226 java.lang.UNIXProcessReaper.run
> > >    2 45.52% 98.78%    3280   148 java.net.SocketInputStream.socketRead0
> > >
> > > This is pretty much in line with what one sees when using top: during a
> > > unittest run, the CPU is used in bursts, and there are some inactivity
> > > periods.
> >
> > Remember that this is a client/server system. Watching my system on a
> > monitor, what I see matches the communication performance pretty
> > precisely.
> >
> > For example, when I do a checkin, I see a burst of CPU activity (the
> > local workspace code being scanned for changes), followed by a
> > lull in CPU activity but a major burst of network traffic (changes going
> > onto the wire), followed by a lull in both CPU and network (changes being
> > processed by the server), followed by a blip on the network and a burst
> > on the local CPU (updating the workspace metadata).
>
> There is another aspect of this profile info, besides the calls to native
> binaries: all those tests were done with both client and server on the
> same computer. If the network protocol is so chatty that it spends this
> much time over a local TCP connection (not loopback), then we will be in
> big trouble when using WAN connections.

Actually, this is something that's been very well tested, and I'm
confident that it's not going to be a serious problem.

The underlying communication protocol used by Stellation dates
back to a previous IBM project called Manitoba. Manitoba
was designed to allow client/server IDEs to operate over low-bandwidth,
high-latency network connections like WANs.

The protocol isn't chatty. It generally pushes things into a small
number of large messages, rather than a large number of small
messages.
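
Roughly, the difference looks like this - a hedged sketch, where the
Connection interface and every name in it are mine for illustration,
not Stellation's actual API:

    import java.util.List;

    // Illustration only; none of these names come from Stellation.
    interface Connection {
        void send(byte[] message);
        void awaitAck();
    }

    class ChattinessDemo {
        // Chatty: one small message and one round trip per artifact,
        // so the connection latency is paid N times.
        static void chattyCheckin(Connection conn, List<byte[]> artifacts) {
            for (byte[] a : artifacts) {
                conn.send(a);
                conn.awaitAck();
            }
        }

        // Low-chat: concatenate everything into one large message, so
        // the latency is paid once, no matter how many artifacts there are.
        static void batchedCheckin(Connection conn, List<byte[]> artifacts) {
            int total = 0;
            for (byte[] a : artifacts) total += a.length;
            byte[] batch = new byte[total];
            int off = 0;
            for (byte[] a : artifacts) {
                System.arraycopy(a, 0, batch, off, a.length);
                off += a.length;
            }
            conn.send(batch);
            conn.awaitAck();
        }
    }

Per-message latency is what dominates on a WAN, and the batched style
pays it once per operation rather than once per artifact.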

On our local LAN at IBM, which is 100BASE-T Ethernet, we see
almost no measurable performance difference between a repository
running locally and a repository elsewhere on the network.

There are a couple of things that could be improved, of course.
First and foremost, Java IO sucks rocks. The fewer layers
of Java IO you use, the better the system will perform. Currently,
we're not careful about that - to write to the socket, we use a
PrintWriter wrapping an OutputStreamWriter wrapping the socket
itself. There's similar layering on the input side. All of that
layered buffering could be done away with, and we could work with
the socket streams directly. That's a change that's in the pipeline,
but no one has had time to do it. It'll make a *huge* difference
in our IO performance.
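
To make that concrete, here's a rough sketch of the two styles; the
class and method names are mine for illustration, not our actual code:

    import java.io.BufferedOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.net.Socket;

    class SocketWriting {
        // Today: every write funnels through PrintWriter and
        // OutputStreamWriter before it reaches the socket stream.
        static PrintWriter layered(Socket sock) throws IOException {
            return new PrintWriter(
                new OutputStreamWriter(sock.getOutputStream()));
        }

        // The planned direction: encode a message once, then push the
        // bytes through a single explicit buffer to the socket stream.
        static void direct(Socket sock, String message) throws IOException {
            OutputStream out =
                new BufferedOutputStream(sock.getOutputStream());
            out.write(message.getBytes("UTF-8"));
            out.flush();
        }
    }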

Second, we don't do nearly as much deltification as we should.
Right now, checkin sends all compounds (directories), and the
full text of any artifact that changed. Checkout always sends
everything. (That doesn't sound too bad, until you realize that
in terms of wire protocol, merge checks out two versions of
the project, and then uses those to compute the merge on the
client.) With a moderate amount of effort, checkouts could be
modified to send differences relative to a particular version
that exists on the client.  With a significant amount of effort,
all checkins and checkouts could be deltas rather than full-body
modified artifacts. (I suspect that it's not worth going the full
delta route; sending modified artifacts in compressed form goes
a long way towards reducing the message size; for most uses,
I doubt that the communication time saved would be dramatically
larger than the amount of time needed to compute artifact values
by delta application.)
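
As a sketch of the compressed-full-body option - GZIP is just a
stand-in codec here, not a statement about our actual wire format:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPOutputStream;

    class ArtifactCompression {
        // Compress an artifact's full text before it goes on the wire.
        // Source text typically compresses several-fold, without the
        // cost of computing deltas on checkin and applying them on
        // checkout.
        static byte[] compress(byte[] artifactBody) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(buf);
            gz.write(artifactBody);
            gz.finish();
            return buf.toByteArray();
        }
    }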

	-Mark

-- 
Mark Craig Chu-Carroll,  IBM T.J. Watson Research Center  
*** The Stellation project: Advanced SCM for Collaboration
***		http://www.eclipse.org/stellation
*** Work Email: mcc@xxxxxxxxxxxxxx  ------- Personal Email: markcc@xxxxxxxxxxx



