RE: Project Model Improvements Re: [cdt-dev] CDT Summit Report

No, the real problem is that we're storing far too much data. There is a lot of duplication in there now, and I mean a lot. We should only be storing data that the user has changed from the defaults. We also need to look at how the scanner discovery information is stored, since it also pollutes this file.
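A minimal sketch of "store only what the user changed from the defaults", assuming settings can be modeled as string key/value pairs. The class name `SparseSettings` and the option keys are hypothetical and not part of the CDT API. The defaults live in code, and only the overrides would ever be serialized.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: keep defaults in code, persist only what the user changed.
final class SparseSettings {
    private final Map<String, String> defaults;
    private final Map<String, String> overrides = new HashMap<>();

    SparseSettings(Map<String, String> defaults) {
        this.defaults = Map.copyOf(defaults);
    }

    // An override wins; otherwise fall back to the default, which is never serialized.
    String get(String key) {
        return overrides.getOrDefault(key, defaults.get(key));
    }

    void set(String key, String value) {
        Objects.requireNonNull(value, "value");
        if (value.equals(defaults.get(key))) {
            overrides.remove(key); // reset to the default: nothing left to store
        } else {
            overrides.put(key, value);
        }
    }

    // Only this (usually small) map would be written out to the project file.
    Map<String, String> toPersist() {
        return Map.copyOf(overrides);
    }
}
```

Setting a value back to its default removes it from the persisted map, so the file shrinks rather than accumulating copies of default values.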
 
The problem isn't XML. There are efficient ways of loading XML, e.g. SAX. We also need to make sure the data structures we build from it are efficient.

Doug.
 

From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
Sent: Wednesday, October 01, 2008 10:44 AM
To: CDT General developers list.
Subject: Re: Project Model Improvements Re: [cdt-dev] CDT Summit Report

I agree that a proper relational database is a better solution for storing and retrieving data than an XML file. However there may be a problem with going this route. Users should be able to check their entire project into version control, including the .project and .cproject files. (There are problems with sharing .cproject, but ideally it should work.) Now, if we use a relational db instead, I'd assume the db would be stored in some kind of binary file, and that may be a problem for some version control systems.

So in the end we may be stuck using a text-based format. I don't think XML is inherently evil; it's just the way we are handling the XML that is clearly flawed. I'm not very experienced in this area, but there may be better approaches to processing XML, such as XPath queries. Another solution might be to write a SAX parser that builds the project description AST directly, without loading the entire DOM into memory.
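A rough sketch of the SAX idea using the JDK's built-in parser (`javax.xml.parsers.SAXParserFactory`). The element and attribute names (`configuration`, `option`, `id`, `value`) and the map-based model are hypothetical stand-ins for the real .cproject schema and project description classes. The handler fills in the model as elements stream past, and no DOM tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: stream the file and build only the model we need, with no DOM.
final class SaxDescriptionLoader {

    // Model: configuration id -> (option id -> value). A stand-in for real model classes.
    static Map<String, Map<String, String>> load(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        Map<String, Map<String, String>> configs = new LinkedHashMap<>();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            private Map<String, String> current; // options of the open <configuration>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("configuration")) {
                    current = new LinkedHashMap<>();
                    configs.put(atts.getValue("id"), current);
                } else if (qName.equals("option") && current != null) {
                    current.put(atts.getValue("id"), atts.getValue("value"));
                }
                // Any other element is ignored; nothing is allocated for it.
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("configuration")) {
                    current = null;
                }
            }
        });
        return configs;
    }

    // Convenience wrapper for in-memory XML; parse errors become unchecked exceptions.
    static Map<String, Map<String, String>> loadString(String xml) {
        try {
            return load(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalArgumentException("could not parse project description", e);
        }
    }
}
```

Memory use then scales with the size of the resulting model rather than with the size of the XML tree.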

There are ways of dealing with the problem of a crash during serialization. Whenever a change needs to be saved, we could first rename the .cproject file, then write out the new version of .cproject, and only then delete the old one. That way, if power is cut during serialization, the old file is still there.
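The rename-then-write scheme could be sketched with `java.nio.file` as below. The `.bak` suffix and the `SafeSave` class are hypothetical. The load side treats a leftover backup as evidence of an interrupted save and restores it, and `FileChannel.force` is called so the new file has reached the disk before the backup is deleted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of "move the old file aside, write the new one, then delete the old one".
final class SafeSave {

    static Path backupOf(Path file) {
        return file.resolveSibling(file.getFileName() + ".bak");
    }

    // A leftover backup means a save was interrupted, so the main file may be missing
    // or truncated: put the last complete version back.
    static void recover(Path file) throws IOException {
        Path backup = backupOf(file);
        if (Files.exists(backup)) {
            Files.move(backup, file, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    static void save(Path file, String content) throws IOException {
        recover(file); // never overwrite a good backup with a half-written file
        Path backup = backupOf(file);
        if (Files.exists(file)) {
            Files.move(file, backup);                      // 1. move the old version aside
        }
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);                             // 2. write the new version...
            }
            ch.force(true);                                // ...and flush it to the device
        }
        Files.deleteIfExists(backup);                      // 3. only now drop the old version
    }

    static String load(Path file) throws IOException {
        recover(file);
        return Files.readString(file, StandardCharsets.UTF_8);
    }
}
```

A common alternative is to write to a temporary file and then rename it over .cproject with `Files.move(..., StandardCopyOption.ATOMIC_MOVE)`, which avoids the recovery step but depends on the file system supporting atomic renames.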

Mike Kucera
Software Developer
IBM Eclipse CDT Team
mkucera@xxxxxxxxxx

From: "James Blackburn" <jamesblackburn@xxxxxxxxx>
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Date: 10/01/2008 07:00 AM
Subject: Project Model Improvements Re: [cdt-dev] CDT Summit Report




Hi Doug,

> Not yet. We didn't get very much time on build as we did on the other
> topics. I think the summary of it is to redo the Project model to simplify
> it and to break the dependency on managed build that was introduced with it
> in 4.0.

Is there any indication on what's planned, who might be doing this
work, and in what time frame?

Time's come full circle again for me and I'm once again focusing on my
users with larger projects.  The problem with the current
implementation is twofold:
1) For an XML file larger than 3 MB, CDT exceeds a 512 MB heap and
performance is very poor (BUG238421)
2) The current XML model is not threadsafe (get the indexer going,
open the settings page and hit Apply rapidly... BUG239627).

I think this all boils down to choice of XML as the data structure.
It's verbose, inherently not threadsafe and, for all its verbosity,
it's still not human readable. The tree duplication is expensive in
terms of time and memory, and the end result is that changes made to
the project description from different threads can easily be lost
(BUG248962).
And all this before we consider what happens if a power cut or crash
occurs during serialization.
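One way to avoid the lost updates without a database is to publish immutable snapshots and apply every change as a function of the latest snapshot. This is a hypothetical sketch (`DescriptionStore` is not a CDT class) built on `AtomicReference.updateAndGet`, which re-runs the change if another thread got there first. Because of those retries, the change function must be free of side effects.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: readers see immutable snapshots; writers apply changes atomically.
final class DescriptionStore {
    private final AtomicReference<Map<String, String>> current =
            new AtomicReference<>(Map.of());

    // Immutable, so it is safe to read from any thread (indexer, UI, builder).
    Map<String, String> snapshot() {
        return current.get();
    }

    // The change is applied to the latest snapshot; if another thread won the race it is
    // re-applied, so no writer's change is computed against a stale copy and lost.
    void update(UnaryOperator<Map<String, String>> change) {
        current.updateAndGet(old -> Map.copyOf(change.apply(old)));
    }

    void set(String key, String value) {
        update(old -> {
            Map<String, String> next = new HashMap<>(old);
            next.put(key, value);
            return next;
        });
    }
}
```

The trade-off is that every write copies the snapshot, which is exactly the kind of duplication cost described above for large descriptions; it fixes lost updates, not memory use.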

It's my feeling that for something as important as the project model
we actually need a data store with ACID properties and reasonable
performance -- a very lightweight db would seem to be an ideal
solution.

I've got some time in my schedule to start work on this, but am keen
not to tread on anyone else's toes if they're intending on working on
the project model.  My first aim would be to port the existing project
model to use SQLite as its db backend, and any changes to the actual
structure of the model could be made in parallel or after.

If this all sounds like a really bad idea then someone please say!
Otherwise it's the only solution I can see that would ensure we have a
scalable project description with ACID properties.

Cheers,

James
_______________________________________________
cdt-dev mailing list
cdt-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/cdt-dev