RE: Project Model Improvements Re: [cdt-dev] CDT Summit Report


From: "Schaefer, Doug" <Doug.Schaefer@xxxxxxxxxxxxx>
Date: Wed, 1 Oct 2008 07:50:13 -0700
Delivered-to: cdt-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/cdt-dev>
List-help: <mailto:cdt-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/cdt-dev>, <mailto:cdt-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/listinfo/cdt-dev>, <mailto:cdt-dev-request@eclipse.org?subject=unsubscribe>
Thread-index: Ackj1EOEEycYL6znRRCLv1W6Bl+aswAACwmg
Thread-topic: Project Model Improvements Re: [cdt-dev] CDT Summit Report

No, the real problem is that we're storing way too much data. There is a lot of duplication in there now. And I mean a lot. We should only be storing data that the user has changed from the defaults. And we need to look at how the scanner discovery information is stored, which also pollutes this file.
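To illustrate the idea of storing only what the user has changed, here is a minimal sketch. It is not CDT code: the class `DeltaSettingsSketch`, its methods, and the flat string-to-string settings map are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch, not CDT API: persist only settings that differ from the defaults.
public class DeltaSettingsSketch {

    /** Returns the entries of 'current' whose values differ from 'defaults'. */
    public static Map<String, String> diff(Map<String, String> defaults,
                                           Map<String, String> current) {
        Map<String, String> delta = new TreeMap<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!Objects.equals(e.getValue(), defaults.get(e.getKey()))) {
                delta.put(e.getKey(), e.getValue());
            }
        }
        // Note: keys present in 'defaults' but removed from 'current' are not tracked here.
        return delta;
    }

    /** Rebuilds the effective settings by overlaying the stored delta on the defaults. */
    public static Map<String, String> resolve(Map<String, String> defaults,
                                              Map<String, String> delta) {
        Map<String, String> effective = new HashMap<>(defaults);
        effective.putAll(delta);
        return effective;
    }
}
```

Only the result of `diff` would need to be written to disk; `resolve` reconstructs the full settings at load time, so the file grows with the number of user changes rather than with the number of options.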

The problem isn't XML. There are efficient ways of loading XML, e.g. SAX. We also need to make sure the data structures we create from it are efficient.
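As a rough illustration of the SAX approach, the sketch below streams an XML document and keeps only a compact map, never building a DOM. The element name `option` and the attributes `id` and `value` are assumptions for the example and are not meant to reflect the real .cproject schema; the class name is hypothetical too.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: stream the XML with SAX and retain only what is needed.
public class SaxSettingsSketch {

    /** Collects id -> value pairs from <option> elements without building a DOM. */
    public static Map<String, String> parseOptions(String xml) throws Exception {
        final Map<String, String> options = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName is the tag name.
                if ("option".equals(qName)) {
                    String id = attrs.getValue("id");
                    String value = attrs.getValue("value");
                    if (id != null && value != null) {
                        options.put(id, value);
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return options;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cproject><configuration>"
                + "<option id=\"opt.level\" value=\"O2\"/>"
                + "<option id=\"debug\" value=\"true\"/>"
                + "</configuration></cproject>";
        System.out.println(parseOptions(xml));
    }
}
```

Memory use here is bounded by the data structure the handler chooses to build, not by the size of the XML tree.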

Doug.

From: cdt-dev-bounces@xxxxxxxxxxx [mailto:cdt-dev-bounces@xxxxxxxxxxx] On Behalf Of Mike Kucera
Sent: Wednesday, October 01, 2008 10:44 AM
To: CDT General developers list.
Subject: Re: Project Model Improvements Re: [cdt-dev] CDT Summit Report

I agree that a proper relational database is a better solution for storing and retrieving data than an XML file. However, there may be a problem with going this route. Users should be able to check their entire project into version control, including the .project and .cproject files. (There are problems with sharing .cproject, but ideally it should work.) Now, if we use a relational db instead, I'd assume the db would be stored in some kind of binary file, and that may be a problem for some version control systems.

So in the end we may be stuck using a text-based format. I don't think XML is inherently evil; it's just the way we are handling the XML that is clearly flawed. I'm not very experienced in this area, but there may be better approaches to processing XML, like XPath queries or something. Another solution might be to write a SAX parser that directly builds the project description AST, without loading the entire DOM into memory.

There are ways of dealing with the problem of a crash during serialization. Whenever a change needs to be saved we could first rename the .cproject file, then write out a new version of the .cproject file, then delete the old one after. That way if power is cut during serialization the old file is still there.
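The rename, write, delete sequence described here could be sketched as follows. This is an illustration only: the class `SafeSaveSketch`, the `.bak` suffix, and the recovery rule in `load` (treat a leftover backup as a sign of an interrupted save and restore it) are assumptions, not existing CDT behavior.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the rename / write / delete scheme.
public class SafeSaveSketch {

    private static Path backupOf(Path target) {
        return target.resolveSibling(target.getFileName() + ".bak");
    }

    /** Saves 'contents' so that a complete old or new version always exists on disk. */
    public static void save(Path target, String contents) throws IOException {
        Path backup = backupOf(target);
        boolean hadOld = Files.exists(target);
        if (hadOld) {
            // Step 1: move the current file out of the way.
            Files.move(target, backup, StandardCopyOption.REPLACE_EXISTING);
        }
        // Step 2: write the new version under the real name.
        Files.write(target, contents.getBytes(StandardCharsets.UTF_8));
        if (hadOld) {
            // Step 3: the new file is complete, so the old one can go.
            Files.delete(backup);
        }
    }

    /** Loads the file; a leftover backup means the last save was interrupted. */
    public static String load(Path target) throws IOException {
        Path backup = backupOf(target);
        if (Files.exists(backup)) {
            // Fall back to the last version known to be complete.
            Files.move(backup, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
    }
}
```

A common variant writes the new contents to a temporary file first and then moves it over the original, which avoids the window in which no file exists under the real name.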

Mike Kucera
Software Developer
IBM Eclipse CDT Team
mkucera@xxxxxxxxxx

From: "James Blackburn" <jamesblackburn@xxxxxxxxx>
To: "CDT General developers list." <cdt-dev@xxxxxxxxxxx>
Date: 10/01/2008 07:00 AM
Subject: Project Model Improvements Re: [cdt-dev] CDT Summit Report

Hi Doug,

> Not yet. We didn't get very much time on build as we did on the other
> topics. I think the summary of it is to redo the Project model to simplify
> it and to break the dependency on managed build that was introduced with it
> in 4.0.

Is there any indication on what's planned, who might be doing this work, and in what time frame? Time's come full circle again for me and I'm once again focusing on my users with larger projects.

The problem with the current implementation is twofold:
1) For an XML file >3M CDT exceeds a 512M HEAP and performance is really bad (BUG238421)
2) The current XML model is not threadsafe (get the indexer going, open settings page and hit apply rapidly... BUG239627).

I think this all boils down to choice of XML as the data structure. It's verbose, inherently not threadsafe and, for all its verbosity, it's still not human readable. The tree duplication is expensive in terms of time and memory, and the end result is that changes made to the project description from different threads can easily be lost (BUG248962). And all this before we consider what happens if a powercut or crash happens during serialization.

It's my feeling that for something as important as the project model we actually need a data store with ACID properties with reasonable performance -- a very lightweight db would seem to be an ideal solution.

I've got some time in my schedule to start work on this, but am keen not to tread on anyone else's toes if they're intending on working on the project model. My first aim would be to port the existing project model to use SQLite as its db backend, and any changes to the actual structure of the model could be made in parallel or after.

If this all sounds like a really bad idea then someone please say! Otherwise it's the only solution I can see that would ensure we have a scalable project description with ACID properties.

Cheers,
James
