I have done further investigation and I'm really worried about what we've created...
Here is the experiment I have performed:
- Load Helios in its current format and force it to be rewritten in the new serialization format (I changed the code to force writing in the new format).
Here is a list of observations:
- The new format is significantly bigger: for example, 42M vs. 36M as XML, and 3.9M vs. 3.6M compressed.
- Massive increase in memory consumption:
  - No sharing of the expression object resulting from the parsing of the match expressions in each requirement.
  - Unnecessary string pooling of the complete match expression being loaded.
  - No pooling of the parameters (the parameters contain the id for IUs, packages, the versions, etc.).
- In short, it does not scale.
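To make the sharing and pooling points concrete, here is a minimal Java sketch of what such caches could look like. It is not the p2 implementation: the class names `PoolingSketch`, `ExpressionPool`, and `ValuePool` are hypothetical, and the parser is passed in as a plain function because the real parser API is not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not p2 code: all names here are assumptions.
class PoolingSketch {

    /** Shares one parsed object per distinct match-expression string. */
    public static final class ExpressionPool<E> {
        private final Map<String, E> parsed = new ConcurrentHashMap<>();
        private final Function<String, E> parser;

        public ExpressionPool(Function<String, E> parser) {
            this.parser = parser;
        }

        /** Parses the text only the first time it is seen, then reuses the result. */
        public E get(String expressionText) {
            return parsed.computeIfAbsent(expressionText, parser);
        }

        public int size() {
            return parsed.size();
        }
    }

    /** Canonicalizes equal parameter values (ids, namespaces, versions) to one instance. */
    public static final class ValuePool {
        private final Map<Object, Object> values = new HashMap<>();

        @SuppressWarnings("unchecked")
        public <T> T canonical(T value) {
            Object existing = values.putIfAbsent(value, value);
            return (T) (existing != null ? existing : value);
        }
    }
}
```

With something along these lines, thousands of requirements that use the same expression text would hold a reference to a single parsed object, and only the short parameter values would need pooling, rather than the complete expression string.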
That said, for the Helios release, given that none of the IUs generated by the publisher use the new match expressions, and given that the metadata writer persists things in the 3.4 format as much as possible, we will not encounter these memory problems.
So my real question is: do we want to try to fix this new serialization format for 3.6.0, or do we want to go out in the field with this and define yet another format next year, knowing that we will still have to support the 3.6 format?
On 2010-05-08, at 2:25 AM, Thomas Hallgren wrote:
I agree that the XML encoded string that represents an expression is ugly.
On 05/07/2010 06:32 PM, Pascal Rapicault wrote:
While working on a solution to prevent RAP and the IDE from being installed together (306709), I came across the serialized format of queries and I find it extremely unreadable (see example below). On top of that, I'm also questioning whether this format compresses as well as before.
So my few questions are:
1) can we make this format more readable?
We can write it out as a CDATA element, i.e.
<requirement min='0' max='0' greedy='false'>
<![CDATA[providedCapabilities.exists(x | x.name == $0 && x.namespace == $1 && x.version >= $2 && x.version < $3)]]>
<![CDATA[['org.eclipse.rap.rwt', 'org.eclipse.equinox.p2.iu', version('1.0.0'), version('2.0.0')]]]>
What does the old parser do when it encounters elements that it doesn't recognize? I know that attributes are ignored. Does that also apply to elements?
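As a rough illustration of the CDATA idea, the following Java sketch reads an expression from a CDATA section with a standard SAX handler that acts only on element names it knows and silently skips the rest. It says nothing about how the actual p2 parser behaves: the `<match>` wrapper element and the class name `CdataSketch` are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: the <match> element name is an assumption, not the p2 format.
class CdataSketch {

    /** Returns the text of <match> elements; every other element is ignored. */
    static String readMatch(String xml) {
        StringBuilder match = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inMatch;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("match".equals(qName)) inMatch = true;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("match".equals(qName)) inMatch = false;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // CDATA content arrives here unescaped, like ordinary text.
                if (inMatch) match.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
        return match.toString();
    }
}
```

A handler written this way tolerates unknown elements by construction. Whether the existing 3.4-era parser does the same is exactly the open question and would need to be checked against its code.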
2) does this compress as well as before?
I can't see why not. It's all keywords, operators, and well known entities.
3) is parsing as fast as before?
The QL parser is extremely fast, so I don't think its parsing time will be measurable. The XML parser is exposed to an attribute with a lot of entities in it, but my guess is that it's well optimized to deal with that. The only way to find out is to write performance tests. An easy test would be to force the serializer to write everything in this format.
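A quick way to get a first number on the compression question is to gzip both encodings of the same expression and compare sizes. The Java sketch below does that with the expression from the example above. The `match` attribute name, the class name `CompressionSketch`, and the repetition count are assumptions, and repeating one identical requirement is far more compressible than a real repository, so the output is only a rough indication and no substitute for rewriting Helios itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: element and attribute layout are assumptions, not the p2 format.
class CompressionSketch {

    /** Returns the number of bytes the text occupies after gzip compression. */
    static int gzippedSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    /** Builds a document made of the same line repeated the given number of times. */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder((unit.length() + 1) * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String expr = "providedCapabilities.exists(x | x.name == $0 && x.namespace == $1"
                + " && x.version >= $2 && x.version < $3)";
        // Attribute form needs entity escaping; CDATA form keeps the text as written.
        String escaped = expr.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        String asAttribute = "<requirement match='" + escaped + "'/>";
        String asCdata = "<requirement><![CDATA[" + expr + "]]></requirement>";

        for (String form : new String[] {asAttribute, asCdata}) {
            String document = repeat(form, 10_000);
            System.out.println(document.length() + " chars raw, "
                    + gzippedSize(document) + " bytes gzipped");
        }
    }
}
```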
p2-dev mailing list
p2-dev@xxxxxxxxxxx