Re: [p2-dev] query performance

Hi Mengxin,

I took a look at your query. Here are some comments and hints that might help you speed things up:

1. You call toSet() on the result instead of toUnmodifiableSet(). This yields an extra (and probably unnecessary) copy. Please use toUnmodifiableSet() where possible.

2. You create an array from the incoming collection before you pass it to the query. You can avoid this extra copy by passing the Collection directly.

3. The statement:

"select(iu | $0.exists(iu2 | iu2.requirements.exists(r | iu ~= r )))"

suggests that you want to find all IUs that are required by some IU in the incoming collection. That is a one-step traversal. The IUs found this way introduce requirements of their own, so to find them all the way the planner does, you must keep re-evaluating this query until no more units are found. A better way to resolve this is to use a traverse query:

"$0.traverse(parent | parent.requirements.collect(rc | select(iu | iu ~= rc)).flatten())"
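To make the difference between the one-step select and the traverse concrete, here is a minimal sketch in plain Java. It does not use the p2 API: units are modelled as plain ids and the `requirements` map is a hypothetical stand-in for the IU requirements and the `~=` match. In this sketch the roots are included in the transitive result.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model, NOT the p2 API: a unit is just an id, and
// "requirements" maps each id to the ids it requires (hypothetical data shape).
final class TraverseSketch {

    // One step: only the units directly required by some root.
    static Set<String> oneStep(Collection<String> roots,
                               Map<String, List<String>> requirements) {
        Set<String> found = new LinkedHashSet<>();
        for (String root : roots) {
            found.addAll(requirements.getOrDefault(root, List.of()));
        }
        return found;
    }

    // Transitive: keep following requirements until no new unit turns up.
    static Set<String> transitive(Collection<String> roots,
                                  Map<String, List<String>> requirements) {
        Set<String> visited = new LinkedHashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String parent = pending.poll();
            for (String child : requirements.getOrDefault(parent, List.of())) {
                // add() returns false for units seen before, so cycles terminate
                if (visited.add(child)) {
                    pending.add(child);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requirements = Map.of(
            "a", List.of("b"),
            "b", List.of("c"),
            "c", List.of("a"));
        System.out.println("one step:   " + oneStep(List.of("a"), requirements));
        System.out.println("transitive: " + transitive(List.of("a"), requirements));
    }
}
```

Starting from "a", the one-step version only reaches "b", while the transitive version also reaches "c" and stops when the cycle leads back to "a".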

If $0 is a large collection then it's likely that an initial 'unique' of all relevant requirements will improve performance significantly:

"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).collect(rc |  select(iu | iu ~= rc)).flatten()})"

To really speed things up, you might also want to prune the unique list of requirements to only include those that have the desired namespace:

"select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu')"

Added to the traverse query, that gives:

"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu').collect(rc | select(iu | iu ~= rc)).flatten()})"
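The effect of the shared cache and the namespace filter can be sketched in plain Java as well. Again, this is not the p2 API: `Req`, `Unit`, and the `repositoryScans` counter are hypothetical names for illustration, and a requirement is matched by unit id only, ignoring version ranges. The point is that each distinct requirement triggers at most one search of the repository, and requirements outside the IU namespace trigger none.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Plain-Java model, NOT the p2 API. All types here are hypothetical stand-ins.
final class CachedTraverseSketch {
    static final String IU_NAMESPACE = "org.eclipse.equinox.p2.iu";

    static final class Req {
        final String namespace;
        final String name;
        Req(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }
        @Override public boolean equals(Object o) {
            return o instanceof Req
                && ((Req) o).namespace.equals(namespace)
                && ((Req) o).name.equals(name);
        }
        @Override public int hashCode() {
            return Objects.hash(namespace, name);
        }
    }

    static final class Unit {
        final String id;
        final List<Req> requirements;
        Unit(String id, Req... requirements) {
            this.id = id;
            this.requirements = List.of(requirements);
        }
    }

    // Counts how often the whole repository had to be searched.
    int repositoryScans = 0;

    Set<String> traverse(List<Unit> roots, List<Unit> repository) {
        Set<Req> cache = new HashSet<>();          // plays the role of set() / unique(cache)
        Set<String> visited = new LinkedHashSet<>();
        Deque<Unit> pending = new ArrayDeque<>(roots);
        for (Unit root : roots) {
            visited.add(root.id);
        }
        while (!pending.isEmpty()) {
            Unit parent = pending.poll();
            for (Req rc : parent.requirements) {
                if (!cache.add(rc)) {
                    continue;                      // requirement already handled once
                }
                if (!IU_NAMESPACE.equals(rc.namespace)) {
                    continue;                      // pruned by the namespace filter
                }
                repositoryScans++;
                for (Unit iu : repository) {       // simplified "iu ~= rc": match by id
                    if (iu.id.equals(rc.name) && visited.add(iu.id)) {
                        pending.add(iu);
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Req iuC = new Req(IU_NAMESPACE, "c");
        Unit a = new Unit("a", iuC, new Req("java.package", "x"));
        Unit b = new Unit("b", iuC);
        Unit c = new Unit("c");
        CachedTraverseSketch sketch = new CachedTraverseSketch();
        System.out.println(sketch.traverse(List.of(a, b), List.of(a, b, c)));
        System.out.println("repository scans: " + sketch.repositoryScans);
    }
}
```

Here both roots require "c", but the repository is searched only once for it, and the package requirement never causes a search at all.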

If you try this out, please publish your results.


Thomas Hallgren

On 2011-08-09 09:13, Mengxin Zhu wrote:
I find that query language performance degrades badly when querying a repository with a large number of IUs. I'm not sure whether this is a common case, but it is in mine.

I already have a list of non-installed root and group IUs, and I want to query the repository for the non-installed IUs that are required by those root and group IUs.

I compared three different methods for querying different numbers of IUs: using the provisioning planner to resolve and query the required IUs, using the query language, and using a for loop.

I have published my methods as a document [1] and the query benchmark as a spreadsheet [2].

Actually, I prefer the query language, because the code looks much cleaner. Does anybody know why the query language is so slow at handling a large number of IUs, or how to tune my query expression?

[1] https://docs.google.com/document/d/1wfnr2d2TF4vIYDCMmWPuYd0kQA32WiWaXTiaCoJovho/edit
[2] https://spreadsheets.google.com/spreadsheet/ccc?key=0AmxBoq-n1R8KdEZ4czdpQk9lMEpvR3pUbzZaZzltTGc