Re: [p2-dev] query performance

From: Thomas Hallgren <thomas@xxxxxxx>
Date: Tue, 09 Aug 2011 10:04:17 +0200

Hi Mengxin,

I took a look at your query. Here are some comments and hints that might help you speed things up:

1. You use the method toSet() on the result instead of toUnmodifiableSet(). This yields an extra (and probably unnecessary) copy. Please use toUnmodifiableSet() where possible.

2. You create an array from the incoming collection before you pass it to the query. You can avoid this extra copy by passing the Collection directly.


3. The statement:

"select(iu | $0.exists(iu2 | iu2.requirements.exists(r | iu ~= r )))"

suggests that you want to find all IUs that are required by some IU in the incoming collection. That's a one-step traversal. All those new IUs will introduce new requirements, and in order to find them all the way the planner does, you must continue evaluating this query until no more units are found. A better way to resolve this is to use a traverse query:


"$0.traverse(parent | parent.requirements.collect(rc | select(iu | iu ~= rc)).flatten())"

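To make the difference between the one-step select and a traversal concrete, here is a minimal sketch in plain Java. It is a toy model and not the p2 API: units are plain strings, `requires` maps each unit to the units it requires directly, and the class and method names are made up for illustration. In this sketch the roots are kept in the result.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, not the p2 API: a unit is a String and `requires` maps
// each unit to the units it requires directly.
class TraverseSketch {

    // One step: only the units directly required by the roots.
    static Set<String> oneStep(Collection<String> roots, Map<String, List<String>> requires) {
        Set<String> found = new LinkedHashSet<>();
        for (String root : roots) {
            found.addAll(requires.getOrDefault(root, List.of()));
        }
        return found;
    }

    // Traversal: keep following requirements until no new units turn up.
    // Each unit is visited at most once, so cycles terminate.
    static Set<String> traverse(Collection<String> roots, Map<String, List<String>> requires) {
        Set<String> visited = new LinkedHashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String parent = pending.pop();
            for (String child : requires.getOrDefault(parent, List.of())) {
                if (visited.add(child)) {
                    pending.push(child);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> requires = Map.of(
            "feature", List.of("bundle.a"),
            "bundle.a", List.of("bundle.b"),
            "bundle.b", List.of("bundle.a")); // cycle back to bundle.a
        System.out.println(oneStep(List.of("feature"), requires));  // [bundle.a]
        System.out.println(traverse(List.of("feature"), requires)); // [feature, bundle.a, bundle.b]
    }
}
```

The one-step version misses bundle.b, which is only reachable through bundle.a. The traversal finds it without having to re-run the query by hand.
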
If $0 is a large collection, then it's likely that an initial 'unique' of all relevant requirements will improve performance significantly:


"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).collect(rc |  select(iu | iu ~= rc)).flatten()})"

To really speed things up, you might also want to prune the unique list of requirements to only include those that have the desired namespace:


select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu')

The full query then becomes:

"$0.traverse(set(), _, { cache, parent | parent.requirements.unique(cache).select(rc | rc.namespace == 'org.eclipse.equinox.p2.iu').collect(rc | select(iu | iu ~= rc)).flatten()})"

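As a rough illustration of why the shared cache and the namespace filter help, here is a minimal sketch in plain Java. It is a toy model and not the p2 API: the `Req` class, the `scans` counter and the rule that a unit satisfies a requirement when its id equals the requirement name are all assumptions made for the example. Each "scan" stands in for one evaluation of the inner select over the whole repository.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Toy model, not the p2 API. All names here are made up for illustration.
class CachedTraverseSketch {
    static final String IU_NAMESPACE = "org.eclipse.equinox.p2.iu";

    // Stand-in for a requirement: a namespace plus the name of the wanted unit.
    static final class Req {
        final String namespace;
        final String name;

        Req(String namespace, String name) {
            this.namespace = namespace;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Req
                && namespace.equals(((Req) o).namespace)
                && name.equals(((Req) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(namespace, name);
        }
    }

    // Number of full repository scans performed so far.
    int scans = 0;

    // The repository is the key set of `requires`, so every unit needs an entry.
    Set<String> traverse(List<String> roots, Map<String, List<Req>> requires) {
        Set<Req> cache = new HashSet<>(); // shared across the whole traversal
        Set<String> visited = new LinkedHashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String parent = pending.pop();
            for (Req rc : requires.getOrDefault(parent, List.of())) {
                if (!cache.add(rc)) continue;                     // requirement already handled
                if (!IU_NAMESPACE.equals(rc.namespace)) continue; // wrong namespace, skip the scan
                scans++;
                for (String iu : requires.keySet()) {             // scan the whole repository
                    if (iu.equals(rc.name) && visited.add(iu)) {
                        pending.push(iu);
                    }
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Req needsC = new Req(IU_NAMESPACE, "c");
        Map<String, List<Req>> requires = Map.of(
            "a", List.of(needsC, new Req("java.package", "x")),
            "b", List.of(new Req(IU_NAMESPACE, "c")), // equal to needsC
            "c", List.of());
        CachedTraverseSketch sketch = new CachedTraverseSketch();
        System.out.println(sketch.traverse(List.of("a", "b"), requires)); // [a, b, c]
        System.out.println(sketch.scans);                                 // 1
    }
}
```

Without the cache and the filter, the same input would trigger three scans: one for each of the two equal requirements on c, and one for the java.package requirement that can never match an IU.
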

If you try this out, please publish your results.

HTH,

Thomas Hallgren



On 2011-08-09 09:13, Mengxin Zhu wrote:

I find that query language performance degrades badly when querying a repository with a large number of IUs. I'm not sure whether this is a common case, but it is in mine.
I already have a list of non-installed root and group IUs, and I want to query the repository for the non-installed IUs that are required by those root and group IUs.
I compared three different methods against different numbers of IUs: using the provisioning planner to resolve and query the required IUs, using the query language, and using a for loop.
I published my methods as a document [1] and the query benchmark as a spreadsheet [2].
I actually prefer the query language, since the code looks much cleaner. Does anybody know why the query language is so slow when handling a large number of IUs, or how to tune my query expression?
[1] https://docs.google.com/document/d/1wfnr2d2TF4vIYDCMmWPuYd0kQA32WiWaXTiaCoJovho/edit
[2] https://spreadsheets.google.com/spreadsheet/ccc?key=0AmxBoq-n1R8KdEZ4czdpQk9lMEpvR3pUbzZaZzltTGc
