EMF Compare Performance [message #1764453]
Tue, 30 May 2017 01:12
Eclipse User
Hi all,
I have a problem with EMF Compare performance.
In my application I merge big models (files of about 3 MB). I have encountered poor performance during difference calculation. Using JProfiler, I have discovered that a lot of time is spent inside the getInverseReferences(...) method of the ComparisonSpec class. Is there any way to speed up EMF Compare difference calculation?
Adam
Re: EMF Compare Performance [message #1768091 is a reply to message #1767745]
Thu, 13 July 2017 10:11
Eclipse User
Okay, this use case is really stretching towards the limits of EMF Compare...
- "flat" model - A huge number of elements in a single containment reference. This is a known cause of slowdowns in EMF Compare: once 1000-1500 elements are all contained in the same feature, the LCS (longest common subsequence) computation starts to perform poorly. This could be optimized with multi-threading, but I never got around to implementing it.
- no identifier - The proximity matcher is much slower than the identifier matcher, and this shows a lot here during the matching phase. There are ways this could be improved, but likely not for the "generic" use case. It would need customizations for your particular needs.
- comparing apples and oranges - This sample is trying to compare a model comprised of thousands of elements... with an empty model. This means a huge number of differences. The more differences, the longer the comparison and merges take. I'll assume your final use case will not be that extreme in its approach.
For the record, I stopped trying to reproduce with this model after a little over 30 minutes. It was still in the "matching" phase, trying to match objects together. This is another thing that 'instinctively' feels wrong, since matching should have been extremely fast (one of your models being empty, we should realize very quickly that nothing will match).
There are most likely ways to improve things for your use case, but I'll wager we'd need models representative of the actual use case you're trying to use EMF Compare in so that we can make the proper adjustments. In this particular case, it seems like changing the way the model iteration is done would already be a big improvement, as would customizing the matching. I understand that you cannot share production models on a public channel. Would you be interested in contacting Obeo to see if we can collaborate on these improvements?
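To give a feel for the first point: a textbook dynamic-programming LCS fills a table with one cell per pair of elements, so the work grows with the product of the two list sizes. The sketch below is that textbook algorithm on plain strings, not EMF Compare's actual implementation; the class and method names (LcsCostSketch, lcsLength, cellCount) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: textbook LCS on plain lists, not EMF Compare's implementation.
final class LcsCostSketch {

    /** Length of the longest common subsequence of two lists. */
    static <T> int lcsLength(List<T> left, List<T> right) {
        int[][] table = new int[left.size() + 1][right.size() + 1];
        for (int i = 1; i <= left.size(); i++) {
            for (int j = 1; j <= right.size(); j++) {
                if (left.get(i - 1).equals(right.get(j - 1))) {
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table[left.size()][right.size()];
    }

    /** Number of table cells (element comparisons) the loops above go through. */
    static long cellCount(int leftSize, int rightSize) {
        return (long) leftSize * rightSize;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b", "c", "d");
        List<String> right = Arrays.asList("b", "d", "e");
        System.out.println(lcsLength(left, right));   // 2 ("b", "d")
        System.out.println(cellCount(1500, 1500));    // 2250000
    }
}
```

With 1,500 elements on each side of a single containment reference, this textbook variant already performs 2,250,000 element comparisons for that one feature.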
Laurent Goubet
Obeo
Re: EMF Compare Performance [message #1768433 is a reply to message #1768091]
Tue, 18 July 2017 12:03
Eclipse User
Here is another cause for long-running comparisons:
package model

class Model {
    contains Type[] types
    contains Relationship[] relationships
    // ...
}

abstract class Definition {
    String name
    contains Property[] properties
}

class Property {
    String name
    String value
}

class Type extends Definition { /* ... */ }
class Relationship extends Definition { /* ... */ }
Typically, our models contain 5K to 10K top-level objects, each of which has about 5 properties, which results in about 25K to 50K instances of Property. But we have also seen models that contain 125K Property instances and more. The problems are:
- We cannot take advantage of XMI ids. The models are mostly imported from external data and XMI ids of two different imports are totally unrelated.
- It is difficult to come up with ids for Property instances that are unique within the model [1].
- Without ids, the Property instances are compared with ProximityEObjectMatcher.
- ProximityEObjectMatcher first throws the properties of all definitions on a huge pile and then matches each property on this pile with all others, which is very expensive.
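To put rough numbers on the last point, the sketch below counts candidate pairs under the worst-case assumption that every property on one side may be compared with every property on the other. That assumption is mine; the real ProximityEObjectMatcher may prune candidates, so treat the result as an order-of-magnitude estimate. The pile sizes are the ones quoted above, and the class name is hypothetical.

```java
// Back-of-the-envelope estimate: the pile sizes come from the post,
// the "every pair is a candidate" formula is a worst-case assumption.
final class PairCountEstimate {

    /** Candidate pairs when both sides each contribute one pile of the given size. */
    static long globalPairs(long pileSize) {
        return pileSize * pileSize;
    }

    public static void main(String[] args) {
        long[] pileSizes = {25_000L, 50_000L, 125_000L};
        for (long size : pileSizes) {
            System.out.println(size + " properties -> " + globalPairs(size) + " candidate pairs");
        }
    }
}
```

Under this assumption, 50K properties per side already means 2.5 billion candidate pairs, and 125K means more than 15 billion.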
The attached Snippet generates similar models based on Ecore (Definition → EClass, Property → EAnnotation), so you can test for yourself. Beware: comparing models with 100K or more annotations will take a very long time!
We solved this with a custom IEObjectMatcher that first matches all top-level objects (which are uniquely identified by type and name) and then uses a ProximityEObjectMatcher to match the contents of matching top-level objects [2]. This greatly reduces the scope for ProximityEObjectMatcher and the total running time from several minutes (or hours with huge models) to seconds (or a few minutes). As a side effect, we can no longer detect moves of children (for example, a Property) from one top-level object to another, but that is fine with us [3].
[1] IdentifierEObjectMatcher is unnecessarily restricted to IDs of type String. Otherwise, you could use some kind of object that describes the path to a property without the need to serialize it to a string.
[2] Unfortunately, IEObjectMatcher.createMatches(...) combines two separate steps: creating the actual matches and restructuring all matches in the comparison. So delegating a subset to another matcher always includes unnecessary costs. Separating these two steps would allow for a single restructuring after all matches have been found.
[3] If anything, instead of "Property 'p' has been moved from Relationship 'xyz' to Type 'totally-unrelated-to-xyz'", we prefer to report an addition and a deletion.
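The two-stage idea can be sketched on plain Java objects as follows. This is not the custom IEObjectMatcher itself and does not use the EMF Compare API: the Definition and Property classes mirror the model above, and distance(...) is a hypothetical stand-in for whatever proximity metric is used in stage 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the two-stage idea on plain objects; it does not use the EMF Compare API.
final class TwoStageMatchSketch {

    static final class Property {
        final String name;
        final String value;
        Property(String name, String value) { this.name = name; this.value = value; }
    }

    static final class Definition {
        final String type;   // e.g. "Type" or "Relationship"
        final String name;
        final List<Property> properties;
        Definition(String type, String name, Property... properties) {
            this.type = type;
            this.name = name;
            this.properties = Arrays.asList(properties);
        }
        /** Stage-1 key: top-level objects are unique by type and name. */
        List<String> key() { return Arrays.asList(type, name); }
    }

    /** Hypothetical proximity stand-in: 0 = identical, larger = further apart. */
    static int distance(Property a, Property b) {
        return (a.name.equals(b.name) ? 0 : 2) + (a.value.equals(b.value) ? 0 : 1);
    }

    /** Returns a readable report of matches, additions and deletions. */
    static List<String> match(List<Definition> left, List<Definition> right) {
        List<String> report = new ArrayList<>();
        Map<List<String>, Definition> rightByKey = new LinkedHashMap<>();
        for (Definition d : right) rightByKey.put(d.key(), d);
        for (Definition l : left) {
            Definition r = rightByKey.remove(l.key());      // stage 1: hash lookup
            if (r == null) report.add("deleted " + l.name);
            else matchProperties(l, r, report);              // stage 2: scoped to this pair
        }
        for (Definition r : rightByKey.values()) report.add("added " + r.name);
        return report;
    }

    private static void matchProperties(Definition l, Definition r, List<String> report) {
        List<Property> candidates = new ArrayList<>(r.properties);
        for (Property p : l.properties) {
            Property best = null;
            for (Property c : candidates) {
                // Only same-name properties (distance < 2) are accepted as a match here.
                if (distance(p, c) < 2 && (best == null || distance(p, c) < distance(p, best))) best = c;
            }
            if (best == null) {
                report.add("deleted " + l.name + "." + p.name);
            } else {
                candidates.remove(best);
                report.add("matched " + l.name + "." + p.name);
            }
        }
        for (Property c : candidates) report.add("added " + r.name + "." + c.name);
    }

    /** Property 'p' moves from Relationship 'xyz' to Type 't' between left and right. */
    static List<String> demo() {
        List<Definition> left = Arrays.asList(
                new Definition("Relationship", "xyz", new Property("p", "1")),
                new Definition("Type", "t", new Property("q", "2")));
        List<Definition> right = Arrays.asList(
                new Definition("Relationship", "xyz"),
                new Definition("Type", "t", new Property("q", "2"), new Property("p", "1")));
        return match(left, right);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [deleted xyz.p, matched t.q, added t.p]
    }
}
```

Because stage 2 never looks outside a matched pair of owners, the moved property shows up as a deletion plus an addition, which is the trade-off described in [3].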
Andreas
Re: EMF Compare Performance [message #1768577 is a reply to message #1768433]
Thu, 20 July 2017 04:08
Eclipse User
Hi Andreas,
This is the kind of customization that is almost a necessity on a case-by-case basis. Matching without identifiers (through the ProximityEObjectMatcher) will be much slower, so you have to try to determine "shortcuts" that work for your use cases. In this particular example, I agree that having delete/add on properties is better than trying to match them throughout the whole model.
1) I agree that it would be nice to have "Object" instead of "String" identifiers, since we're calling "equals" on them anyway. This is a change that would imply an API break though, so it cannot be done until we switch to a 4.0 version of EMF Compare. There are other aspects on which we'd like to break things (and remove deprecated methods/classes), so this is something I'd like to keep track of for when we do make that change. Could you raise a bug for this?
2) When I tested earlier with Vrinda A.'s models, I felt like the ProximityEObjectMatcher was doing too much work by itself (matchAheadOfTime), but it is true that this method (createMatches) is combining two responsibilities that should be separated. However, this is not planned for now.
3) As I mentioned earlier, I think this is the best for your use case: go for the shortcuts that work for you. I see EMF Compare as a generic comparison engine for models, but it is also a framework from which to build your own comparison engines.
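On point 1, the appeal of "Object" identifiers can be sketched in plain Java. PathId below is a hypothetical identifier type, not part of EMF Compare; it only relies on equals and hashCode, and it avoids the ambiguity that a naive string serialization of a path can introduce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical identifier type for illustration; not an EMF Compare class.
final class PathIdSketch {

    /** Immutable path of segments with value-based equals/hashCode. */
    static final class PathId {
        private final List<String> segments;

        PathId(String... segments) {
            this.segments = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(segments)));
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof PathId && segments.equals(((PathId) other).segments);
        }

        @Override
        public int hashCode() {
            return segments.hashCode();
        }

        /** Naive serialization, shown only to demonstrate how it can collide. */
        String joined() {
            return String.join("/", segments);
        }
    }

    public static void main(String[] args) {
        PathId a = new PathId("Type", "a/b", "c");
        PathId b = new PathId("Type", "a", "b/c");
        System.out.println(a.joined().equals(b.joined())); // true: the strings collide
        System.out.println(a.equals(b));                   // false: the paths differ

        Map<PathId, String> index = new HashMap<>();
        index.put(a, "left property");
        System.out.println(index.get(new PathId("Type", "a/b", "c"))); // left property
    }
}
```

A string-based identifier can of course escape its separators to avoid such collisions; the structured key simply makes that step unnecessary.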
Laurent Goubet
Obeo