Re: EMF Compare Performance [message #1765336 is a reply to message #1765279]
Thu, 08 June 2017 14:09
Hi Adam,
Could you tell us:
1 - What do you mean by "poor performance" in your case? You seem to be comparing models programmatically from inside your application. 3MB models are not really what we'd consider "big", and we strive to test the performance of the comparison (and fix issues with it) regularly for each release.
2 - How much time exactly is spent in that getInverseReferences method in your case?
3 - What do your models (and comparison) look like? A lot of elements with very few differences? A few elements with a lot of differences?
Could you provide us with sample models that reproduce the issue, so that we can hopefully fix the bottleneck you're experiencing?
Regards,
Laurent Goubet
Obeo
Re: EMF Compare Performance [message #1767715 is a reply to message #1767589]
Mon, 10 July 2017 08:58
Hi,
We run into the same problem as with Adam: without models that reproduce this issue, we cannot make meaningful improvements. I can look at the implementation and say that some things look inefficient, but without a model that reproduces the actual issue, I can't pinpoint what needs to be improved, or what is called far too many times. I have never observed this with the models we've been testing all this time, so something must make the issue much more pronounced; the problem is finding out what that something is.
Laurent Goubet
Obeo
Re: EMF Compare Performance [message #1768091 is a reply to message #1767745]
Thu, 13 July 2017 14:11
Okay, this use case is really stretching towards the limits of EMF Compare...
- "flat" model - A huge number of elements in a single containment reference. This is a known cause of slowdowns for EMF Compare: once 1000-1500 elements are all contained in the same feature, the LCS (longest common subsequence) computation starts to perform poorly. This could be optimized with multi-threading, but I've never gotten around to implementing it.
- no identifier - The proximity matcher is much slower than the identifier matcher, and this shows a lot here during the matching phase. There are ways this could be improved, but likely not for the "generic" use case; it would need customizations for your particular needs.
- comparing apples and oranges - This sample compares a model composed of thousands of elements... with an empty model. This means a huge number of differences, and the more differences there are, the longer the comparison and merges take. I'll assume your final use case will not be that extreme.
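To illustrate why a single feature with 1000-1500 children gets expensive, here is a minimal sketch of the textbook dynamic-programming LCS. This is an assumption for illustration, not EMF Compare's own implementation, and the class name is hypothetical. Its table has (n + 1) x (m + 1) cells, so the work grows with the product of the two list sizes.

```java
import java.util.List;

// Textbook dynamic-programming LCS (illustrative only, not EMF Compare's code).
final class LcsSketch {
    // Length of the longest common subsequence of two lists, compared with equals().
    static <T> int lcsLength(List<T> left, List<T> right) {
        int n = left.size();
        int m = right.size();
        // table[i][j] = LCS length of the first i left elements and the first j right elements
        int[][] table = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (left.get(i - 1).equals(right.get(j - 1))) {
                    table[i][j] = table[i - 1][j - 1] + 1;
                } else {
                    table[i][j] = Math.max(table[i - 1][j], table[i][j - 1]);
                }
            }
        }
        return table[n][m];
    }

    // Number of cells the table needs: grows with the product of the two sizes.
    static long cellCount(int n, int m) {
        return (long) (n + 1) * (m + 1);
    }
}
```

For 1500 elements on each side, that is about 2.25 million cells for a single feature, almost all of them filled after an equality check.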
For the record, I stopped trying to reproduce with this model after a little over 30 minutes. It was still in the "matching" phase, trying to match objects together. This is another thing that 'instinctively' feels wrong, since matching should have been extremely fast (one of your models being empty, we should realize very quickly that nothing will match).
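A minimal sketch of the short-circuit described above, assuming a hypothetical greedy pairwise matcher (EarlyExitMatcher, not part of EMF Compare) in place of the real proximity matching: when either side is empty, every element is reported as unmatched without a single comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical greedy matcher illustrating an early exit when one side is empty.
final class EarlyExitMatcher {
    // A null side means the element has no counterpart (pure addition or deletion).
    record Match<T>(T left, T right) {}

    static <T> List<Match<T>> match(List<T> left, List<T> right, BiPredicate<T, T> similar) {
        List<Match<T>> result = new ArrayList<>();
        if (left.isEmpty() || right.isEmpty()) {
            // Nothing can possibly match: skip all pairwise comparisons.
            for (T l : left) result.add(new Match<>(l, null));
            for (T r : right) result.add(new Match<>(null, r));
            return result;
        }
        // General case: greedy pairwise search, up to |left| * |right| comparisons.
        boolean[] taken = new boolean[right.size()];
        for (T l : left) {
            int found = -1;
            for (int j = 0; j < right.size() && found < 0; j++) {
                if (!taken[j] && similar.test(l, right.get(j))) {
                    found = j;
                }
            }
            if (found >= 0) {
                taken[found] = true;
                result.add(new Match<>(l, right.get(found)));
            } else {
                result.add(new Match<>(l, null));
            }
        }
        // Right-side elements nobody claimed are additions.
        for (int j = 0; j < right.size(); j++) {
            if (!taken[j]) result.add(new Match<>(null, right.get(j)));
        }
        return result;
    }
}
```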
There are most likely ways to improve things for your use case, but I'll wager we'd need models representative of the actual use case you're trying to use EMF Compare for, so that we can make the proper adjustments. In this particular case, it seems like changing the way the model is iterated would already be a big improvement, as would customizing the matching. I understand that you cannot share production models on a public channel; would you be interested in contacting Obeo to see if we can collaborate on these improvements?
Laurent Goubet
Obeo
Re: EMF Compare Performance [message #1768433 is a reply to message #1768091]
Tue, 18 July 2017 16:03
Andreas Mayer
Here is another cause for long-running comparisons:
package model

class Model {
    contains Type[] types
    contains Relationship[] relationships
    // ...
}

abstract class Definition {
    String name
    contains Property[] properties
}

class Property {
    String name
    String value
}

class Type extends Definition { /* ... */ }
class Relationship extends Definition { /* ... */ }
Typically, our models contain 5K to 10K top-level objects, each of which has about 5 properties, which results in about 25K to 50K Property instances. But we have also seen models that contain 125K Property instances and more. The problems are:
- We cannot take advantage of XMI ids. The models are mostly imported from external data and XMI ids of two different imports are totally unrelated.
- It is difficult to come up with ids for Property instances that are unique within the model [1].
- Without ids, the Property instances are compared with ProximityEObjectMatcher.
- ProximityEObjectMatcher first throws the properties of all definitions onto a huge pile and then matches each property on this pile against all the others, which is very expensive.
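A rough count of candidate pairs shows why the single pile hurts. This sketch assumes a naive all-pairs search with the same number of definitions and properties on both sides; the real ProximityEObjectMatcher is smarter than a plain all-pairs loop, so its actual work differs, but the quadratic trend is the point. The class name is hypothetical.

```java
// Candidate-pair counts under a naive all-pairs assumption (both sides the same size).
final class MatchingCostSketch {
    // Global pile: every left property is a candidate for every right property.
    static long globalPairs(long definitions, long propertiesPerDefinition) {
        long properties = definitions * propertiesPerDefinition;
        return properties * properties;
    }

    // Scoped: a property is only compared with the properties of the matching definition.
    static long scopedPairs(long definitions, long propertiesPerDefinition) {
        return definitions * propertiesPerDefinition * propertiesPerDefinition;
    }
}
```

With 10K definitions of 5 properties each, the global pile yields 2.5 billion candidate pairs, against 250,000 when matching is scoped to each definition.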
The attached snippet generates similar models based on Ecore (Definition → EClass, Property → EAnnotation), so you can test for yourself. Beware: comparing models with 100K or more annotations will take a very long time!
We solved this with a custom IEObjectMatcher that first matches all top-level objects (which are uniquely identified by type and name) and then uses a ProximityEObjectMatcher to match the contents of matching top-level objects [2]. This greatly reduces the scope of the ProximityEObjectMatcher and cuts the total running time from several minutes (or hours with huge models) to seconds (or a few minutes). As a side effect, we can no longer detect moves of children (for example, a Property) from one top-level object to another, but that is fine with us [3].
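The approach above can be sketched in plain Java, without EMF dependencies. Everything here is hypothetical: records stand in for the Xcore classes, and matching properties by equal name stands in for the ProximityEObjectMatcher step. Top-level definitions are paired by (kind, name); properties are then only compared within a matched pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-phase matcher: top-level objects first, then children within each matched pair.
final class ScopedMatcher {
    record Property(String name, String value) {}
    record Definition(String kind, String name, List<Property> properties) {}
    // Either side may be null: a one-sided match is reported as an addition or a deletion.
    record Match(Object left, Object right) {}

    static List<Match> match(List<Definition> left, List<Definition> right) {
        // Phase 1: top-level objects are uniquely identified by (kind, name).
        Map<List<String>, Definition> rightByKey = new LinkedHashMap<>();
        for (Definition d : right) {
            rightByKey.put(List.of(d.kind(), d.name()), d);
        }
        List<Match> matches = new ArrayList<>();
        for (Definition l : left) {
            Definition r = rightByKey.remove(List.of(l.kind(), l.name()));
            matches.add(new Match(l, r));
            // Phase 2: children are only compared within the matched pair.
            matchChildren(l.properties(), r == null ? List.<Property>of() : r.properties(), matches);
        }
        // Leftover right-side definitions were added, together with their children.
        for (Definition r : rightByKey.values()) {
            matches.add(new Match(null, r));
            matchChildren(List.<Property>of(), r.properties(), matches);
        }
        return matches;
    }

    // Stand-in for proximity matching: pair properties by equal name (a simplification).
    private static void matchChildren(List<Property> left, List<Property> right, List<Match> out) {
        List<Property> remaining = new ArrayList<>(right);
        for (Property l : left) {
            Property found = null;
            for (Property r : remaining) {
                if (r.name().equals(l.name())) {
                    found = r;
                    break;
                }
            }
            if (found != null) {
                remaining.remove(found);
            }
            out.add(new Match(l, found));
        }
        for (Property r : remaining) {
            out.add(new Match(null, r));
        }
    }
}
```

With this scoping, a property that moves from Relationship X to Type A comes out as one deletion and one addition rather than a move, which is the trade-off described in [3].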
[1] IdentifierEObjectMatcher is unnecessarily restricted to IDs of type String. Without that restriction, you could use some kind of object that describes the path to a property, without the need to serialize it to a string.
[2] Unfortunately, IEObjectMatcher.createMatches(...) combines two separate steps: creating the actual matches and restructuring all matches in the comparison. So delegating a subset to another matcher always incurs unnecessary costs. Separating these two steps would allow for a single restructuring after all matches have been found.
[3] Instead of reporting "Property 'p' has been moved from Relationship 'xyz' to Type 'totally-unrelated-to-xyz'", we prefer to report an addition and a deletion.
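Regarding footnote [1], a minimal sketch of what an Object identifier could look like, assuming a hypothetical PropertyPath record (IdentifierEObjectMatcher does not accept anything like this today): the record's generated equals and hashCode are all a matcher needs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical structured identifier: the path to a property, kept as an object instead of a string.
final class PathIdentifierSketch {
    record PropertyPath(String ownerKind, String ownerName, String propertyName) {}

    // Identifiers present on both sides; matching relies only on the record's equals/hashCode.
    static Set<PropertyPath> common(List<PropertyPath> left, List<PropertyPath> right) {
        Set<PropertyPath> result = new HashSet<>(left);
        result.retainAll(new HashSet<>(right));
        return result;
    }
}
```

A structured key also sidesteps escaping: joining owner name "a/b" with property name "c" using "/" produces the same string as joining "a" with "b/c", while the two records stay distinct.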
Andreas
Re: EMF Compare Performance [message #1768577 is a reply to message #1768433]
Thu, 20 July 2017 08:08
Hi Andreas,
This kind of customization is almost always necessary, on a case-by-case basis. Matching without identifiers (through the ProximityEObjectMatcher) will be much slower, so you have to try to determine "shortcuts" that work for your use cases. In this particular example, I agree that reporting a deletion and an addition for properties is better than trying to match them throughout the whole model.
1) I agree that it would be nice to have "Object" identifiers instead of "String" ones, since we're calling "equals" on them anyway. This change would imply an API break, though, so it cannot be done until we switch to EMF Compare 4.0. There are other aspects on which we'd like to break things (and remove deprecated methods/classes), so this is something I'd like to keep track of for when we do make that change. Could you raise a bug for this?
2) When I tested earlier with Vrinda A.'s models, I felt like the ProximityEObjectMatcher was doing too much work by itself (matchAheadOfTime), but it is true that this method (createMatches) combines two responsibilities that should be separated. However, this is not planned for now.
3) As I mentioned earlier, I think this is the best approach for your use case: go for the shortcuts that work for you. I see EMF Compare as a generic comparison engine for models, but it is also a framework on which to build your own comparison engines.
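On point 2, here is a sketch of why separating the two responsibilities of createMatches pays off when delegating. The names (SplitMatchingSketch, MatchFinder, findMatches) are hypothetical and not EMF Compare API; the restructuring step is a placeholder callback. When it runs after every delegated subset, it revisits all matches found so far; when the steps are separated, it runs once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical split of "find matches" and "restructure the comparison" into separate steps.
final class SplitMatchingSketch {
    record Match(String left, String right) {}

    // Step 1 only: produce matches for the subset a delegate is responsible for.
    interface MatchFinder {
        List<Match> findMatches(List<String> left, List<String> right);
    }

    // Combined style: each delegated subset is followed by restructuring everything found so far.
    static List<Match> combined(List<MatchFinder> finders, List<List<String>> lefts,
                                List<List<String>> rights, Consumer<List<Match>> restructure) {
        List<Match> all = new ArrayList<>();
        for (int i = 0; i < finders.size(); i++) {
            all.addAll(finders.get(i).findMatches(lefts.get(i), rights.get(i)));
            restructure.accept(all);
        }
        return all;
    }

    // Separated style: collect the matches from every delegate, then restructure once.
    static List<Match> separated(List<MatchFinder> finders, List<List<String>> lefts,
                                 List<List<String>> rights, Consumer<List<Match>> restructure) {
        List<Match> all = new ArrayList<>();
        for (int i = 0; i < finders.size(); i++) {
            all.addAll(finders.get(i).findMatches(lefts.get(i), rights.get(i)));
        }
        restructure.accept(all);
        return all;
    }
}
```

With three delegated subsets of one match each, the combined style visits 1 + 2 + 3 = 6 matches in total against 3 for the separated style; the gap grows quadratically with the number of delegations.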
Laurent Goubet
Obeo