Eclipse Community Forums: Compare » EMF Compare Performance

EMF Compare Performance [message #1764453]

Tue, 30 May 2017 05:12

Adam Bialas

Messages: 14
Registered: July 2016

Junior Member

Hi all,

I have a problem with EMF Compare performance.
In my application I merge big models (files of about 3 MB). I have encountered poor performance during difference calculation. Using JProfiler, I have discovered that a lot of time is spent inside the getInverseReferences(...) method of the ComparisonSpec class. Is there any way to speed up EMF Compare difference calculation?

Adam

Re: EMF Compare Performance [message #1764577 is a reply to message #1764453]

Wed, 31 May 2017 12:02

Philip Langer

Messages: 99
Registered: March 2015
Location: Vienna, Austria

Member

Hi Adam,

Well, it depends on many aspects. Are you using EGit as a back-end, or a plain "compare with each other"? Which kinds of models do you use: your own Ecore metamodel or an existing one? Do you have XMI IDs in your models?

Note that we recently merged a couple of performance improvements. So it would be great if you could retry with the current master nightly build.

Thanks and best wishes,

Philip

--
Philip Langer

Get professional Eclipse developer support:
http://eclipsesource.com/en/services/developer-support/

Re: EMF Compare Performance [message #1765279 is a reply to message #1764577]

Thu, 08 June 2017 05:58

Adam Bialas

Messages: 14
Registered: July 2016

Junior Member

Hi Philip,

1) I am not using EGit, I compare models inside my code
2) I use my own ecore
3) I use XMI:IDs in my model.

Unfortunately, I cannot test it with the newest build right now, but I will post the results as soon as I can.

Adam

Re: EMF Compare Performance [message #1765336 is a reply to message #1765279]

Thu, 08 June 2017 14:09

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Hi Adam,

Could you tell us:
1 - what is "poor performance" in your case? You seem to be comparing models programmatically from inside your application. 3 MB models are not really what we'd consider "big", and we strive to test (and fix issues with) the performance of the comparison pretty often for each release.
2 - how much time exactly is spent in that getInverseReferences method in your case?
3 - what do your models (and comparison) look like? A lot of elements with very few differences? A few elements with a lot of differences?

Could you provide us with sample models with which to reproduce the issue, so that we can hopefully fix the bottleneck you're experiencing?

Regards,

Laurent Goubet
Obeo

Re: EMF Compare Performance [message #1767589 is a reply to message #1765336]

Fri, 07 July 2017 09:50

Vrinda A

Messages: 8
Registered: November 2012

Junior Member

Hello Laurent,

I am facing a similar performance issue with EMF Compare at ComparisonSpec.getInverseReferences(Match) when I programmatically merge two EMF models (the files are 93 MB and 54 MB). The merge takes about 7-8 hours!
These models have a lot of containments and references.
When I take one-minute flight recordings with the JMC profiler at several points during the merge, they all point to ComparisonSpec.getInverseReferences(Match). I've attached a screenshot of the JMC report for your reference.

Attachment: EMFCompare.png
(Size: 153.30KB, Downloaded 252 times)

Re: EMF Compare Performance [message #1767715 is a reply to message #1767589]

Mon, 10 July 2017 08:58

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Hi,

The same problem as with Adam applies here: without models with which we can reproduce this issue, we cannot make meaningful improvements. I can look at the implementation and say some things look inefficient, but without a model to reproduce the actual issue with, I can't pinpoint what needs to be enhanced, or what is called way too many times. I have never observed this with the models we've been testing all this time, so it seems like something in your models makes the issue much more pronounced. The problem is finding out what that something is.

Laurent Goubet
Obeo

Re: EMF Compare Performance [message #1767745 is a reply to message #1767715]

Mon, 10 July 2017 14:42

Vrinda A

Messages: 8
Registered: November 2012

Junior Member

Hi Laurent,

I am using the AMALTHEA model from the Eclipse APP4MC project.

I have tried to create an example by creating 1000 instances of each element of the example file AMALTHEA_Democar.amxmi from the example project of http://git.eclipse.org/c/?q=app4mc

I tried to programmatically merge 'Big.amxmi' into 'Empty.amxmi' (both attached). Every object in this model has a 'name' attribute, so I use a custom IdentifierEObjectMatcher which creates IDs based on the unique names.
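For readers following along, the idea behind matching on name-derived IDs can be sketched in plain Java, independent of the EMF Compare API. This is only an illustration under the assumption that type plus name is unique within each model; the class `NameIdMatchSketch` and the helpers `idOf` and `matchById` are hypothetical names, not part of IdentifierEObjectMatcher or of the custom matcher described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NameIdMatchSketch {

    /** Builds a hypothetical ID from an element's type and its (assumed unique) name. */
    static String idOf(String type, String name) {
        return type + ":" + name;
    }

    /**
     * Pairs up IDs from both sides with one hash lookup per left element.
     * Each result entry is {leftId, rightId}; a null slot means "no counterpart".
     */
    static List<String[]> matchById(List<String> leftIds, List<String> rightIds) {
        Set<String> unmatchedRight = new LinkedHashSet<>(rightIds);
        List<String[]> pairs = new ArrayList<>();
        for (String id : leftIds) {
            boolean matched = unmatchedRight.remove(id);
            pairs.add(new String[] { id, matched ? id : null });
        }
        // Whatever is still in the set exists on the right side only.
        for (String id : unmatchedRight) {
            pairs.add(new String[] { null, id });
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> big = List.of(idOf("Task", "t1"), idOf("Task", "t2"), idOf("Label", "l1"));
        List<String> empty = List.of();
        for (String[] pair : matchById(big, empty)) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

With IDs available, the matching step itself is linear in the number of elements, so in a scenario like this one the remaining cost would come from later phases rather than from ID lookup.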

Before reaching the RequirementEngine (where my current issue is), the LCS computation in the DiffEngine also takes a lot of time.

I understand that the RequirementEngine does not cache the element references while going through the dependencies of the entire model. Could this be the reason for the huge time taken up by inverse reference finder?

If you want to take a look at the AMALTHEA model's ecore file, please refer to http://git.eclipse.org/c/app4mc/org.eclipse.app4mc.git/tree/plugins/org.eclipse.app4mc.amalthea.model/model?h=releases/0.8.0

Regards,
Vrinda.

Attachment: BigFile.zip
(Size: 641.71KB, Downloaded 195 times)

Re: EMF Compare Performance [message #1768091 is a reply to message #1767745]

Thu, 13 July 2017 14:11

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Okay, this use case is really stretching towards the limits of EMF Compare...

"Flat" model - A huge number of elements in a single containment reference. This is a known cause of slowdowns in EMF Compare: once 1000-1500 elements are all contained in the same feature, the LCS starts to perform poorly. This could be optimized with multi-threading, but I've never gotten around to implementing it.
No identifier - The proximity matcher is much slower than the identifier matcher, and this shows a lot here during the matching phase. There are ways this could be improved, but likely not for the "generic" use case; it would need customizations for your particular needs.
Comparing apples and oranges - This sample is trying to compare a model comprising thousands of elements... with an empty model. This means a huge number of differences. The more differences, the longer the comparison and merges take. I'll assume your final use case will not be that extreme in its approach.
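To give a feel for the first point, here is a minimal sketch of the textbook dynamic-programming LCS in plain Java. It is not EMF Compare's own LCS code (which may be organized differently); `LcsCostSketch`, `lcsLength` and `cellsVisited` are hypothetical names used only to show how the amount of work grows with the number of elements held in a single list.

```java
import java.util.ArrayList;
import java.util.List;

class LcsCostSketch {

    /** Number of table cells filled by the most recent call to lcsLength. */
    static long cellsVisited;

    /** Textbook LCS length over two lists, keeping only two rows of the table. */
    static <T> int lcsLength(List<T> a, List<T> b) {
        cellsVisited = 0;
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                cellsVisited++;
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    curr[j] = prev[j - 1] + 1;
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]);
                }
            }
            // Swap rows; index 0 is never written, so it stays 0 in both.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    public static void main(String[] args) {
        for (int n : new int[] { 500, 1500, 3000 }) {
            List<Integer> left = new ArrayList<>();
            List<Integer> right = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                left.add(i);
                if (i % 10 != 0) {
                    right.add(i); // drop every tenth element on the right side
                }
            }
            int lcs = lcsLength(left, right);
            System.out.println(n + " elements: LCS length " + lcs + ", cells filled " + cellsVisited);
        }
    }
}
```

In this textbook form the number of cells is the product of the two list sizes, so doubling the number of children in one containment reference roughly quadruples the work, and each cell costs one element comparison.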

For the record, I stopped trying to reproduce with this model after a little over 30 minutes. It was still in the "matching" phase, trying to match objects together. This is another thing that 'instinctively' feels wrong, since matching should have been extremely fast (one of your models being empty, we should realize very quickly that nothing else will match).

There are most likely ways to improve things for your use case, but I'll wager we'd need models representative of the actual use case you're trying to use EMF Compare in, so that we can make the proper adjustments. In this particular case, it seems like changing the way the model iteration is done would already be a big improvement, as would customizing the matching. I understand that you cannot share production models on a public channel. Would you be interested in contacting Obeo to see if we can collaborate on these improvements?

Laurent Goubet
Obeo

Re: EMF Compare Performance [message #1768433 is a reply to message #1768091]

Tue, 18 July 2017 16:03

Andreas Mayer

Messages: 11
Registered: April 2014

Junior Member

Here is another cause for long-running comparisons:

package model

class Model {
    contains Type[] types
    contains Relationship[] relationships
    // ...    
}

abstract class Definition {
    String name
    contains Property[] properties
}

class Property {
    String name
    String value
}

class Type extends Definition { /* ... */ }
class Relationship extends Definition { /* ... */ }

Typically, our models contain 5K to 10K top-level objects, each of which has about 5 properties, which results in about 25K to 50K instances of Property. But we have also seen models that contain 125K Property instances and more. The problems are:

We cannot take advantage of XMI ids. The models are mostly imported from external data and XMI ids of two different imports are totally unrelated.
It is difficult to come up with ids for Property instances that are unique within the model [1].
Without ids, the Property instances are compared with ProximityEObjectMatcher.
ProximityEObjectMatcher first throws the properties of all definitions on a huge pile and then matches each property on this pile with all others, which is very expensive.

The attached snippet generates similar models based on Ecore (Definition → EClass, Property → EAnnotation), so you can test for yourself. Beware: comparing models with 100K or more annotations will take a very long time!

We solved this with a custom IEObjectMatcher that first matches all top-level objects (which are uniquely identified by type and name) and then uses a ProximityEObjectMatcher to match the contents of matching top-level objects [2]. This greatly reduces the scope for ProximityEObjectMatcher and cuts the total running time from several minutes (or hours with huge models) to seconds (or a few minutes). As a side effect, we can no longer detect moves of children (for example, a Property) from one top-level object to another, but that is fine with us [3].
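The effect of narrowing the scope can be illustrated with a small plain-Java sketch. It does not use the EMF Compare API and is not the custom IEObjectMatcher described above: `ScopedMatchSketch`, `Prop`, `similarity` and the other helpers are hypothetical, and the crude name/value score only stands in for the far more elaborate metric of ProximityEObjectMatcher. The sketch just counts pairwise comparisons for "one big pile" versus "per matched top-level object".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScopedMatchSketch {

    /** Hypothetical stand-in for a Property: a name and a value, no ID. */
    static final class Prop {
        final String name;
        final String value;
        Prop(String name, String value) { this.name = name; this.value = value; }
    }

    /** Counts pairwise comparisons so both strategies can be compared. */
    static long comparisons;

    /** Crude similarity: 2 = same name and value, 1 = same name only, 0 = unrelated. */
    static int similarity(Prop a, Prop b) {
        comparisons++;
        if (!a.name.equals(b.name)) return 0;
        return a.value.equals(b.value) ? 2 : 1;
    }

    /** Greedy best-candidate matching inside one scope; returns the number of matched pairs. */
    static int matchWithinScope(List<Prop> left, List<Prop> right) {
        List<Prop> candidates = new ArrayList<>(right);
        int matched = 0;
        for (Prop l : left) {
            Prop best = null;
            int bestScore = 0;
            for (Prop r : candidates) {
                int score = similarity(l, r);
                if (score > bestScore) { bestScore = score; best = r; }
            }
            if (best != null) { candidates.remove(best); matched++; }
        }
        return matched;
    }

    /** Phase 1: pair top-level objects by "type:name" key. Phase 2: match properties per pair. */
    static int matchScoped(Map<String, List<Prop>> left, Map<String, List<Prop>> right) {
        int matched = 0;
        for (Map.Entry<String, List<Prop>> entry : left.entrySet()) {
            List<Prop> counterpart = right.get(entry.getKey());
            if (counterpart != null) matched += matchWithinScope(entry.getValue(), counterpart);
        }
        return matched;
    }

    /** Generates a toy model: top-level key -> its properties. */
    static Map<String, List<Prop>> generate(int definitions, int propsPerDefinition) {
        Map<String, List<Prop>> model = new HashMap<>();
        for (int d = 0; d < definitions; d++) {
            List<Prop> props = new ArrayList<>();
            for (int p = 0; p < propsPerDefinition; p++) props.add(new Prop("p" + p, "v" + d));
            model.put("Type:def" + d, props);
        }
        return model;
    }

    /** Throws all properties of all definitions onto one pile. */
    static List<Prop> flatten(Map<String, List<Prop>> model) {
        List<Prop> all = new ArrayList<>();
        for (List<Prop> props : model.values()) all.addAll(props);
        return all;
    }

    public static void main(String[] args) {
        Map<String, List<Prop>> left = generate(200, 5);
        Map<String, List<Prop>> right = generate(200, 5);
        comparisons = 0;
        matchScoped(left, right);
        System.out.println("scoped comparisons: " + comparisons);
        comparisons = 0;
        matchWithinScope(flatten(left), flatten(right));
        System.out.println("single-pile comparisons: " + comparisons);
    }
}
```

Because phase 2 never looks outside a matched pair of top-level objects, a property that moved to a different top-level object shows up as unmatched on both sides, which corresponds to the addition-plus-deletion trade-off mentioned above.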

[1] IdentifierEObjectMatcher is unnecessarily restricted to IDs of type string. Otherwise, you could use some kind of object that describes the path to a property without the need to serialize it to a string.

[2] Unfortunately, IEObjectMatcher.createMatches(...) combines two separate steps: creating the actual matches and restructuring all matches in the comparison. So delegating a subset to another matcher always incurs unnecessary costs. Separating these two steps would allow for a single restructuring after all matches have been found.

[3] If anything, instead of reporting "Property 'p' has been moved from Relationship 'xyz' to Type 'totally-unrelated-to-xyz'", we prefer to report an addition and a deletion.

Andreas

Attachment: GenerateExamples.java
(Size: 4.64KB, Downloaded 210 times)

Re: EMF Compare Performance [message #1768577 is a reply to message #1768433]

Thu, 20 July 2017 08:08

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Hi Andreas,

This is the kind of customization that's almost necessary on a case-by-case basis. Matching without identifiers (through the ProximityEObjectMatcher) will be much slower, so you have to try and determine "shortcuts" that work for your use cases. In this particular example, I agree that having delete/add on properties is better than trying to match them throughout the whole model.

1) I agree that it would be nice to have "Object" instead of "String" identifiers, since we're calling "equals" on them anyway. This change would imply an API break though, so it cannot be done until we switch to a 4.0 for EMF Compare. There are other aspects on which we'd like to break things (and remove deprecated methods/classes), so this is something I'd like to keep track of for when we do make that change. Could you raise a bug for this?

2) When I tested earlier with Vrinda A.'s models, I felt like the ProximityEObjectMatcher was doing too much work by itself (matchAheadOfTime), but it is true that this method (createMatches) is combining two responsibilities that should be separated. However, this is not planned for now.

3) As I mentioned earlier, I think this is the best for your use case: go for the shortcuts that work for you. I see EMF Compare as a generic comparison engine for models, but it is also a framework from which to build your own comparison engines.

Laurent Goubet
Obeo
