Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Modeling » Compare » On EMF Compare 2.X scalability(some thoughts and questions on 2.x scalability)
On EMF Compare 2.X scalability [message #1051942] Mon, 29 April 2013 08:04 Go to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Hi there,

aside from getting used to the new apis, I've faced new challenges on the 2.X stream.
My main concern is current matching algorithm's scalability. I got surprised to see how test-cases that used to take 15 seconds, now need 20 minutes to finalize.

I've debugged the code and compared it to 1.X, and found some differences that look to me as possible responsibles to this poor performance. Aparently, the scope of comparison has heavily increased on the new match engine. Inspecting ProximityIndex.findTheClosest(Comparison, EObject, Side, Side, boolean) shows how now the engine tries to match each element with every other possible element on the model with the same type (ByTypeIndex takes care of that classifying by type). You can check how the "storageToSearchFor" list contains all elements with the same type in the model as the object to match.

Oppossed to that, 1.X used content matching strategies to narrow the scope of the comparison. This means: we will only compare elements that are around the same point of the models structure, by doing a structural/content match. There was even a search window parameter to narrow/broad even more the match scope in case necessary. Although this could lead to some false positives, it was way more efficient.

For big models, the 2.x behaviour end up in almost 2 * n * n comparisons in the worst case, where n is the number of elements, and a potential 2 due to double check mechanism. Obiously, the new approach is safer in the sense that nothing remains out of the match calculations, but this appears not to scale on big models.

I got surprised to realise this, when scalability has always been a concern to the project.

So, the questions here are:
- Is it true what I've found or I'm missing something?
- Why is the model flattened and compared, and why was the content match of 1.x discarded?
- Any recommended way to avoid this exponential number of comparisons?
- Are there any possible optimizations to know?

Thanks in advance,
Víctor.
Re: On EMF Compare 2.X scalability [message #1052096 is a reply to message #1051942] Mon, 29 April 2013 12:08 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
Hi Victor,


"Is it true what I've found or I'm missing something?"

Yes it's pretty much true except the current implementation try to avoid "comparing too much" by looking for perfect identical objects with a cheap comparison before going further. As you said this approach allow us to have "better" results but can ends up being more "explosive" from a computational point of view. So far our efforts have been focused on the merge consistency and better scalability through scope reduction, content match got rewritten by reusing the Diff logic itself and assume most of the model elements are not changing.

Longer term, we intend to have a proper distance from this and then have the ability to sort the storageToSearchFor, that would lead to having the best of both worlds and quickly find the closest element.
This approach had not be implemented yet because of lack of time, but I would gladly review the work of anybody wanting to speed things up.

"Why is the model flattened and compared, and why was the content match of 1.x discarded?"

One of the key requirement of 2.x is the ability to compute differences operating on arbitrary scopes and not rely on the containment hiearchy. Furthermore we need to be sure to find the "best match" whenever thats possible.

- Any recommended way to avoid this exponential number of comparisons?

The figure you gives makes me think our "assumption" are not valid at all in your case and you ends up in the worst case..
Is your model "completely different" ? Are you using Featuremaps ?
ProximityIndex has a ProximityMatchStats object keeping track of the computation made during the process, what is it giving to you ?

- Are there any possible optimizations to know?

Many are coming to my mind, we could keep the objects in their insertion order and introduce a SEARCH_WINDOW again, reducing the "explosion", we can make sure the match process is not comparing references of the EObject avoiding the LCS computation during the match, re-implement a match engine mimicking the 1.x behavior... In any case it would be interesting for us to understand what is going on exactly with your models leading to this.
Re: On EMF Compare 2.X scalability [message #1052597 is a reply to message #1052096] Tue, 30 April 2013 05:30 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Cedric,

some coments below

Quote:
Yes it's pretty much true except the current implementation try to avoid "comparing too much" by looking for perfect identical objects with a cheap comparison before going further. As you said this approach allow us to have "better" results but can ends up being more "explosive" from a computational point of view. So far our efforts have been focused on the merge consistency and better scalability through scope reduction, content match got rewritten by reusing the Diff logic itself and assume most of the model elements are not changing.

Longer term, we intend to have a proper distance from this and then have the ability to sort the storageToSearchFor, that would lead to having the best of both worlds and quickly find the closest element.
This approach had not be implemented yet because of lack of time, but I would gladly review the work of anybody wanting to speed things up.


If I find something useful to share with you guys, I'll gladly do it. So far, the only thing I could come with is some sort of heuristic to reduce the size of storageToSearchFor by taking only elements with the same parent as the object subject to be matched. Not working perfectly anyway, but reduced the computation time.

Quote:

- Any recommended way to avoid this exponential number of comparisons?
The figure you gives makes me think our "assumption" are not valid at all in your case and you ends up in the worst case..
Is your model "completely different" ? Are you using Featuremaps ?
ProximityIndex has a ProximityMatchStats object keeping track of the computation made during the process, what is it giving to you ?


We are using the OMG's KDM metamodel. The test-case model I'm using is not that big (17mb XMI file), and there are not many differences. Neiher we are using FeatureMaps.
The objects are stored in a CDO Server instance before comparing. I used to have problems with CDO in the 1.X stream (regarding both scalability and reliability), but in 2.X both XMI and CDO take the same amount of time to compute the differences. However, I'm yet to determine whether the fact is stored in CDO may introduce false-positives (which happened quite a lot in the 1.X stream).

I took a look at the stats variable. As you know, its segmented by type (due to ByTypeIndex), but these are some of the most "conflictive" elements:

Addresses -> ProximityMatchStats [nbIndenticComparison=313935, nbMaxDistComparison=137085, nbnoMatch=104, nbSuccessIdenticComparison=3854, nbSuccessMaxComparison=32, nbBacktrack=240144, nbDoubleCheck=104, nbFailedDoubleCheck=72]

Reads -> ProximityMatchStats [nbIndenticComparison=642745, nbMaxDistComparison=474863, nbnoMatch=279, nbSuccessIdenticComparison=5366, nbSuccessMaxComparison=130, nbBacktrack=483938, nbDoubleCheck=238, nbFailedDoubleCheck=108]

Hope that helps, I'll keep trying to find whats going on.
Re: On EMF Compare 2.X scalability [message #1052621 is a reply to message #1052597] Tue, 30 April 2013 06:01 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
I've managed to cut down the comparison time by ignoring some references on the model. The thing is, I already had a custom FeatureFilter installed in my own DiffEngine,
but apparently didn't come into play because org.eclipse.emf.compare.match.eobject.EditionDistance uses EditionDistance.CountingDiffEngine instead of my extended DiffEngine.
It looks like EditionDistance (used for the identic match) is ignoring any customizations introduced at DiffEngine level. Is this expected?
Re: On EMF Compare 2.X scalability [message #1053694 is a reply to message #1052621] Tue, 07 May 2013 05:12 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
Using the same feature filter for both the Match and Diff would be a better default IMO as long as one can have provide different ones when needed.

Regarding scalability I'm currently experimenting in containing the computation by using a SEARCH_WINDOW just like EMF compare 1.x. By doing so I have quite a few changes in the way big models are matched, I'm not sure yet if these changes are "normal" or if they are problematic and I'm going through them right now. In the meantime, if you want to give it a try, here is the modified ProximityMatcher

http://pastebin.com/cPZPE1gN

edit : SEARCH_WINDOW = 1500 works well for me from a matching quality perspective.

[Updated on: Tue, 07 May 2013 05:50]

Report message to a moderator

Re: On EMF Compare 2.X scalability [message #1053807 is a reply to message #1053694] Tue, 07 May 2013 12:16 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
I continued to work on the "search window" capability (not ready for general consumption yet), in the meantime I also introduced a cache for the distance computation.

http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/commit/?id=6350fbb63dc60973510cd27068da73767b9ee014

It cuts my comparison times on decent to big models, could you have a try and tell me how it goes for you ?
Re: On EMF Compare 2.X scalability [message #1054852 is a reply to message #1053807] Wed, 08 May 2013 05:46 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Hi Cedric,

thanks for taking some time to look into this.
Quote:

I continued to work on the "search window" capability (not ready for general consumption yet), in the meantime I also introduced a cache for the distance computation.

http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/commit/?id=6350fbb63dc60973510cd27068da73767b9ee014

It cuts my comparison times on decent to big models, could you have a try and tell me how it goes for you ?


I just checked out git HEAD, and the new org.eclipse.emf.compare.match.eobject.internal.CachingDistance seems to be missing in the repository :S I made sure the commit in synchronized with is the same you shared (which happens to be HEAD right now). Maybe you forgot to commit it too?

Quote:

Using the same feature filter for both the Match and Diff would be a better default IMO as long as one can have provide different ones when needed.

Regarding scalability I'm currently experimenting in containing the computation by using a SEARCH_WINDOW just like EMF compare 1.x. By doing so I have quite a few changes in the way big models are matched, I'm not sure yet if these changes are "normal" or if they are problematic and I'm going through them right now. In the meantime, if you want to give it a try, here is the modified ProximityMatcher

http://pastebin.com/cPZPE1gN

edit : SEARCH_WINDOW = 1500 works well for me from a matching quality perspective.


Ill check it and let you know, first I have to get rid of compilation errors Razz


Another thing, not really related, but may help you guys.
There seems to exist a JDK 1.6 usage in org.eclipse.emf.compare.ide.ui:

/org.eclipse.emf.compare.ide.ui/src/org/eclipse/emf/compare/ide/ui/internal/contentmergeviewer/text/EMFCompareTextMergeViewer.java

Line 196:
String leftValueFromWidget = new String(getContents(true), Charsets.UTF_8);
String rightValueFromWidget = new String(getContents(false), Charsets.UTF_8);

I assume you guys want to keep 1.5 compatibility. The bundles execution environment declares J2SE15... so I assume that constructor shouldn't be used.

Best Regards,
Víctor Roldán [Open Canarias]
Re: On EMF Compare 2.X scalability [message #1054854 is a reply to message #1051942] Wed, 08 May 2013 05:49 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Hi Cedric,

thanks for taking some time to look into this.
Quote:

I continued to work on the "search window" capability (not ready for general consumption yet), in the meantime I also introduced a cache for the distance computation.

http://git.eclipse.org/c/emfcompare/org.eclipse.emf.compare.git/commit/?id=6350fbb63dc60973510cd27068da73767b9ee014

It cuts my comparison times on decent to big models, could you have a try and tell me how it goes for you ?


I just checked out git HEAD, and the new org.eclipse.emf.compare.match.eobject.internal.CachingDistance seems to be missing in the repository :S I made sure the commit in synchronized with is the same you shared (which happens to be HEAD right now). Maybe you forgot to commit it too?

Quote:

Using the same feature filter for both the Match and Diff would be a better default IMO as long as one can have provide different ones when needed.

Regarding scalability I'm currently experimenting in containing the computation by using a SEARCH_WINDOW just like EMF compare 1.x. By doing so I have quite a few changes in the way big models are matched, I'm not sure yet if these changes are "normal" or if they are problematic and I'm going through them right now. In the meantime, if you want to give it a try, here is the modified ProximityMatcher

http://pastebin.com/cPZPE1gN

edit : SEARCH_WINDOW = 1500 works well for me from a matching quality perspective.


Ill check it and let you know, first I have to get rid of compilation errors Razz


Another thing, not really related, but may help you guys.
There seems to exist a JDK 1.6 usage in org.eclipse.emf.compare.ide.ui:

/org.eclipse.emf.compare.ide.ui/src/org/eclipse/emf/compare/ide/ui/internal/contentmergeviewer/text/EMFCompareTextMergeViewer.java

Line 196:
String leftValueFromWidget = new String(getContents(true), Charsets.UTF_8);
String rightValueFromWidget = new String(getContents(false), Charsets.UTF_8);


I assume you guys want to keep 1.5 compatibility. The bundles execution environment declares J2SE15... so I assume that constructor shouldn't be used.
Re: On EMF Compare 2.X scalability [message #1057498 is a reply to message #1053807] Wed, 08 May 2013 08:33 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Quote:
Using the same feature filter for both the Match and Diff would be a better default IMO as long as one can have provide different ones when needed.

Regarding scalability I'm currently experimenting in containing the computation by using a SEARCH_WINDOW just like EMF compare 1.x. By doing so I have quite a few changes in the way big models are matched, I'm not sure yet if these changes are "normal" or if they are problematic and I'm going through them right now. In the meantime, if you want to give it a try, here is the modified ProximityMatcher

http://pastebin.com/cPZPE1gN

edit : SEARCH_WINDOW = 1500 works well for me from a matching quality perspective.


Not having really better results Sad I left search window set to 300 and my test-cases do not actually finalize.
Re: On EMF Compare 2.X scalability [message #1057862 is a reply to message #1054854] Thu, 09 May 2013 08:04 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
You're right, I forgot to add this file. Pushed a fixing commit right now and triggered a build https://hudson.eclipse.org/hudson/job/emf-compare-master/547/

by the way, I'm gonna have to bring back chocolate at the office because of this "last minute commit before leaving".

[Updated on: Thu, 09 May 2013 08:05]

Report message to a moderator

Re: On EMF Compare 2.X scalability [message #1057864 is a reply to message #1057862] Thu, 09 May 2013 08:09 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
I was going back to our initial discussion : "[..] heuristic to reduce the size of storageToSearchFor by taking only elements with the same parent as the object subject to be matched. Not working perfectly anyway, but reduced the computation time."

Would you mind sharing it with us through a patch or something ?
Re: On EMF Compare 2.X scalability [message #1057870 is a reply to message #1057864] Thu, 09 May 2013 08:57 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Quote:

You're right, I forgot to add this file. Pushed a fixing commit right now and triggered a build https://hudson.eclipse.org/hudson/job/emf-compare-master/547/

Great! I'll check if theres any improvement here.

Quote:

by the way, I'm gonna have to bring back chocolate at the office because of this "last minute commit before leaving".


Good one! Is that strategy under EPL? I'd like to implement here at the office too Razz

Quote:

I was going back to our initial discussion : "[..] heuristic to reduce the size of storageToSearchFor by taking only elements with the same parent as the object subject to be matched. Not working perfectly anyway, but reduced the computation time."

Would you mind sharing it with us through a patch or something ?


I lost everything once I checked out from git repository Sad Anyway, the idea behind that naive change I made is some sort of Search Window strategy.

Anyway, given latest results with our big KDM models, and given our time constraints, we will keep using our previous code with EMF Compare 1.3, and the newer code will use 2.X stream, which compares over small models (and seems to work flawlessly, no false positives so far). I'll try to spend sometime in figuring out whats wrong with our KDM models to make 2.X not to scale.

Cheers,
Víctor.
Re: On EMF Compare 2.X scalability [message #1058098 is a reply to message #1051942] Sat, 11 May 2013 03:50 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
Didn't the cache improved your comparison times ?
Re: On EMF Compare 2.X scalability [message #1058232 is a reply to message #1058098] Mon, 13 May 2013 05:47 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Cedric,

Quote:
Didn't the cache improved your comparison times ?


I can't see substatial improvement with the new CachingDistance. The test-case is taking 889 seconds, and with the cache it takes 865 seconds.
Re: On EMF Compare 2.X scalability [message #1058294 is a reply to message #1054854] Mon, 13 May 2013 09:38 Go to previous messageGo to next message
Felix Velasco is currently offline Felix Velasco
Messages: 43
Registered: July 2009
Member
I'm afraid this comment from Victor went unnoticed. org.eclipse.emf.compare.ide.ui currently needs Java 1.6 in master.

Victor Roldan Betancort wrote on Wed, 08 May 2013 11:49
Another thing, not really related, but may help you guys.
There seems to exist a JDK 1.6 usage in org.eclipse.emf.compare.ide.ui:

/org.eclipse.emf.compare.ide.ui/src/org/eclipse/emf/compare/ide/ui/internal/contentmergeviewer/text/EMFCompareTextMergeViewer.java

Line 196:
String leftValueFromWidget = new String(getContents(true), Charsets.UTF_8);
String rightValueFromWidget = new String(getContents(false), Charsets.UTF_8);


I assume you guys want to keep 1.5 compatibility. The bundles execution environment declares J2SE15... so I assume that constructor shouldn't be used.
Re: On EMF Compare 2.X scalability [message #1058298 is a reply to message #1054854] Mon, 13 May 2013 09:44 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Victor Roldan Betancort wrote on Wed, 08 May 2013 05:49

Another thing, not really related, but may help you guys.
There seems to exist a JDK 1.6 usage in org.eclipse.emf.compare.ide.ui:

/org.eclipse.emf.compare.ide.ui/src/org/eclipse/emf/compare/ide/ui/internal/contentmergeviewer/text/EMFCompareTextMergeViewer.java

Line 196:
String leftValueFromWidget = new String(getContents(true), Charsets.UTF_8);
String rightValueFromWidget = new String(getContents(false), Charsets.UTF_8);


I assume you guys want to keep 1.5 compatibility. The bundles execution environment declares J2SE15... so I assume that constructor shouldn't be used.


What version are you using? I couldn't find any reference to that string constructor in our code base, and there does not seem to be any hit in the history of that class either.

Other than that, any chance you could provide us with the models you are seeing these performance issues with? As Cedric mentionned, we eventually plan on having something more flexible for the content match, but for now profiling would be a lot more help to track down what goes wrong in your case with our current implementation.

Laurent Goubet
Obeo
Re: On EMF Compare 2.X scalability [message #1058341 is a reply to message #1058298] Mon, 13 May 2013 11:37 Go to previous messageGo to next message
Felix Velasco is currently offline Felix Velasco
Messages: 43
Registered: July 2009
Member
The constructor is currently in master (see commit 4e3b45dcc6c)
Re: On EMF Compare 2.X scalability [message #1058444 is a reply to message #1058341] Tue, 14 May 2013 03:14 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Felix, Victor,

Yup, that was a mistake from my side (I forgot to "git pull" before checking ... came back from holidays yesterday Razz).

Java 6 usage is a mistake from another member of the Team, which has been fixed yesterday. This had been introduced post-M7, so only master was affected. As Cédric said in a previous comment, someone will have to bring chocolate to the office, that's the rule here since 2007 Wink.

Laurent Goubet
Obeo

[Updated on: Tue, 14 May 2013 03:14]

Report message to a moderator

Re: On EMF Compare 2.X scalability [message #1058818 is a reply to message #1058444] Wed, 15 May 2013 04:58 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Quote:

Yup, that was a mistake from my side (I forgot to "git pull" before checking ... came back from holidays yesterday ).

Java 6 usage is a mistake from another member of the Team, which has been fixed yesterday. This had been introduced post-M7, so only master was affected. As Cédric said in a previous comment, someone will have to bring chocolate to the office, that's the rule here since 2007.


Glad to see that fixed Smile

Quote:

Other than that, any chance you could provide us with the models you are seeing these performance issues with? As Cedric mentionned, we eventually plan on having something more flexible for the content match, but for now profiling would be a lot more help to track down what goes wrong in your case with our current implementation.


I wish I could, these models represent clients code, I dont think I have permission to spread clients code Razz
Re: On EMF Compare 2.X scalability [message #1058869 is a reply to message #1058818] Wed, 15 May 2013 09:25 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Quote:

I wish I could, these models represent clients code, I dont think I have permission to spread clients code


I wish we had a model obfuscator somewhere Smile. Shuffling randomly all primitive attributes would probably suffice to have an incomprehensible model that is still representative of a "real world" use case showing these issues?

Or if you can profile and send us a snapshot of the comparison time's profiling (yourkit does a really good job Smile http://www.yourkit.com/purchase/index.jsp)?

Laurent Goubet
Obeo
Re: On EMF Compare 2.X scalability [message #1059068 is a reply to message #1058869] Thu, 16 May 2013 09:54 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Laurent,

Quote:

I wish we had a model obfuscator somewhere . Shuffling randomly all primitive attributes would probably suffice to have an incomprehensible model that is still representative of a "real world" use case showing these issues?

Long time ago we developed theoric algorithms for such thing at Open Canarias, but never implemented. Obfuscation sure is an interesting topic in the modeling area.

Quote:

Or if you can profile and send us a snapshot of the comparison time's profiling (yourkit does a really good job http://www.yourkit.com/purchase/index.jsp)?


I actually used it during my analysis, but couldn't figure out anything, IIRC distance calculations, executed too many times, lead to a long computation times on our models.
I'll give it a try and send you guys an snapshot.

Cheers,
Víctor.
Re: On EMF Compare 2.X scalability [message #1059946 is a reply to message #1059068] Wed, 22 May 2013 06:00 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Victor,

I've started looking at the yourkit snapshot you provided us... but some things look strange in there. Most notably, I believe that one of the bottlenecks we've fixed is still present in this snapshot's traces. Are you using a recent build of EMF Compare 2.1, or are you still on EMF Compare 2.0? (the RC1 is available from http://download.eclipse.org/modeling/emf/compare/updates/milestones/2.1/S201305210744 ).

Laurent Goubet
Obeo
Re: On EMF Compare 2.X scalability [message #1059968 is a reply to message #1059946] Wed, 22 May 2013 07:51 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Laurent Goubet wrote on Wed, 22 May 2013 06:00
I believe that one of the bottlenecks we've fixed is still present in this snapshot's traces


Sorry about that. It is not the same bottleneck, though it is due to the same mis-use of the guava APIs (or so I think). We'll try and reproduce on our side, but we have a good lead now Smile.

Thanks for the snapshot, we'll keep you updated as soon as we confirm/fix the issue.

Laurent Goubet
Obeo
Re: On EMF Compare 2.X scalability [message #1059994 is a reply to message #1059968] Wed, 22 May 2013 09:22 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Laurent,

I've been synced to MASTER so it's probably not the same bottleneck. Hope the snapshot helps finding out something interesting Smile

Best Regards,
Víctor.
Re: On EMF Compare 2.X scalability [message #1072677 is a reply to message #1059994] Tue, 23 July 2013 05:05 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
Victor,

I just pushed a change for review which will help for the cases where the model grows beyond 10K elements.

(detailled in the commit message).

https://git.eclipse.org/r/#/c/14768/

Could you have a try ?
Re: On EMF Compare 2.X scalability [message #1072836 is a reply to message #1072677] Tue, 23 July 2013 11:58 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Hi Cedric,

I've tested this patch with our models. These are the results:

test1:
Size → 54733 elements
Tree Depth → 22 nodes
Maximum list size → 68 elements

Comparison time → 215 (great improvement)
Originally it did not finish (more than 2 hours)

test2:
Size → 200083 elements
Tree Depth → 48 nodes
Maximum list size → 133 elements

Comparison time → does not finish sucessfully (OOME)

java.lang.OutOfMemoryError: Java heap space
	at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:424)
	at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:406)
	at java.util.HashMap.put(HashMap.java:385)
	at java.util.HashSet.add(HashSet.java:200)
	at org.eclipse.emf.compare.match.eobject.EditionDistance$CountingDiffProcessor.attributeChange(EditionDistance.java:307)
	at org.eclipse.emf.compare.diff.DefaultDiffEngine.computeSingleValuedAttributeDifferences(DefaultDiffEngine.java:801)
	at org.eclipse.emf.compare.diff.DefaultDiffEngine.computeDifferences(DefaultDiffEngine.java:497)
	at org.eclipse.emf.compare.match.eobject.EditionDistance$CountingDiffEngine.computeDifferences(EditionDistance.java:382)
	at org.eclipse.emf.compare.diff.DefaultDiffEngine.checkForDifferences(DefaultDiffEngine.java:130)
	at org.eclipse.emf.compare.match.eobject.EditionDistance$CountingDiffEngine.measureDifferences(EditionDistance.java:427)
	at org.eclipse.emf.compare.match.eobject.EditionDistance.distance(EditionDistance.java:155)
	at org.eclipse.emf.compare.match.eobject.CachingDistance.distance(CachingDistance.java:57)
	at org.eclipse.emf.compare.match.eobject.internal.ProximityIndex.findTheClosest(ProximityIndex.java:182)
	at org.eclipse.emf.compare.match.eobject.internal.ProximityIndex.findClosests(ProximityIndex.java:110)
	at org.eclipse.emf.compare.match.eobject.internal.ByTypeIndex.findClosests(ByTypeIndex.java:86)
	at org.eclipse.emf.compare.match.eobject.ProximityEObjectMatcher.tryToMatch(ProximityEObjectMatcher.java:276)
	at org.eclipse.emf.compare.match.eobject.ProximityEObjectMatcher.matchList(ProximityEObjectMatcher.java:233)
	at org.eclipse.emf.compare.match.eobject.ProximityEObjectMatcher.matchAheadOfTime(ProximityEObjectMatcher.java:147)
	at org.eclipse.emf.compare.match.eobject.ProximityEObjectMatcher.createMatches(ProximityEObjectMatcher.java:115)
	at org.eclipse.emf.compare.match.eobject.IdentifierEObjectMatcher.doDelegation(IdentifierEObjectMatcher.java:153)
	at org.eclipse.emf.compare.match.eobject.IdentifierEObjectMatcher.createMatches(IdentifierEObjectMatcher.java:116)
	at org.eclipse.emf.compare.match.DefaultMatchEngine.match(DefaultMatchEngine.java:312)
	at org.eclipse.emf.compare.match.DefaultMatchEngine.match(DefaultMatchEngine.java:121)
	at org.eclipse.emf.compare.match.DefaultMatchEngine.match(DefaultMatchEngine.java:91)
	at org.eclipse.emf.compare.EMFCompare.compare(EMFCompare.java:169)
	at org.eclipse.emf.compare.EMFCompare.compare(EMFCompare.java:152)


I tested the above with 1024mb heap and 2048 heap, both failed with the same exception.

HTH,
Víctor.
Re: On EMF Compare 2.X scalability [message #1072854 is a reply to message #1072836] Tue, 23 July 2013 12:39 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
Thanks for your quick feedback.

I'll have a look at the memory consumption, I have not worked on it for the content match and there is no doubt there is room for enhancements. One "quick try" you could do would be to stop using the caching distance. It has no eviction strategy and is probably growing quite a lot.
Re: On EMF Compare 2.X scalability [message #1073233 is a reply to message #1072854] Wed, 24 July 2013 07:10 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric Brun
Messages: 359
Registered: July 2009
Senior Member
This change adds an eviction strategy for the caches : https://git.eclipse.org/r/#/c/14807/

How does it works for test2 ?

As a sidenote, do you have the measures of the same comparisons done with EMF Compare 1.x ?
Re: On EMF Compare 2.X scalability [message #1073357 is a reply to message #1073233] Wed, 24 July 2013 11:32 Go to previous messageGo to next message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Hi Cedric,

I tested the eviction strategies patch, still get the same OOME on test2.

With EMF Compare 1.X:

test1: 15 seconds
test2: 45 seconds
Re: On EMF Compare 2.X scalability [message #1074121 is a reply to message #1073357] Fri, 26 July 2013 02:49 Go to previous messageGo to next message
Laurent Goubet is currently offline Laurent Goubet
Messages: 1621
Registered: July 2009
Senior Member
Victor,

Is there still no means for you to provide us with your sample models? That would be a great help in determining what's going wrong here.

Laurent Goubet
Obeo
Re: On EMF Compare 2.X scalability [message #1074206 is a reply to message #1074121] Fri, 26 July 2013 05:29 Go to previous message
Victor Roldan Betancort is currently offline Victor Roldan Betancort
Messages: 524
Registered: July 2009
Senior Member
Laurent,

sure, we already have in the todo list to prepare test-cases with our models for you to reproduce the problem. I just wanted to provide some feedback on these recent changes, which seem promising.

Cheers,
Víctor.
Previous Topic:StructureMergeViewerFilter and EObject Adapter content
Next Topic:How to load an EMFText model in a standalone Java application
Goto Forum:
  


Current Time: Wed Jul 23 16:25:14 EDT 2014

Powered by FUDForum. Page generated in 0.12817 seconds