Eclipse Community Forums Forum Search:

Home » Modeling » Compare » On suboptimal decisions over the match solution spectrum
On suboptimal decisions over the match solution spectrum Fri, 19 July 2013 13:45
 Victor Roldan BetancortMessages: 524Registered: July 2009 Senior Member
Hi folks,

been lately working on improving the accuracy of the EMF Compare proximity engine. I've found the current implementation falls into suboptimal solutions at the matching stage. My point is to discuss why I think this is happening and if the EMF Compare team agrees it could happen. I'd be happy to be proven wrong, but I've debugged several scenarios in our test-cases that proves this is actually happening.

When choosing the best match for a given EObject, the proximity engine matches a pair with the shortest distance, where distance is a numeric value that determines how similar are two objects. This can be found at ProximityIndex.findTheClosest(Comparison, EObject, Side, Side, boolean).

This approach leads to a suboptimal set of matches, because the distance is optimized locally (per EObject) instead of globally (the smallest overall distance for a set of match decisions).

I'll try to describe an example, we have 3 elements of the same type A, B and C, and their counterparts in the "other side" A', B' and C'. I describe the combination of matches and their distances. The less distance points, the better.

A -> A' = 8 points
A -> B' = 5 points
A -> C' = 10 points

B -> A' = 10 points
B -> B' = 1 points
B -> C' = 7 points

C -> A' = 7 points
C -> B' = 5 points
C -> C' = 3 points

Currently the engine would explore the match solution space by iterating all combinations in an arbitrary order (I believe in order of the EObjects in their containment feature).

If we choose in order, we would be lead to the following match solution:

A -> B'
B -> C'
C -> A'

which gives a global distance of 5 + 7 + 7 = 19. Since A matches B', B' gets removed from the match space, so B can only match A' or C'. Finally C matches A'.

This is a simplification of what actually happens. EMF Compare introduces the "double check" concept, which basically does is "if B' is the best match for A, check if A is the best match for B'". Matches are not commutative: if B' is the best match for A, it does not imply A is the best match for B'. I wont go through the result of the double check logic in the ProximityIndex class for the above example scenario, but I can say it would actually lead to an even worst results: the engine will decide A has no proper match in the solution space because no double check succeeds.

If instead we choose the following match combination:

A -> A'
B -> B'
C -> C'

we would obtain a global distance of 8 + 1 + 3 = 12. This is actually the correct match combination and also the lowest global distance, which means it is the optimal match solution over all possible combinations.

Hope I've been clear. I think the implementation of a global distance optimization algorithm would improve the engine accuracy, but is not trivial. A combinatory exploration algorithm, searching for the best option, is not a feasible approach since it would be computationally unacceptable (here we have 3 x 3 match combinations, think of a match space with 1.000.000 elements). Therefore, some optimization heuristic would be more desirable, also having the problem that it may lead to suboptimal decisions, but would execute faster.

Do you guys think the above conclusion is logical / coherent, or did I make some wrong assumption? Granted, were you aware of this situation? Any plans on this regard in case you were aware?

Best Regards,
Víctor Roldán [Open Canarias]

[Updated on: Fri, 19 July 2013 14:50]

Report message to a moderator

Re: On suboptimal decisions over the match solution spectrum [message #1072380 is a reply to message #1071171] Mon, 22 July 2013 16:37
 Cedric BrunMessages: 431Registered: July 2009 Senior Member
Hi Victor,

Your analysis is right, the matcher minimize local distances between one EObject and the other, not the global (sum of all the distances) of the whole match model.

The double check is here to make sure we won't "consume" B' by matching it with A if the closest to B' is, for instance, C'.

This approach is in general "good enough" based on our experience with EMF compare 1.x, but the research field is wide and other approach could be better. So no plans for us to address this yet except if that means being faster and scaling better.

Here is an alternative it might be worth experimenting with : "graph similarity flooding"

http://ilpubs.stanford.edu:8090/730/1/2002-1.pdf

http://cedric.brun.io news and articles on eclipse and eclipse modeling.
Re: On suboptimal decisions over the match solution spectrum [message #1072839 is a reply to message #1072380] Tue, 23 July 2013 16:02
 Victor Roldan BetancortMessages: 524Registered: July 2009 Senior Member
Cedric,

thanks for your input. The current heuristic seems fair enough for many scenarios. However, depending on the application, suboptimal match sets may not be desirable.
I tested a customization that looks for the best match by testing the whole spectrum and seems to return better results, but it simply does not scale.

Thanks for confirming this. I'll consider if its worth implement some other heuristic.

Best Regards,
Víctor.
 Previous Topic: About the next EMF Compare release plans Next Topic: StructureMergeViewerFilter and EObject Adapter content
Goto Forum:

Current Time: Tue Sep 28 14:55:31 GMT 2021

Powered by FUDForum. Page generated in 0.05638 seconds
.:: Contact :: Home ::.