Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Modeling » Compare » On suboptimal decisions over the match solution spectrum
On suboptimal decisions over the match solution spectrum [message #1071171] Fri, 19 July 2013 13:45 Go to next message
Victor Roldan Betancort is currently offline Victor Roldan BetancortFriend
Messages: 524
Registered: July 2009
Senior Member
Hi folks,

been lately working on improving the accuracy of the EMF Compare proximity engine. I've found the current implementation falls into suboptimal solutions at the matching stage. My point is to discuss why I think this is happening and if the EMF Compare team agrees it could happen. I'd be happy to be proven wrong, but I've debugged several scenarios in our test-cases that proves this is actually happening.

When choosing the best match for a given EObject, the proximity engine matches a pair with the shortest distance, where distance is a numeric value that determines how similar are two objects. This can be found at ProximityIndex.findTheClosest(Comparison, EObject, Side, Side, boolean).

This approach leads to a suboptimal set of matches, because the distance is optimized locally (per EObject) instead of globally (the smallest overall distance for a set of match decisions).

I'll try to describe an example, we have 3 elements of the same type A, B and C, and their counterparts in the "other side" A', B' and C'. I describe the combination of matches and their distances. The less distance points, the better.

A -> A' = 8 points
A -> B' = 5 points
A -> C' = 10 points

B -> A' = 10 points
B -> B' = 1 points
B -> C' = 7 points

C -> A' = 7 points
C -> B' = 5 points
C -> C' = 3 points

Currently the engine would explore the match solution space by iterating all combinations in an arbitrary order (I believe in order of the EObjects in their containment feature).

If we choose in order, we would be lead to the following match solution:

A -> B'
B -> C'
C -> A'

which gives a global distance of 5 + 7 + 7 = 19. Since A matches B', B' gets removed from the match space, so B can only match A' or C'. Finally C matches A'.

This is a simplification of what actually happens. EMF Compare introduces the "double check" concept, which basically does is "if B' is the best match for A, check if A is the best match for B'". Matches are not commutative: if B' is the best match for A, it does not imply A is the best match for B'. I wont go through the result of the double check logic in the ProximityIndex class for the above example scenario, but I can say it would actually lead to an even worst results: the engine will decide A has no proper match in the solution space because no double check succeeds.

If instead we choose the following match combination:

A -> A'
B -> B'
C -> C'

we would obtain a global distance of 8 + 1 + 3 = 12. This is actually the correct match combination and also the lowest global distance, which means it is the optimal match solution over all possible combinations.

Hope I've been clear. I think the implementation of a global distance optimization algorithm would improve the engine accuracy, but is not trivial. A combinatory exploration algorithm, searching for the best option, is not a feasible approach since it would be computationally unacceptable (here we have 3 x 3 match combinations, think of a match space with 1.000.000 elements). Therefore, some optimization heuristic would be more desirable, also having the problem that it may lead to suboptimal decisions, but would execute faster.

Do you guys think the above conclusion is logical / coherent, or did I make some wrong assumption? Granted, were you aware of this situation? Any plans on this regard in case you were aware?

Best Regards,
Víctor Roldán [Open Canarias]

[Updated on: Fri, 19 July 2013 14:50]

Report message to a moderator

Re: On suboptimal decisions over the match solution spectrum [message #1072380 is a reply to message #1071171] Mon, 22 July 2013 16:37 Go to previous messageGo to next message
Cedric Brun is currently offline Cedric BrunFriend
Messages: 373
Registered: July 2009
Senior Member
Hi Victor,

Your analysis is right, the matcher minimize local distances between one EObject and the other, not the global (sum of all the distances) of the whole match model.

The double check is here to make sure we won't "consume" B' by matching it with A if the closest to B' is, for instance, C'.

This approach is in general "good enough" based on our experience with EMF compare 1.x, but the research field is wide and other approach could be better. So no plans for us to address this yet except if that means being faster and scaling better.

Here is an alternative it might be worth experimenting with : "graph similarity flooding"

http://ilpubs.stanford.edu:8090/730/1/2002-1.pdf

Re: On suboptimal decisions over the match solution spectrum [message #1072839 is a reply to message #1072380] Tue, 23 July 2013 16:02 Go to previous message
Victor Roldan Betancort is currently offline Victor Roldan BetancortFriend
Messages: 524
Registered: July 2009
Senior Member
Cedric,

thanks for your input. The current heuristic seems fair enough for many scenarios. However, depending on the application, suboptimal match sets may not be desirable.
I tested a customization that looks for the best match by testing the whole spectrum and seems to return better results, but it simply does not scale.

Thanks for confirming this. I'll consider if its worth implement some other heuristic.

Best Regards,
Víctor.
Previous Topic:EMF Compare get Object Attributes
Next Topic:StructureMergeViewerFilter and EObject Adapter content
Goto Forum:
  


Current Time: Thu Nov 20 21:02:03 GMT 2014

Powered by FUDForum. Page generated in 0.01751 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software