Re: [EMF Compare] Compute degree of similarity [message #1038002 is a reply to message #1037148]
Wed, 10 April 2013 09:18
Hi,
EMF Compare does not compute a degree of similarity between two models; it constructs a list of matching elements between the two models, and a list of differences between these elements. Under the hood, a degree of similarity is calculated at the EObject level in order to match them together... but that really is an "EObject level" thing. I do not think this metric makes sense at the "model" level.
You could compare it to computing a degree of similarity between two characters versus defining the similarity between two Strings. 'k' is somewhat far from 's'; that could be measured by the number of other letters between them, and such a metric makes sense. But how close is "kitten" to "sitting"? The sum of the differences between individual characters does not add up to a metric that really makes sense. Instead, we compute an edit distance such as Levenshtein's.
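To make the String analogy concrete, here is a minimal sketch of the classic Levenshtein edit distance in plain Java. It is not EMF Compare code; the class and method names are made up for illustration.

```java
// Illustrative only: a textbook Levenshtein distance, unrelated to EMF Compare's own classes.
final class EditDistanceSketch {

    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

The three edits are k→s, e→i and the trailing g. The result depends on the order and alignment of the characters, which a plain sum of per-character differences cannot capture.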
If what you need is the similarity between two EObjects, then yes, we do compute such a metric. You may want to look into our org.eclipse.emf.compare.match.eobject.EditionDistance and how we use it (a good place to start is the associated test).
You should still think about what you really need, and what a good definition of the "similarity" between two "models" really is for you. As mentioned above, the sum of distances between all of the model's objects isn't really a convincing measure of that similarity. A better measure would probably include the number of differences, the distance between the containing resources' names, the number of referenced resources and their own similarities...
Laurent Goubet
Obeo
Re: [EMF Compare] Compute degree of similarity [message #1047417 is a reply to message #1041542]
Tue, 23 April 2013 07:19
Aleksandar,
The first version of EMF Compare was indeed loosely based on this paper and other related works. This approach has been dropped in EMF Compare 2 in favor of a "distance" computation. As mentioned, I do not think this distance has any meaning on a level higher than EObject.
Quote:
What do you mean with: "distance between the containing resources' names" ?
You may have a model composed of files "library.uml" and "types.uml". A second version of this model is still named library.uml, but it uses types from a renamed "library_types.uml", and it makes use of stereotypes defined in "library_profile.uml". A type that was previously in "types.uml", call it "Date", has been renamed to "Calendar". A stereotype, "Borrowable", that was previously in types.uml has been moved to library_profile.uml and renamed to "Lendable". Aside from Borrowable, types.uml (and, as such, library_types.uml too) also contains a type named "Borrower".
How do you compute a similarity between types.uml#Date and library_types.uml#Calendar so that they are defined to be a Match (i.e. "highest similarity")? How do you define a similarity between types.uml#Borrowable and library_profile.uml#Lendable so that this similarity is higher than that of types.uml#Borrowable and library_types.uml#Borrower?
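To show why these questions are not trivial, here is a rough sketch of a naive distance that only looks at the element name and the containing resource's name. It is not how EMF Compare matches objects: the weights, the class and the method names are all hypothetical. On the example above it ranks Borrower closer to Borrowable than Lendable, which is the wrong answer.

```java
// Illustrative only: a deliberately naive metric, NOT EMF Compare's matching algorithm.
final class NaiveMatchSketch {

    /** Textbook Levenshtein edit distance between two strings. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Hypothetical score: weighted sum of the name distance and the resource-name distance.
     * The 0.7 / 0.3 weights are arbitrary assumptions chosen for this example.
     */
    static double naiveDistance(String resourceA, String nameA, String resourceB, String nameB) {
        final double nameWeight = 0.7;
        final double resourceWeight = 0.3;
        return nameWeight * levenshtein(nameA, nameB)
                + resourceWeight * levenshtein(resourceA, resourceB);
    }

    public static void main(String[] args) {
        double toLendable = naiveDistance("types.uml", "Borrowable", "library_profile.uml", "Lendable");
        double toBorrower = naiveDistance("types.uml", "Borrowable", "library_types.uml", "Borrower");
        System.out.println("Borrowable -> Lendable: " + toLendable);
        System.out.println("Borrowable -> Borrower: " + toBorrower);
        // The smaller distance wins, so this naive metric picks Borrower as the match.
        System.out.println("Naive match: " + (toBorrower < toLendable ? "Borrower" : "Lendable"));
    }
}
```

Both the name and the resource name point to the wrong candidate here, so a usable metric needs more variables (the element's type, its features, its references, its container...) and a sensible weighting between them.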
Of course, with UML, the question isn't really interesting: elements have an ID that makes them much easier to match than usual. However, the same question could be asked with Ecore. There are a good number of variables to take into account, and I believe that the containing resource also has a weight.
Laurent Goubet
Obeo
[Updated on: Tue, 23 April 2013 09:02]