Eclipse Community Forums: Compare » [EMF Compare] Compute degree of similarity

[EMF Compare] Compute degree of similarity [message #1037148]

Tue, 09 April 2013 08:10

Aleksandar Toshovski

Messages: 78
Registered: December 2011

Member

I want to compare a model with several other models and sort them by degree of similarity. How can I get the degree of similarity between two models? I know that when there are no IDs, the matching relies on a computed degree of similarity. But how can I get its value?

[Updated on: Tue, 09 April 2013 08:14]

Re: [EMF Compare] Compute degree of similarity [message #1038002 is a reply to message #1037148]

Wed, 10 April 2013 09:18

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Hi,

EMF Compare does not compute a degree of similarity between two models; it constructs a list of matching elements between the two models, and a list of differences between these elements. Under the hood, a degree of similarity is calculated at the EObject level in order to match them together... but that really is an "EObject level" thing. I do not think this metric makes sense at the "model" level.

You could compare it to computing a degree of similarity between two characters versus defining the similarity between two Strings. 'k' is somewhat far from 's'; that could be measured by the number of other letters between them, and such a metric makes sense. But how close is "kitten" to "sitting"? The sum of the differences between individual characters does not add up to a metric that really makes sense. Instead, we compute an edit distance such as Levenshtein's.
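To make the string analogy concrete, here is a minimal sketch of the classic Levenshtein distance in plain Java. It only illustrates the "kitten"/"sitting" example; it is not EMF Compare code, and the class and method names are made up for this illustration.

```java
final class EditDistanceSketch {

    /**
     * Classic Levenshtein distance: the minimum number of single-character
     * insertions, deletions and substitutions needed to turn a into b.
     */
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // empty prefix of a: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // empty prefix of b: i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int substitution = previous[j - 1] + cost;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // The row just computed becomes the "previous" row of the next pass.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```

The three edits are k→s, e→i and the appended g. The result counts whole edit operations on the string and is not derived from how far apart the individual letters are in the alphabet.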

If what you need is the similarity between two EObjects, then yes, we do compute such a metric. You may want to look into our org.eclipse.emf.compare.match.eobject.EditionDistance and how we use it (a good place to start is the associated test).

You should still think about what you really need, and what a good definition of the "similarity" between two "models" really is for you. As mentioned above, the sum of distances between all of the model's objects isn't really a convincing measure of that similarity. A better measure would probably include the number of differences, distance between the containing resources' names, number of referenced resources and their own similarities...

Laurent Goubet
Obeo

Re: [EMF Compare] Compute degree of similarity [message #1041542 is a reply to message #1038002]

Mon, 15 April 2013 08:23

Aleksandar Toshovski

Messages: 78
Registered: December 2011

Member

Thanks a lot for your answer.

The paper by Xing and Stroulia, "UMLDiff: An Algorithm for Object-Oriented Design Differencing", describes several similarity metrics. What I may need is the structure similarity and/or the overall similarity metric. I read somewhere that EMF Compare is based on this paper and thought that I could somehow retrieve these metrics.

Quote:

A better measure would probably include the number of differences, distance between the containing resources' names, number of referenced resources and their own similarities.

What do you mean by "distance between the containing resources' names"?

[Updated on: Mon, 15 April 2013 08:24]

Re: [EMF Compare] Compute degree of similarity [message #1047417 is a reply to message #1041542]

Tue, 23 April 2013 07:19

Laurent Goubet

Messages: 1902
Registered: July 2009

Senior Member

Aleksandar,

The first version of EMF Compare was indeed loosely based on this paper and other related works. This approach has been dropped in EMF Compare 2 in favor of a "distance" computation. As mentioned, I do not think this distance has any meaning on a level higher than EObject.

Quote:

What do you mean by "distance between the containing resources' names"?

You may have a model composed of the files "library.uml" and "types.uml". A second version of this model is still named "library.uml", but it uses types from a renamed "library_types.uml" and makes use of stereotypes defined in "library_profile.uml". A type that was previously in "types.uml", call it "Date", has been renamed to "Calendar". A stereotype, "Borrowable", that was previously in "types.uml" has been moved to "library_profile.uml" and renamed to "Lendable". Aside from Borrowable, types.uml (and, as such, library_types.uml too) also contains a type named "Borrower".

How do you compute a similarity between types.uml#Date and library_types.uml#Calendar so that they are defined to be a Match (i.e. "highest similarity")? How do you define a similarity between types.uml#Borrowable and library_profile.uml#Lendable so that this similarity is higher than that of types.uml#Borrowable and library_types.uml#Borrower?

Of course, with UML, the question isn't really interesting: elements have an ID that makes them much easier to match than usual. However, the same question could be asked with Ecore. There are a good number of variables to take into account, and I believe that the containing resource also has a weight.
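As an illustration of why a single variable is not enough, the sketch below ranks the candidates from the example by name alone, using a plain Levenshtein distance. This is an assumption made purely for illustration and is not how EMF Compare matches elements; the class and method names are hypothetical.

```java
final class NameMatchSketch {

    /** Plain Levenshtein distance between two names (two-row dynamic programming). */
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(previous[j - 1] + cost,
                        Math.min(previous[j] + 1, current[j - 1] + 1));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** Returns the candidate whose name has the smallest edit distance to the given name. */
    static String closestByName(String name, String... candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int distance = levenshtein(name, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Borrowable", "Lendable")); // prints 6
        System.out.println(levenshtein("Borrowable", "Borrower")); // prints 4
        System.out.println(closestByName("Borrowable", "Lendable", "Borrower")); // prints Borrower
    }
}
```

On names alone, Borrowable is closer to Borrower than to Lendable, which is the wrong match for the scenario above. Any usable similarity has to bring in other variables, such as the element's type, its features, or its container.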

Laurent Goubet
Obeo

[Updated on: Tue, 23 April 2013 09:02]
