[ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #883012] Thu, 07 June 2012 16:35
Stephane Vaucher
Hi everyone,

Reading the documentation, I see that an "ATL module accepts a fixed number of models as input". That's fine, but like a previous poster, I'd like to know whether newer versions of ATL can be configured to avoid loading whole models before running a transformation. Currently, I have large models containing a few million nodes, and these will soon grow by a factor of 100. The models are stored in CDO (which loads parts of a model on demand).

If I look at the ATL codebase, code like getElementsByType(Object metaElement) is a performance killer:

// Visits every object in the resource just to filter by type.
for (Iterator<EObject> iterator = res.getAllContents(); iterator.hasNext(); ) {
    EObject element = iterator.next();
    if (ec.isInstance(element)) {
        ret.add(element);
    }
}
...

I would appreciate any information or mechanism that would let me plug in a smarter strategy that avoids loading the whole model. For your information, in CDO it is not wise to load object by object, as this requires every object to be serialized and deserialized individually.

Thanks,
Stephane Vaucher
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #883527 is a reply to message #883012] Fri, 08 June 2012 18:01
Stephane Vaucher
FYI, to speed up some processing, I subclassed the EMFModel and redefined the getElementsByType method with an optimized version (that queries our database directly). It improves performance tremendously.
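A minimal, self-contained sketch of the kind of override described above. It does not use the real ATL or EMF classes: Element, ScanningModel, StoreBackedModel and the map-based store are hypothetical stand-ins, with the map playing the role of a database query that returns all rows of one type. The visited counter exists only to make the cost difference visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a model element; not an EMF or ATL type. */
final class Element {
    final String type;
    final String name;
    Element(String type, String name) { this.type = type; this.name = name; }
}

/** Mirrors the default behaviour: visit every element and filter by type. */
class ScanningModel {
    protected final List<Element> contents;
    int visited = 0; // number of elements touched by full scans

    ScanningModel(List<Element> contents) { this.contents = contents; }

    List<Element> getElementsByType(String type) {
        List<Element> ret = new ArrayList<>();
        for (Element e : contents) {
            visited++;
            if (e.type.equals(type)) {
                ret.add(e);
            }
        }
        return ret;
    }
}

/** Overrides the lookup to ask a (hypothetical) type-indexed store first. */
class StoreBackedModel extends ScanningModel {
    private final Map<String, List<Element>> store; // stands in for a database query

    StoreBackedModel(List<Element> contents, Map<String, List<Element>> store) {
        super(contents);
        this.store = store;
    }

    @Override
    List<Element> getElementsByType(String type) {
        List<Element> hit = store.get(type);
        // Fall back to the full scan only when the store has no answer.
        return hit != null ? hit : super.getElementsByType(type);
    }
}

class OverrideDemo {
    static List<Element> sample() {
        return Arrays.asList(new Element("Class", "A"),
                new Element("Package", "p"), new Element("Class", "B"));
    }

    static int visitedByScan() {
        ScanningModel m = new ScanningModel(sample());
        m.getElementsByType("Class");
        return m.visited;
    }

    static int visitedByStore() {
        List<Element> all = sample();
        Map<String, List<Element>> store = new HashMap<>();
        store.put("Class", Arrays.asList(all.get(0), all.get(2)));
        StoreBackedModel m = new StoreBackedModel(all, store);
        m.getElementsByType("Class");
        return m.visited;
    }

    public static void main(String[] args) {
        System.out.println("scan visited " + visitedByScan());
        System.out.println("store visited " + visitedByStore());
    }
}
```

In this toy setup the scanning model touches all three elements, while the store-backed one touches none. Falling back to the inherited scan keeps the subclass usable for types the store does not know about.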
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #883943 is a reply to message #883527] Sat, 09 June 2012 19:20
Dennis Wagelaar
On 08-06-12 20:01, Stephane Vaucher wrote:
> FYI, to speed up some processing, I subclassed the EMFModel and redefined the
> getElementsByType method with an optimized version (that queries our database
> directly). It improves performance tremendously.

Getting all instances of a single entity class is very efficient in relational
databases (just return the entire table), but typically not efficient in
standard EMF...

The problem is that EMF provides no dedicated API for retrieving all instances
of a certain EClass, and all you can do is iterate over the entire model.
Therefore, ATL has had to provide its own functionality to cache all instances
for each EClass.
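
To illustrate the caching idea in the paragraph above, here is a minimal sketch of a per-type instance cache that is filled in a single traversal on the first lookup. It is not ATL's actual implementation: Node, InstanceCache and the string-based type names are hypothetical stand-ins for EObject and EClass. Note that the first lookup still walks the whole model, which is exactly the cost a store-side query would avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model element carrying its own type and that type's supertypes. */
final class Node {
    final String name;
    final List<String> types; // own type first, then supertypes

    Node(String name, String... types) {
        this.name = name;
        this.types = Arrays.asList(types);
    }
}

/** Builds the type -> instances map once, on the first lookup. */
final class InstanceCache {
    private final List<Node> contents;
    private Map<String, List<Node>> byType; // null until the first lookup
    int traversals = 0;                     // how often the model was walked

    InstanceCache(List<Node> contents) { this.contents = contents; }

    List<Node> allInstances(String type) {
        if (byType == null) {
            byType = new HashMap<>();
            traversals++;
            for (Node n : contents) {
                // Register under every type so subtype instances are found too.
                for (String t : n.types) {
                    byType.computeIfAbsent(t, k -> new ArrayList<>()).add(n);
                }
            }
        }
        return byType.getOrDefault(type, Collections.emptyList());
    }
}

class CacheDemo {
    static InstanceCache sample() {
        return new InstanceCache(Arrays.asList(
                new Node("A", "Class", "Classifier"),
                new Node("I", "Interface", "Classifier"),
                new Node("p", "Package")));
    }

    public static void main(String[] args) {
        InstanceCache cache = sample();
        System.out.println(cache.allInstances("Classifier").size());
        System.out.println(cache.allInstances("Class").size());
        System.out.println("traversals: " + cache.traversals);
    }
}
```

Every lookup after the first is a plain map access, so repeated queries for different types cost one traversal in total.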

If there is standard CDO API for retrieving all instances of a specific
EClass, I'd like to hear about it: it's indeed a must if ATL is ever to be
used for large data transformation.

Regards,
Dennis
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884110 is a reply to message #883943] Sun, 10 June 2012 05:49
Ed Merks
Dennis,

Comments below.

On 09/06/2012 9:20 PM, Dennis Wagelaar wrote:
> On 08-06-12 20:01, Stephane Vaucher wrote:
>> FYI, to speed up some processing, I subclassed the EMFModel and redefined the
>> getElementsByType method with an optimized version (that queries our database
>> directly). It improves performance tremendously.
> Getting all instances of a single entity class is very efficient in relational
> databases (just return the entire table), but typically not efficient in
> standard EMF...
It's an issue of a closed universe rather than an open one. When model
instances can be spread across the network, which is unbounded, no query
can ever know all instances. Anyone anywhere can create new ones...
>
> The problem is that EMF provides no dedicated API for retrieving all instances
> of a certain EClass, and all you can do is iterate over the entire model.
You need to iterate over the entire universe of data.
> Therefore, ATL has had to provide its own functionality to cache all instances
> for each EClass.
All instances in the workspace perhaps...
>
> If there is standard CDO API for retrieving all instances of a specific
> EClass, I'd like to hear about it: it's indeed a must if ATL is ever to be
> used for large data transformation.
I'm pretty sure CDO has such capabilities. I'm sure Eike will comment...
>
> Regards,
> Dennis
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884161 is a reply to message #884110] Sun, 10 June 2012 09:15
Ed Willink
Hi
>> If there is standard CDO API for retrieving all instances of a specific
>> EClass, I'd like to hear about it: it's indeed a must if ATL is ever
>> to be
>> used for large data transformation.
> I'm pretty sure CDO has such capabilities. I'm sure Eike will comment...
CDO supports a variety of query languages, one of which is OCL, however
the OCL server-side support is fairly limited.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=340719.

There is ample opportunity for OCL's allInstances() to be supported
server-side by inherent CDO capabilities. If anyone wants to work on
this I can provide some assistance.

Regards

Ed Willink
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884289 is a reply to message #884110] Sun, 10 June 2012 18:24
Dennis Wagelaar
On 10-06-12 07:49, Ed Merks wrote:
> Dennis,
>
> Comments below.
>
> On 09/06/2012 9:20 PM, Dennis Wagelaar wrote:
>> On 08-06-12 20:01, Stephane Vaucher wrote:
>>> FYI, to speed up some processing, I subclassed the EMFModel and redefined the
>>> getElementsByType method with an optimized version (that queries our database
>>> directly). It improves performance tremendously.
>> Getting all instances of a single entity class is very efficient in relational
>> databases (just return the entire table), but typically not efficient in
>> standard EMF...
> It's an issue of a closed universe rather than an open one. When model
> instances can be spread across the network, which is unbounded, no query can
> ever know all instances. Anyone anywhere can create new ones...

Hmm, I thought EMF Resources provide a notion of locality/bounding? In any
case, ATL always bounds its allInstances() queries to one or more specific
Resources.

>>
>> The problem is that EMF provides no dedicated API for retrieving all instances
>> of a certain EClass, and all you can do is iterate over the entire model.
> You need to iterate over the entire universe of data.
>> Therefore, ATL has had to provide its own functionality to cache all instances
>> for each EClass.
> All instances in the workspace perhaps...
>>
>> If there is standard CDO API for retrieving all instances of a specific
>> EClass, I'd like to hear about it: it's indeed a must if ATL is ever to be
>> used for large data transformation.
> I'm pretty sure CDO has such capabilities. I'm sure Eike will comment...

I think I found it: CDOResource#queryExtent(EClass, boolean). I'm not sure how
much client/server communication happens between CDOResource and the actual
RDBMS, but the operation signature looks a lot like what Hibernate provides.
It even returns a Set instead of an EList (which Hibernate does to avoid an
ORDER BY expression in the SQL query). Am I on track here?

Dennis
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884309 is a reply to message #884161] Sun, 10 June 2012 19:14
Dennis Wagelaar
On 10-06-12 11:15, Ed Willink wrote:
> Hi
>>> If there is standard CDO API for retrieving all instances of a specific
>>> EClass, I'd like to hear about it: it's indeed a must if ATL is ever to be
>>> used for large data transformation.
>> I'm pretty sure CDO has such capabilities. I'm sure Eike will comment...
> CDO supports a variety of query languages, one of which is OCL, however the
> OCL server-side support is fairly limited.
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=340719.
>
> There is ample opportunity for OCL's allInstances() to be supported
> server-side by inherent CDO capabilities. If anyone wants to work on this I
> can provide some assistance.
>
> Regards
>
> Ed Willink

The bug you mentioned does raise quite a few client/server technicalities,
including passing the OCL query to the server end. I suppose that ideally one
should try to translate as much OCL as possible into SQL, and execute that
instead of the OCL, such that we leverage the RDBMS's query planner... That is
a rather ambitious project, however, and at least I won't be up for that at
the moment ;-).

To return to the original ATL situation: it's not only the embedded OCL that
poses the problem, but it already starts with the matched rule construct, e.g.:

rule AllClasses {
    from s : UML!Class in SOURCE
    to t : JAVA!Class (...)
}

The rule above iterates over the entire SOURCE model for UML Class instances. I
think Stephane already pointed to the most pragmatic solution: to modify ATL's
facility to find all instances of an EClass in a single Resource to directly
use CDO's API when possible.

Regards,
Dennis
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884427 is a reply to message #884289] Mon, 11 June 2012 03:49
Ed Merks
Dennis,

Comments below.

On 10/06/2012 8:24 PM, Dennis Wagelaar wrote:
> On 10-06-12 07:49, Ed Merks wrote:
>> Dennis,
>>
>> Comments below.
>>
>> On 09/06/2012 9:20 PM, Dennis Wagelaar wrote:
>>> On 08-06-12 20:01, Stephane Vaucher wrote:
>>>> FYI, to speed up some processing, I subclassed the EMFModel and redefined the
>>>> getElementsByType method with an optimized version (that queries our database
>>>> directly). It improves performance tremendously.
>>> Getting all instances of a single entity class is very efficient in relational
>>> databases (just return the entire table), but typically not efficient in
>>> standard EMF...
>> It's an issue of a closed universe rather than an open one. When model
>> instances can be spread across the network, which is unbounded, no query can
>> ever know all instances. Anyone anywhere can create new ones...
> Hmm, I thought EMF Resources provide a notion of locality/bounding?
Yes, a resource contains a tree of EObjects. But the set of all
possible resources isn't bounded so neither is the set of all EObjects.
> In any
> case, ATL always bounds its allInstances() queries to one or more specific
> Resources.
So if you know those are all in a data base, or a CDO repository, you
can make use of those bounds for optimization.

There was the start of an EMF Index project that would provide indexing
support that could be shared by Xtext and Query2, but that effort didn't
get very far because each project has their own approach.
>
>>> The problem is that EMF provides no dedicated API for retrieving all instances
>>> of a certain EClass, and all you can do is iterate over the entire model.
>> You need to iterate over the entire universe of data.
>>> Therefore, ATL has had to provide its own functionality to cache all instances
>>> for each EClass.
>> All instances in the workspace perhaps...
>>> If there is standard CDO API for retrieving all instances of a specific
>>> EClass, I'd like to hear about it: it's indeed a must if ATL is ever to be
>>> used for large data transformation.
>> I'm pretty sure CDO has such capabilities. I'm sure Eike will comment...
> I think I found it: CDOResource#queryExtent(EClass, boolean). I'm not sure how
> much client/server communication happens between CDOResource and the actual
> RDBMS, but the operation signature looks a lot like what Hibernate provides.
> It even returns a Set instead of an EList (which Hibernate does to avoid an
> ORDER BY expression in the SQL query). Am I on track here?
I'll prompt Eike to comment.
>
> Dennis
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #884434 is a reply to message #884289] Mon, 11 June 2012 04:22
Eike Stepper
On 10.06.2012 20:24, Dennis Wagelaar wrote:
> [...] I think I found it: CDOResource#queryExtent(EClass, boolean).
Took me a while to find this method because that was in CDO v0.7 in 2006. We're now at v4.1 and CDO has been completely
rewritten in the meantime. Please use a newer version from http://www.eclipse.org/cdo/downloads and have a look at our
OCL query handler:

/org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/OCLQueryHandler.java
/org.eclipse.emf.cdo.tests/src/org/eclipse/emf/cdo/tests/OCLQueryTest.java

If you're interested in developing an ATL query handler, we could probably pull these classes into CDO core, as they don't
depend on OCL:

/org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/CDOExtentCreator.java
/org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/CDOExtentMap.java

> I'm not sure how much client/server communication happens between CDOResource and the actual RDBMS, but the operation
> signature looks a lot like what Hibernate provides. It even returns a Set instead of an EList (which Hibernate does to
> avoid an ORDER BY expression in the SQL query). Am I on track here?
CDO's QueryRequest sends the query and the query parameters (encapsulated by CDOQueryInfo) to the server. The response
is internally received as a stream that, higher up in the stack, is wrapped by an iterator if CDOQuery.getResultAsync()
was used. IIRC the response stream carries CDOIDs for the actual result objects. This makes it possible to use the
client-side object caches. There may be a query parameter that causes the actual result objects to be streamed to the
requesting client, but you should be sure that most of the query results are not already cached on the client.
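
The ID-based result handling described above can be pictured with a small sketch. This is not CDO code: CachingResolver, its server function and the use of Long for IDs and String for objects are hypothetical stand-ins for the client-side cache, the remote fetch, CDOID and the model objects. The iterator resolves each ID only when it is consumed and asks the server only for objects that are not cached yet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

/** Resolves result IDs lazily, fetching from the server only on a cache miss. */
final class CachingResolver {
    private final Map<Long, String> cache = new HashMap<>(); // client-side object cache
    private final Function<Long, String> server;             // hypothetical remote fetch
    int serverFetches = 0;                                    // number of cache misses

    CachingResolver(Function<Long, String> server) { this.server = server; }

    void preload(long id, String object) { cache.put(id, object); }

    /** Wraps the stream of IDs in an iterator that resolves each ID on demand. */
    Iterator<String> resolve(Iterator<Long> ids) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return ids.hasNext(); }

            @Override public String next() {
                Long id = ids.next();
                String hit = cache.get(id);
                if (hit == null) {
                    serverFetches++;
                    hit = server.apply(id);
                    cache.put(id, hit);
                }
                return hit;
            }
        };
    }
}

class ResolverDemo {
    /** Returns how many server fetches are needed when IDs 1 and 2 are already cached. */
    static int fetchesFor(Long... ids) {
        CachingResolver resolver = new CachingResolver(id -> "object#" + id);
        resolver.preload(1L, "object#1");
        resolver.preload(2L, "object#2");
        Iterator<String> it = resolver.resolve(Arrays.asList(ids).iterator());
        while (it.hasNext()) {
            it.next();
        }
        return resolver.serverFetches;
    }

    public static void main(String[] args) {
        System.out.println(fetchesFor(1L, 2L, 3L));
    }
}
```

With most results already cached, sending only IDs keeps the response small. If almost nothing is cached, each miss becomes a separate fetch, which is the case where streaming the objects themselves would pay off.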

Cheers
/Eike

----
http://www.esc-net.de
http://thegordian.blogspot.com
http://twitter.com/eikestepper
Re: [ATL] Invoking an ATL transformation on a part of a model, not on a whole model [message #886377 is a reply to message #884434] Thu, 14 June 2012 20:19
Dennis Wagelaar
On 11-06-12 06:22, Eike Stepper wrote:
> On 10.06.2012 20:24, Dennis Wagelaar wrote:
>> [...] I think I found it: CDOResource#queryExtent(EClass, boolean).
> Took me a while to find this method because that was in CDO v0.7 in 2006.
> We're now at v4.1 and CDO has been completely rewritten in the meantime.
> Please use a newer version from http://www.eclipse.org/cdo/downloads and have
> a look at our OCL query handler:
>
>
> /org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/OCLQueryHandler.java
>
> /org.eclipse.emf.cdo.tests/src/org/eclipse/emf/cdo/tests/OCLQueryTest.java
>
> If you're interested in developing an ATL query handler, we could probably pull
> these classes into CDO core, as they don't depend on OCL:
>
>
> /org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/CDOExtentCreator.java
>
>
> /org.eclipse.emf.cdo.server.ocl/src/org/eclipse/emf/cdo/server/ocl/CDOExtentMap.java
>
>
>> I'm not sure how much client/server communication happens between
>> CDOResource and the actual RDBMS, but the operation signature looks a lot
>> like what Hibernate provides. It even returns a Set instead of an EList
>> (which Hibernate does to avoid an ORDER BY expression in the SQL query). Am
>> I on track here?
> CDO's QueryRequest sends the query and the query parameters (encapsulated by
> CDOQueryInfo) to the server. The response is internally received as a stream
> that, higher up in the stack, is wrapped by an iterator if
> CDOQuery.getResultAsync() was used. IIRC the response stream carries CDOIDs
> for the actual result objects. This makes it possible to use the client-side
> object caches. There may be a query parameter that causes the actual result
> objects to be streamed to the requesting client, but you should be sure that
> most of the query results are not already cached on the client.
>
> Cheers
> /Eike
>

Thanks for the responses, Eike, Ed, and Erdal! I'll have a look into it one of
these days.

Cheers,
Dennis