Eclipse Community Forums
[QVTO] advice needed for large models [message #854804] Tue, 24 April 2012 08:42
Eclipse User
Hi all,

I'm using QVTO programmatically and have found that with larger
input data sets I easily run into a hard "GC overhead limit exceeded"
error. I realize that this is also closely related to the transformation
script, but are there any general guidelines on how to handle large data
sets? Are there any options or tweaks when configuring the
ExecutionContext so that it creates less runtime data? And what
happens if the model itself is too large to fit in memory all at
once? Is there any demand loading/unloading?

Any input or pointers on the matter will be greatly appreciated!

Thanks
Marius
Re: [QVTO] advice needed for large models [message #855092 is a reply to message #854804] Tue, 24 April 2012 14:10
Ed Willink
Hi

Assuming that you have done the obvious things like increase the Java
heap, disable tracing, and that you do not use allInstances() or
unnavigable opposites....

A complex transformation may need access to all the data, so I don't
think there is a general solution for QVTo.

In practice your transformation may be localized so that it is amenable
to streaming the model through in fragments. Assuming that QVTo does not
support this directly, you could have a stream reader that passes model
fragments for local transformation, and then have a stream writer that
combines the result fragments.
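
A minimal sketch of the reader/transform/writer arrangement described above, in plain Java. Nothing here is QVTo or EMF API: `FragmentPipeline`, `run`, and the string fragments are hypothetical stand-ins, and the sketch assumes the transformation really is local, so each fragment can be transformed without seeing the others. In a real setup the `transform` function would presumably wrap one QVTo execution per fragment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class FragmentPipeline {

    /**
     * Pulls fragments from the reader one at a time, transforms each,
     * and hands the result to the writer. Only one fragment is held by
     * this loop at any moment. Returns the number of fragments processed.
     */
    public static <I, O> int run(Iterator<I> reader,
                                 Function<I, O> transform,
                                 Consumer<O> writer) {
        int count = 0;
        while (reader.hasNext()) {
            I fragment = reader.next();
            writer.accept(transform.apply(fragment));
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for a stream reader that yields model fragments.
        Iterator<String> reader = Arrays.asList("a", "b", "c").iterator();
        // Stand-in for the local transformation of one fragment.
        Function<String, String> transform = s -> s.toUpperCase();
        // Stand-in for a stream writer that combines result fragments.
        List<String> combined = new ArrayList<>();
        Consumer<String> writer = combined::add;

        int n = run(reader, transform, writer);
        System.out.println(n + " fragments written: " + combined);
    }
}
```

The point of the shape is that the loop never keeps a reference to a fragment once it has been written, so memory use is bounded by the largest fragment rather than by the whole model, provided the reader and writer do not retain fragments themselves.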

Alternatively you might contrive to keep the model in a repository such
as CDO so that you only need a small portion in memory at any time.

One day, a declarative transformation language, such as QVTr, could have
streaming operation as one of its compilation strategies.

Regards

Ed Willink

Re: [QVTO/ATL] advice needed for large models [message #855132 is a reply to message #855092] Tue, 24 April 2012 14:45
Eclipse User
On 24.04.2012 16:10, Ed Willink wrote:
> Assuming that you have done the obvious things like increase the Java
> heap, disable tracing, and that you do not use allInstances() or
> unnavigable opposites....

Thanks for answering. How do I disable tracing? Before posting, I had
done some research on this using Google and the source code but found no
evidence that this is possible at all. Perhaps you could give me a
pointer on how to do that?

>
> A complex transformation may need access to all the data, so I don't
> think there is a general solution for QVTo.

Ok... I had chosen QVT over ATL after some evaluation because I found
that QVT is much better prepared for the programmatic usage I
need. So, how does ATL handle large datasets?

Thanks
Marius
Re: [QVTO/ATL] advice needed for large models [message #855189 is a reply to message #855132] Tue, 24 April 2012 15:40
Ed Willink
Hi

Just leave the trace file blank in the launcher; I'm only guessing that
this helps.

I would not expect a significant difference between ATL and QVTo for large
datasets. Both probably assume that the entire model is in memory.

Regards

Ed Willink

Re: [QVTO] advice needed for large models [message #855550 is a reply to message #854804] Tue, 24 April 2012 23:10
Alan McMorran
Marius,

How large are you talking about? I've found QVTo scales pretty well as
the dataset size increases, but we're using at most maybe 3-4 million
objects as the input and maybe 1-2 million on the output. They can be
pretty complex models though, so we're seeing 8GB heap spaces in some
cases to accommodate the full transformation process.

The big challenge we've had to overcome is that our model is
essentially flat with no containment in it, so there are parts of the
QVTo runtime that tend to assume only a few root nodes, and we had to
make some modifications to cut the post-transform processing down.

Is the GC overhead limit not tied to the heap space limits of the JVM?

Alan


Re: [QVTO/ATL] advice needed for large models [message #855831 is a reply to message #855189] Wed, 25 April 2012 06:20
Eclipse User
On 24.04.2012 17:40, Ed Willink wrote:
> Just leave the trace file blank in the launcher; I'm only guessing that
> this helps.

Hm... as I said I'm using QVT programmatically within an application,
and not using a launcher. From what I saw in the source code it appeared
to me that you can't disable tracing at all. It even seemed to me that
QVT's regular operation relies on those traces. It's just an option to
save them or not.

I may be very wrong here, so please correct me then.

Regards
Marius
Re: [QVTO] advice needed for large models [message #855849 is a reply to message #855550] Wed, 25 April 2012 06:36
Eclipse User
On 25.04.2012 01:10, Alan McMorran wrote:
> How large are you talking about? I've found QVTo scales pretty well as
> the dataset size increases but we're using at most maybe 3-4 million
> objects as the input and maybe 1-2 million on the output. They can be
> pretty complex models though so we're seeing 8GB heap spaces in some
> cases to accommodate the full transformation process.

Ok, that is good to know. We will be working in roughly the same order
of magnitude. The final application will run on a well-equipped server;
unfortunately, my development machine is not as powerful, so I can't
really test that.

> The big challenge we've had to overcome is that our model is
> essentially flat with no containment in it so there are parts of the

We have a very hierarchical model. I still wonder to what extent EMF and
QVTo at least try to let go of objects which are not needed anymore and
allow them to be garbage collected?

> Is the GC overhead limit not tied to the heap space limits of the JVM?

Apparently not, quoting
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html:

"The concurrent collector will throw an OutOfMemoryError if too much
time is being spent in garbage collection: if more than 98% of the total
time is spent in garbage collection and less than 2% of the heap is
recovered, an OutOfMemoryError will be thrown. This feature is designed
to prevent applications from running for an extended period of time
while making little or no progress because the heap is too small. If
necessary, this feature can be disabled by adding the option
-XX:-UseGCOverheadLimit to the command line."

I will experiment a little bit with different GCs, namely the parallel GC.
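
To see how close a run gets to the situation that quote describes, the time spent in GC can be sampled from inside the application with the standard `java.lang.management` beans. This is a rough sketch: `GcOverhead` and its methods are illustrative names, the allocation loop is only a placeholder for the transformation, and the reported collection time is an approximation, not the exact figure the JVM uses to enforce the overhead limit.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {

    /** Sums the accumulated collection time (ms) reported by all collectors. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Fraction of a wall-clock window spent in GC, clamped to [0, 1]. */
    public static double fraction(long gcMillis, long wallMillis) {
        if (wallMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / wallMillis));
    }

    public static void main(String[] args) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();

        // Placeholder workload: allocate short-lived garbage.
        // In practice the transformation would run here.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[1024];
            checksum += chunk.length;
        }

        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        System.out.printf("GC: %d ms of %d ms (%.1f%%), checksum %d%n",
                gcMillis, wallMillis, 100 * fraction(gcMillis, wallMillis), checksum);
    }
}
```

A fraction that stays high while the heap barely shrinks suggests the heap is simply too small for the live model, in which case switching collectors or disabling the limit is unlikely to help much.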

Regards
Marius
Re: [QVTO] advice needed for large models [message #855945 is a reply to message #855849] Wed, 25 April 2012 08:16
Ed Willink
Hi

The more appropriate question is why garbage collection should be able to
recover anything in the first place.

If you're loading a model, transforming, then saving, then it's not until
the save is complete and the ResourceSet is emptied that garbage
collection might start.

EMF models have many references with additional references buried in
adapters. It is a non-trivial activity to isolate some part of an EMF
Resource so that it can be collected.

I suggest creating a small test model and using VisualVM to debug why
garbage collection fails at the point where you think it could succeed.

Regards

Ed Willink
