ETL Performance? [message #483582] Wed, 02 September 2009 08:31
Eclipse User
Originally posted by: d.clowes.lboro.ac.uk

Hi All,

Does anyone have any suggestions on improving the performance of ETL
scripts?

I am trying to transform raw XML data into our models. The data is
split across several files totalling approx. 40 MB. So far the transformation
has been running for over 26 hours and is only approx. 8 MB through. (The
current file is 9 MB and has taken 24 hours on its own so far.)

It also looks as if Eclipse will not maximise the processor usage. I have
a dual-core processor, but both cores only seem to be running at 50%
whilst it is processing the transformation.

I'm reluctant to stop the process, even though it looks like taking a week,
as I don't want to have to restart it if there are no known performance
tweaks.

Thanks for any suggestions,

Darren
Re: ETL Performance? [message #483588 is a reply to message #483582] Wed, 02 September 2009 09:11
Dimitrios Kolovos
Hi Darren,

Please see comments below

Darren Clowes wrote:
> Hi All,
>
> Does anyone have any suggestions on improving the performance of ETL
> scripts?
>
> I am trying to transform raw XML data into our models. The data is
> split across several files totalling approx. 40 MB. So far the
> transformation has been running for over 26 hours and is only approx. 8 MB
> through. (The current file is 9 MB and has taken 24 hours on its own so far.)

That sounds like a long time for models of this size.

>
> It also looks as if Eclipse will not maximise the processor usage. I
> have a dual-core processor, but both cores only seem to be running
> at 50% whilst it is processing the transformation.

I must admit I have no idea whether that's a Java or an Eclipse issue...

>
> I'm reluctant to stop the process, even though it looks like taking a
> week, as I don't want to have to restart it if there are no known
> performance tweaks.
>

As general tips, I'd suggest minimizing the number of calls to
equivalent() where possible, reducing the number of Native objects by
creating them in the pre section (instead of creating one in each rule
invocation), and using @cached operations for caching the results of
repetitive complex calculations.
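
To make the @cached tip concrete, here is a minimal sketch rather than anything taken from Darren's transformation. Table, cell and text are assumed names borrowed from the snippets later in this thread, and signature() is a hypothetical helper. @cached is documented for operations that take no parameters.

```eol
// Hypothetical helper: Table, cell and text are assumed names.
// With @cached, the body runs on the first call for a given Table;
// later calls on the same Table reuse the stored result.
@cached
operation Table signature() : String {
  // Join the text of every cell into a single string
  return self.cell.collect(c | c.text).concat("|");
}
```

Caching only pays off when the same operation is called repeatedly on the same element and the element does not change between calls.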

I'll have to look at the transformation/models for more concrete
feedback; I'll email you separately on that.

> Thanks for any suggestions,
>
> Darren
>
Cheers,
Dimitris
Re: ETL Performance? [message #483604 is a reply to message #483588] Wed, 02 September 2009 09:56
Eclipse User
Originally posted by: d.clowes.lboro.ac.uk

Thanks Dimitris, I shall probably give those a try, as I know I have a
Native object created in a deep loop at some point.

With regard to equivalent(), I also make heavy use of it. Are you
suggesting that, rather than calling equivalent(), I should create the
new object within the parent rule?

i.e. rather than:

for (c in t1.cell) {
  t2.cells.add(c.equivalent());
}

do:

for (c in t1.cell) {
  var x : new Cell; // "new" instantiates the Cell
  x.text := c.text;
  t2.cells.add(x);
}

Thanks Darren
Re: ETL Performance? [message #483605 is a reply to message #483604] Wed, 02 September 2009 10:05
Dimitrios Kolovos
Hi Darren,

Darren Clowes wrote:
> Thanks Dimitris, I shall probably give those a try, as I know I have a
> Native object created in a deep loop at some point.

Creating this once in the pre{} instead should speed things up.
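
A minimal sketch of what moving a Native object into pre can look like. The model names Source and Target, the Table type and the lookup variable are all assumptions for illustration; only cell and text come from the snippets in this thread.

```eol
pre {
  // Created once, before any rule is executed. Variables declared
  // in pre are visible to the rules that run afterwards.
  var lookup : new Native("java.util.HashMap");
}

// Hypothetical rule: Source, Target and Table are assumed names.
rule Table2Table
  transform t1 : Source!Table
  to t2 : Target!Table {

  for (c in t1.cell) {
    // Reuse the shared map here; there is no "new Native(...)"
    // inside the rule body or its loops.
    lookup.put(c.text, c);
  }
}
```

The point is simply that new Native(...) runs once in total rather than once per rule invocation or loop iteration.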

>
> With regard to equivalent(), I also make heavy use of it. Are you
> suggesting that, rather than calling equivalent(), I should create the
> new object within the parent rule?
>
> i.e. rather than:
>
> for (c in t1.cell) {
>   t2.cells.add(c.equivalent());
> }
>
> do:
>
> for (c in t1.cell) {
>   var x : new Cell; // "new" instantiates the Cell
>   x.text := c.text;
>   t2.cells.add(x);
> }

Yes. For such simple cases it's probably overkill to define a
separate rule.

> Thanks Darren
>

Cheers,
Dimitris

--
Spread the word: http://www.eclipse.org/gmt/epsilon/spreadtheword
Follow Epsilon on Twitter: http://twitter.com/epsilonews
Re: ETL Performance? [message #483622 is a reply to message #483605] Wed, 02 September 2009 11:31
Eclipse User
Originally posted by: d.clowes.lboro.ac.uk

Thanks Dimitris,

Your suggestions have improved performance significantly. I'm working on a
slower machine at the moment, but what took 6 hours now takes approx. 30 mins
on this slower machine. It still looks like it will take a few hours to
complete, but that is much better than a few days :D

Darren
Re: ETL Performance? [message #483623 is a reply to message #483622] Wed, 02 September 2009 11:35
Dimitrios Kolovos
Hi Darren,

Glad this helped! I've generated a 60MB model and I'm profiling the ETL
engine to see if there are bits we can also improve internally.

Cheers,
Dimitris

Re: ETL Performance? [message #483653 is a reply to message #483623] Wed, 02 September 2009 13:27
Dimitrios Kolovos
Hi Darren,

As an update on this, equivalent resolution does indeed seem to be the
main bottleneck. Rewriting a sample ETL transformation in plain
procedural EOL seems to reduce its execution time by about 90%.
For a sample 2 MB input model, the ETL transformation takes 191 sec while
the EOL one takes only 18 sec. The same EOL transformation transforms a
60 MB input model in under 7 mins.
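
For illustration only, the cell-copying loop from earlier in the thread could look roughly like this as a standalone EOL program. This is not the transformation that was benchmarked above; Source, Target and Table are assumed names, and the trace map is a hypothetical stand-in for the lookups that equivalent() would otherwise perform. Assignment is written := to match the code in this thread.

```eol
// Hand-written record of which target Table was created for each
// source Table (a hypothetical replacement for equivalent()).
var trace : new Native("java.util.HashMap");

for (t1 in Source!Table.allInstances()) {
  var t2 : new Target!Table;
  trace.put(t1, t2);

  // Create the target cells inline instead of via a separate rule
  for (c in t1.cell) {
    var x : new Target!Cell;
    x.text := c.text;
    t2.cells.add(x);
  }
}
```

Where one element needs to refer to the target created for another, trace.get(...) on the source element would play the role of equivalent() in this sketch.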

There seems to be quite a bit of room for improvement in the equivalent
resolution of ETL. I've opened
https://bugs.eclipse.org/bugs/show_bug.cgi?id=288355 to keep track of this.

Cheers,
Dimitris
