Eclipse Community Forums
Performance questions [message #537638] Thu, 03 June 2010 04:32
Marcel Stör
[Context]
I'm a BIRT rookie faced with the task of improving performance in a BIRT
setup created by someone else (also not a BIRT expert). We use BIRT 2.5.2.
My predecessor split the entire document/report into several sections
which are rendered sequentially, i.e. several IRunAndRenderTask instances
run one after another. I heard this measure was implemented in order to keep
memory consumption lower.


[First observation]
Since the implementation of the ResourceLocator is under our control I
started logging all invocations of the findResource() &
getReportDesign() methods.
Shockingly, the findResource() method is called dozens of times for one
and the same resource! I observed this for all integrated images,
resource bundle properties, and JavaScript/JARs referenced in the .rptdesign
files.

-> Does BIRT not cache those static resources?
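
Since the ResourceLocator implementation is under the poster's control, one possible stopgap is to memoize lookups inside it. The following is only a minimal sketch and not BIRT API: `CachingLocator` and its `Function<String, URL>` delegate are hypothetical stand-ins for the real lookup logic, and it assumes Java 8+ and that a given resource name always resolves to the same URL while the cache lives.

```java
import java.net.URL;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizing wrapper; not part of the BIRT API. */
final class CachingLocator {
    private final Function<String, URL> delegate;
    // Optional lets "not found" be cached too, since ConcurrentHashMap rejects null values.
    private final Map<String, Optional<URL>> cache = new ConcurrentHashMap<>();

    CachingLocator(Function<String, URL> delegate) {
        this.delegate = delegate;
    }

    /** Returns the cached URL, calling the delegate only on the first lookup of a name. */
    URL find(String name) {
        return cache.computeIfAbsent(name, n -> Optional.ofNullable(delegate.apply(n)))
                    .orElse(null);
    }
}
```

A real locator would call something like `find(fileName)` from its own findResource() implementation. This only avoids repeated resolution of the name; it does not stop the caller from re-reading the resource's bytes.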


[First debugging session]
Stepping through the BIRT code I ended up at EmitterUtil#getImage() at
one point and noticed that images are loaded byte by byte:

....
ArrayList<Byte> bytes = new ArrayList<Byte>( );
int data = in.read( );
while ( data != -1 ) {
    bytes.add( (byte) data );
    data = in.read( );
....

Needless to say, this is anything but efficient, especially if
the same image is loaded dozens of times (see above). Is there a
sensible explanation for this?
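
For comparison, a common way to drain a stream in plain Java is to read into a byte[] buffer and collect the chunks in a ByteArrayOutputStream, which avoids boxing every byte and cuts the number of read() calls. This is a generic sketch, not the actual BIRT code or its eventual fix; the class name `StreamBytes` and the 8 KiB buffer size are arbitrary choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative helper; the name is made up for this example. */
final class StreamBytes {
    /** Reads the whole stream in 8 KiB chunks instead of one byte per call. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read(byte[]) returns the number of bytes read, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}
```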


[How well does BIRT handle parallelization?]
Since our several IRunAndRenderTask instances currently run sequentially, I
was wondering how well BIRT would cope if I simply ran them in
parallel to exploit multi-core/multi-processor
infrastructure.


Kind regards,
Marcel

--
Marcel Stör, http://www.frightanic.com
Couchsurfing: http://www.couchsurfing.com/people/marcelstoer
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
-> I kill Google Groups posts: http://improve-usenet.org
Re: Performance questions [message #537751 is a reply to message #537638] Thu, 03 June 2010 10:31
Jason Weathersby

Marcel,

Running them in parallel should work fine. Just create a new task for
each run. Do not recreate the engine. Can you open a bugzilla entry on
the findResource question so we can get it tracked?
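
The "one engine, a new task per run" advice could be wired up roughly as follows. This is a sketch under assumptions, not BIRT code: `ParallelSections` and `SectionRenderer` are hypothetical names, and the renderer callback stands in for "create a task from the shared engine, run it, close it". Note that a later message in this thread reports a thread-safety bug (319088) with parallel runs on this BIRT version, so treat this as the general pattern only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: one shared engine, one fresh task per section. */
final class ParallelSections {
    /** Stand-in for "create a task from the shared engine and run it"; not a BIRT type. */
    interface SectionRenderer<E> {
        String render(E sharedEngine, String section) throws Exception;
    }

    static <E> List<String> renderAll(E sharedEngine, List<String> sections,
                                      SectionRenderer<E> renderer, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String section : sections) {
                // Each job builds and runs its own task; only the engine is shared.
                Callable<String> job = () -> renderer.render(sharedEngine, section);
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // get() blocks in submission order; failures surface as ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Capping the pool size is one way to keep the memory concern mentioned in the first post in check, since only that many sections are rendered at once.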

Thanks

Jason

Jason Weathersby

BIRT Exchange
Re: Performance questions [message #538307 is a reply to message #537751] Mon, 07 June 2010 05:47
Marcel Stör
On 03.06.2010 16:31, Jason Weathersby wrote:
> Marcel,
>
> Running them in parallel should work fine. Just create a new task for
> each run. Do not recreate the engine.

Works quite well on multi-CPU machines. On slower machines a single
IRunAndRenderTask instance already consumes 100% CPU so I had to look
further.

In our code we operate on an org.jdom.Document. However, we pass it to
BIRT as a ByteArrayInputStream like so:

task.getAppContext().put("org.eclipse.datatools.enablement.oda.xml.inputStream", xmlData);

Going from an XML document to byte[] seems super-dumb as BIRT will have
to do the exact opposite. Profiling showed that we lose *a lot of time*
in org.eclipse.datatools.enablement.oda.xml.util.SaxParser.run( ) - no
wonder.

Is there a way I could pass a Document (org.jdom or org.w3c.dom) to BIRT
and thus avoid the extra parsing?
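
For readers unfamiliar with the round trip being described, this is roughly what "Document to byte[] to InputStream" looks like using only JDK classes. It is an illustration, not the poster's code: the original works with org.jdom, while this sketch assumes org.w3c.dom, and `DomToStream` is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

/** Illustrative helper; the name is made up for this example. */
final class DomToStream {
    /** Serializes an in-memory DOM tree so it can be handed over as an InputStream. */
    static ByteArrayInputStream toInputStream(Document doc) throws TransformerException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A Transformer created without a stylesheet performs an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        // Whoever consumes this stream has to parse the bytes again:
        // that second parse is the duplicated work complained about above.
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```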


> Can you open a bugzilla entry on
> the findResource question so we can get it tracked?

Yep, I'll do that.


Regards,
Marcel

Re: Performance questions [message #538423 is a reply to message #538307] Mon, 07 June 2010 10:50
Jason Weathersby

This would be a great enhancement request. Can you log it?

Jason

Re: Performance questions [message #538427 is a reply to message #538423] Mon, 07 June 2010 11:10
Marcel Stör
On 07.06.2010 16:50, Jason Weathersby wrote:
> This would be a great enhancement request. Can you log it?

I sure could. However, I'm a little irritated by your reply. Does that
mean there's nothing I could do about improving performance in my scenario?

Re: Performance questions [message #538439 is a reply to message #538427] Mon, 07 June 2010 11:59
Jason Weathersby

Marcel,

I did not mean to irritate you with my last reply. I do not know of a
workaround in this particular case unless you want to modify the source
for the org.eclipse.datatools.enablement.oda.xml plugin.

Jason

Re: Performance questions [message #538450 is a reply to message #538439] Mon, 07 June 2010 12:10
Jason Weathersby

The only other option I could think of was to use a scripted datasource
instead of the xml datasource and handle the parsing in scripted
datasource events.
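
A scripted data source supplies its rows through open, fetch and close events, so the rows could come straight from a Document that is already in memory, with no serialization and re-parsing. The sketch below only mirrors that lifecycle in plain Java and is not BIRT API: `DomRowSource` and its method signatures are hypothetical, and it assumes org.w3c.dom rather than the org.jdom model mentioned earlier.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical row source mirroring an open/fetch/close lifecycle; not BIRT API. */
final class DomRowSource {
    private final Document doc;
    private final String rowTag;
    private NodeList rows;
    private int next;

    DomRowSource(Document doc, String rowTag) {
        this.doc = doc;
        this.rowTag = rowTag;
    }

    /** Called once before the first row: select the row elements from the in-memory tree. */
    void open() {
        rows = doc.getElementsByTagName(rowTag);
        next = 0;
    }

    /** Returns the next row element, or null when the rows are exhausted. */
    Element fetch() {
        if (rows == null || next >= rows.getLength()) {
            return null;
        }
        return (Element) rows.item(next++);
    }

    /** Called once after the last row: drop the reference to the node list. */
    void close() {
        rows = null;
    }
}
```

In a real report, the fetch step would copy values from the element into the data set row and report whether another row is available.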

Jason

Re: Performance questions [message #538479 is a reply to message #538450] Mon, 07 June 2010 13:06
Marcel Stör
On 07.06.10 18:10, Jason Weathersby wrote:
> The only other option I could think of was to use a scripted datasource
> instead of the xml datasource and handle the parsing in scripted
> datasource events.

Oookkk, as I'm a BIRT rookie this doesn't ring a bell. I'll have to
Google for that. If you happen to know a particularly good tutorial I'd
appreciate it if you posted the link(s).

Also,
http://www.docstoc.com/docs/23236254/Designing-High-Performance-BIRT-Reports
mentions that the use of JAXB could significantly improve performance.
However, Google wasn't able to come up with information on how to go
about this. Do you know of any?

My (naive?) expectation/assumption would be that BIRT builds "some kind"
of internal XML model and issues XPath statements against that model.
It doesn't seem to be like that. Otherwise, the ByteArrayInputStream
passed to BIRT would only have to be parsed *once* for any given
IRunAndRenderTask. I don't fully understand what's going on in the
ODA/XML corner (I spent a few hours debugging) but if it were like that
we wouldn't lose more than 50% of the entire processing time to the
SAX parser.
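
The "parse once, query many times" model described above can be illustrated with the JDK's own DOM and XPath APIs. This is only a sketch of that expectation, not a description of how the ODA XML driver works; `ParsedOnceModel` is a made-up name.

```java
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Illustrative model; the name is made up for this example. */
final class ParsedOnceModel {
    private final Document doc;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    /** Parses the stream exactly once; all later queries reuse the in-memory tree. */
    ParsedOnceModel(InputStream xml) throws Exception {
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
    }

    /** Evaluates an XPath expression against the already-parsed document. */
    NodeList select(String expression) throws XPathExpressionException {
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }
}
```

The trade-off is memory: a DOM tree keeps the whole document in memory, whereas a streaming SAX parser does not, which may matter given that memory consumption was the reason the report was split up in the first place.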

Cheers,
Marcel

Re: Performance questions [message #538537 is a reply to message #538479] Mon, 07 June 2010 17:49
Jason Weathersby

Marcel

I will talk with the DTP team on this issue. I did a quick test and the
parse method is only called once for my table when tied to the xml
dataset. Does the report have nested tables?

Jason

Re: Performance questions [message #538576 is a reply to message #538537] Mon, 07 June 2010 23:30
Jason Weathersby

Marcel,

I did find out from the dev team that the parse can be called multiple
times when using things like recursive column mapping.

Jason

Re: Performance questions [message #538684 is a reply to message #537751] Tue, 08 June 2010 08:19
Marcel Stör
On 03.06.2010 16:31, Jason Weathersby wrote:
> Can you open a bugzilla entry on the findResource question so we can get
> it tracked?

https://bugs.eclipse.org/bugs/show_bug.cgi?id=316115

Re: Performance questions [message #538685 is a reply to message #538423] Tue, 08 June 2010 08:19
Marcel Stör
On 07.06.2010 16:50, Jason Weathersby wrote:
> This would be a great enhancement request. Can you log it?

https://bugs.eclipse.org/bugs/show_bug.cgi?id=316118

Re: Performance questions [message #538724 is a reply to message #538685] Tue, 08 June 2010 09:40
Jason Weathersby

Thanks for logging this.

Jason

Re: Performance questions [message #545359 is a reply to message #537751] Wed, 07 July 2010 10:04
Marcel Stör
On 03.06.2010 16:31, Jason Weathersby wrote:
> Running them in parallel should work fine. Just create a new task for
> each run. Do not recreate the engine.

Turns out it doesn't work fine, due to this bug:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=319088
For whatever reason, Base64PropertyState uses a *static* reference to Base64
(commons-codec)...

Cheers,
Marcel

Re: Performance questions [message #545396 is a reply to message #545359] Wed, 07 July 2010 11:55
Jason Weathersby

Marcel,

Thanks for logging the bug.

Jason
