Performance issues with and without Large Resource Support (using Cloudscape or Derby) [message #98355] Wed, 09 May 2007 17:21
Deniz Özsen
Messages: 4
Registered: July 2009
Junior Member
Hi there,

I am developing an extension to Log & Trace Analyzer to import some
proprietary trace logs into the TPTP Log View (the table). I am doing
this by implementing a static GLA parser and plugging this into the
framework via the extension points and a .adapter file.

This all works fine, except that I would now like to be able to import
large log files, i.e. files that are larger than a few tens of MB
(potentially there are use cases for log files or streams that are
several GB in size). I will describe what happens for me when either not
using a database or using one, and then I'll give you some specs
that might be useful for analyzing this problem:

1) If I don't use a database to store my log data, I can import log
files that are up to around 2-5 MB. Beyond that I get out-of-memory
exceptions, or sometimes Eclipse simply freezes, even if I configure
Eclipse to reserve 1024MB of heap space (via the VM args). I have done
some simple profiling of my parser, which seems to reveal that it uses
quite a bit of memory, but not nearly as much as I can see being used by
the VM in Windows Task Manager (I need to do some more work to be able
to make a stronger statement than this). So my first question is: Is it
normal that the LTA uses a lot of memory while importing, seemingly much
more than it should (when NOT using a database)? Or could this be a
problem with the implementation of my parser?
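For reference, the heap limit mentioned above is normally raised through the -vmargs section of eclipse.ini (or the equivalent command-line arguments); the values shown here are illustrative, not a recommendation:

```ini
-vmargs
-Xms256m
-Xmx1024m
```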

2) Today I tried storing my log data in a database (I used Derby).
This worked, but took around 10 times longer than when I wasn't
using the database, for both importing and operating on the table
(sorting, turning pages, etc.). My second question: Is it normal that I am
getting this performance hit when using a database, or could it be a
problem in the implementation of my parser?

Some specs that might be useful:

- The machine I'm using has these specs:
OS: Windows 2000
CPU: Intel 4 at 3.2GHz
RAM: 1GB
- The version of Eclipse / EMF / TPTP I am using is: 3.2.2 / 2.2.2 / 4.3.1
- The database I used: Derby 10.2.2.0
- The CBE XML that my parser outputs can be 100 times larger than the
original (binary) log data. I can see that this can cause problems;
however, LTA (or my parser) still seems to use a disproportionate amount
of memory even for small files (a few KB).

I would be grateful for any hints on what might be happening here. I
will do some more tests and profiling and will post any updates on here.
Thanks in advance!

Regards,
Deniz
Re: Performance issues with and without Large Resource Support (using Cloudscape or Derby) [message #98484 is a reply to message #98355] Fri, 11 May 2007 21:57
Ali Mehregani
Messages: 404
Registered: July 2009
Senior Member
Hi Deniz,

It seems like you've done some very useful analysis. I recommend opening a
high-severity defect to outline the deficiencies in leveraging the log
analyzer. I believe there is some ongoing work on making the data models
in TPTP more scalable, but I haven't been intimately involved in that.

Thanks,

Ali Mehregani


Re: Performance issues with and without Large Resource Support (using Cloudscape or Derby) [message #98514 is a reply to message #98484] Mon, 14 May 2007 10:11
Deniz Özsen
Messages: 4
Registered: July 2009
Junior Member
Hi Ali,

OK, I'll browse Bugzilla a little and will then create a
defect. I hope this will help.

Regards,
Deniz


Re: Performance issues with and without Large Resource Support (using Cloudscape or Derby) [message #98620 is a reply to message #98355] Mon, 14 May 2007 15:54
Marius Slavescu
Messages: 67
Registered: July 2009
Member
Hi Deniz,

Thanks for trying TPTP and bringing these questions up.

Here are some insights and explanations of what you've seen.

If you set a 1024 MB heap size, the processing might go slower because of
paging; try to look at the memory usage in Task Manager (make sure it is not
close to or above the 1 GB physical limit of your machine).

Watch the heap using "Show heap status" on the Eclipse "General" preference
page when running the log import in TPTP, and try to start with a fresh
workspace.

During import into XMI (non-large log), the used memory grows beyond the log
data footprint (you'll probably also see some spikes), as extra
memory is required for parsing the log and loading it into the EMF model. The
same happens when it is imported using Large Resource Support (RDB-based),
but with less extra memory used (in the local import scenario, which is the
recommended one anyway).

Although your raw log data set seems to be small, try using the GLA runtime to
convert it to Common Base Events, then look at the size of that output (it
could be 10x or more larger than the original). Also watch the execution
time and memory utilization of that operation; that will give you a
hint whether the problems are in your parser / the GLA runtime. BTW, do you have a
static or rule-based parser?

I see that you mentioned 100x larger than the raw log data; that seems a bit too
much. You should look at the generated Common Base Event XML and check for
duplicated data.

Although last year we made great performance improvements to the import log
operation using Large Resource Support (local import scenario), it is
still significantly slower than the import into XMI (memory) for small files
(those that still fit in memory).

We basically generate CSV files (that operation is pretty fast, and could be
even faster than the default XML event generation) and load them using the
database's specific import facilities (which should give us the best
database load performance). If the system where the database is running (the same as
the workbench in this case) is overloaded (memory- or CPU-wise), the CSV load
operation will also slow down significantly.
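A rough sketch of this approach follows; the class name, table name, and column layout are illustrative, not the actual TPTP internals. Parsed events are first flattened into CSV rows, which are then handed to Derby's built-in SYSCS_UTIL.SYSCS_IMPORT_TABLE bulk-load procedure rather than being inserted row by row:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CsvBulkLoadSketch {

    // Build the Derby bulk-import call for a CSV file; in the real tool
    // a string like this would be executed over a JDBC Statement.
    static String importCall(String schema, String table, String csvPath) {
        return "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE('" + schema + "', '"
                + table + "', '" + csvPath + "', ',', '\"', 'UTF-8', 1)";
    }

    public static void main(String[] args) throws IOException {
        // Flatten a couple of parsed events into CSV rows first:
        Path csv = Files.createTempFile("cbe_events", ".csv");
        Files.write(csv, List.of(
                "1,2007-05-09T17:21:00,INFO,component started",
                "2,2007-05-09T17:21:05,WARN,buffer nearly full"));
        System.out.println(importCall("APP", "CBE_EVENT", csv.toString()));
    }
}
```

The bulk loader bypasses per-row statement overhead, which is why the CSV detour is faster than direct INSERTs for large logs.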

We still need to look further into optimizing the database behavior
using database tuning methods. Although those will probably not be that
generic (they might require data-set-specific optimizations), we could at least
provide some hints and preconfigured indexes.

The paging and filtering/sorting operations (table sorting is done in the UI,
with the same performance for both RDB and XMI resources) are also
slower in the RDB case, because we need to run queries and bring in, per page,
the root objects and their direct children fully populated (all lists
can now be populated lazily). This is because of EMF; that's why we started
looking into building a new DMS API which will overcome most of the problems
encountered when using EMF on top of an RDB resource. So 100 Log View
entries per page could mean 500 or more objects loaded, which could require
several tens or hundreds of SQL queries.
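A back-of-the-envelope version of that arithmetic can be sketched as below; the average child count is an assumption for illustration, not a measured TPTP figure:

```java
public class PageLoadEstimate {

    // Objects materialized for one page: each root entry plus its
    // directly contained children, all fully populated.
    static int objectsPerPage(int entriesPerPage, int avgChildrenPerEntry) {
        return entriesPerPage * (1 + avgChildrenPerEntry);
    }

    public static void main(String[] args) {
        // 100 Log View entries, assuming ~4 direct children each:
        int objects = objectsPerPage(100, 4);
        System.out.println(objects + " objects loaded for one page");
    }
}
```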

You can imagine that a page load operation will be significantly slower
than having the whole resource contents in memory (as in the XMI case) and just
doing some quick traversals and filtering/sorting, as we do when using the
in-memory query engine (which basically runs the same model queries as the RDB
one).

There was a critical bug (I'm not sure if the fix is in CVS yet) which
actually called load page 3 times for each page up/down action in the Log
View; so if a regular page load in the RDB resource case takes 2 seconds or
less (depending on the speed of the database machine), it takes 6 s instead.

Today, in the XMI case (in-memory model), most of the time is usually spent
in the UI building the table entries (this is true for both the Log and Trace
models) rather than in the model query execution (which is pretty fast).

It would be great if you could provide any performance data from your scenario
as you proceed further with the investigation, and we may also be able to help
you on the parser side if you can make that public.

Regards,
Marius
Re: Performance issues with and without Large Resource Support (using Cloudscape or Derby) [message #98890 is a reply to message #98620] Mon, 21 May 2007 11:33
Deniz Özsen
Messages: 4
Registered: July 2009
Junior Member
Hi Marius,

Thanks for your detailed explanations!

> BTW, do you have a static or rule-based parser?
I have implemented a static parser. This seemed like the more sensible
thing to do, as the log I am importing is in a complex binary format,
where lots of specific bits in a record's header have to be checked in
order to parse the record correctly.
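To illustrate the kind of bit-level header checks involved, here is a minimal, purely hypothetical sketch; the flag layout and field sizes are invented for illustration and are not the actual proprietary format:

```java
public class RecordHeaderSketch {
    // Hypothetical flag bits in the first header byte:
    static final int FLAG_HAS_TIMESTAMP = 0x01;
    static final int FLAG_HAS_THREAD_ID = 0x02;

    // Decode one record: a flags byte followed by optional 4-byte
    // big-endian fields, present only when the matching bit is set.
    public static String describe(byte[] record) {
        int flags = record[0] & 0xFF;
        int offset = 1;
        StringBuilder sb = new StringBuilder("record[");
        if ((flags & FLAG_HAS_TIMESTAMP) != 0) {
            sb.append("ts=").append(readInt(record, offset)).append(' ');
            offset += 4;
        }
        if ((flags & FLAG_HAS_THREAD_ID) != 0) {
            sb.append("tid=").append(readInt(record, offset));
            offset += 4;
        }
        return sb.append(']').toString();
    }

    // Assemble a big-endian 32-bit int from four bytes.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] rec = {0x03, 0, 0, 0, 42, 0, 0, 0, 7}; // both flags set
        System.out.println(describe(rec)); // record[ts=42 tid=7]
    }
}
```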

> I see that you mentioned 100x larger than the raw log data; that
> seems a bit too much. You should look at the generated Common Base
> Event XML and check for duplicated data.
Very true; I am currently reviewing the mapping from the binary format to
CBE. This is quite tricky, because you have to strike a balance between
presenting all the data in an easily readable way (e.g.
ExtendedDataElements for each important chunk of information) and saving
space (e.g. putting all of the data into one ExtendedDataElement, except for
the bits and pieces that fit CBE fields such as
SourceComponentThreadId). However, this could provide a good way to
improve the performance significantly.
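That trade-off can be sketched roughly as follows; the element layout mimics the CBE extendedDataElements convention, but the field names and the packing scheme are invented for illustration:

```java
public class CbeSizeSketch {

    // One ExtendedDataElement per field: readable, but verbose.
    static String verbose(String[] names, String[] values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            sb.append("<extendedDataElements name=\"").append(names[i])
              .append("\" type=\"string\"><values>").append(values[i])
              .append("</values></extendedDataElements>");
        }
        return sb.toString();
    }

    // All fields packed into a single element: compact, but opaque.
    static String compact(String[] names, String[] values) {
        StringBuilder sb = new StringBuilder(
            "<extendedDataElements name=\"raw\" type=\"string\"><values>");
        for (int i = 0; i < names.length; i++) {
            sb.append(names[i]).append('=').append(values[i]).append(';');
        }
        return sb.append("</values></extendedDataElements>").toString();
    }

    public static void main(String[] args) {
        String[] n = {"opcode", "seq", "payloadLen"};
        String[] v = {"12", "3401", "64"};
        System.out.printf("verbose=%d chars, compact=%d chars%n",
                verbose(n, v).length(), compact(n, v).length());
    }
}
```

The per-field markup overhead is what multiplies the output size, so consolidating fields that don't need to be individually queryable shrinks the XML considerably.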

Thanks a lot so far! I'll keep you updated. Would you still like me to
open a defect in bugzilla, as suggested by Ali?

Regards,
Deniz

