Eclipse Community Forums: Test and Performance Tools Platform (TPTP)

Binary trace support? [message #26571]

Thu, 18 August 2005 03:12

Eclipse User

Originally posted by: askoliver.hotmail.com

Hi,

When I profile an application with a large data set using TPTP, it does not
seem to scale very well. I think one problem is that piAgent spits out the
tracing data in XML format, and the XML file is too big. I found
binary_print.c in the Java profiler 'piAgent' project and tried to rebuild it
to get binary print functionality, but it seems this component is not
supported yet: it lacks the headers "binarytrace.h" and "binary_privates.h".
Where can I find these header files? What is the plan for binary print
support in TPTP? Can the agent controller understand the binary format and
translate it to XML now? I look forward to your response. Thanks very much.

Best regards,
Oliver.

Re: Binary trace support? [message #26625 is a reply to message #26571]

Thu, 18 August 2005 22:18

Eugene Chan

Hi Oliver,

I am reposting the discussion thread from the mailing list. Hendra Suwanda
from runtime will contact you for a more detailed discussion of this topic.

Eugene

----------------------------------------------------------------------

Sent: Thursday, August 18, 2005 12:58 PM

Subject: Re: [tptp-tracing-profiling-tools-dev] binary trace support?

Harm correctly contrasts trace-based data collection with sampling data
collection, but I don't want to leave out the third option: aggregating
data collection. This means watching the whole program as it runs, just
like trace does, but collecting and reporting only summary data (number of
calls to each function, total and average time spent in each) instead of
streaming an entire execution trace to the workbench. This permits low
overhead / high performance while still populating the summary views from
a single run of your program, not from occasional samples that might miss
important details.

Our experience is that this kind of data collection scales very well. The
data volume is proportional to the size of your program, not to the length
of time you run it. And the volume is much, much smaller than a trace -
and thus time spent to format, transmit, and read it is smaller too. In
discussions of potential data collection systems that could scale better
than the current trace-based data collection, I don't want to leave this
one out. Thanks.
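A minimal sketch of the aggregating approach, written in Python for brevity rather than as a JVM agent: each instrumented call only bumps a per-function counter and a running time total, so the data grows with the number of distinct functions, not with how long the program runs. The `aggregate` decorator, `STATS` table and `report` function are hypothetical names for illustration, not part of TPTP or piAgent.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical per-function summary table (not a TPTP API):
# call count and total inclusive wall-clock time in seconds.
STATS = defaultdict(lambda: {"calls": 0, "total": 0.0})


def aggregate(func):
    """Record only summary data for each call instead of a trace event."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # A few arithmetic operations per call, no new record.
            entry = STATS[func.__name__]
            entry["calls"] += 1
            entry["total"] += time.perf_counter() - start
    return wrapper


def report():
    """Return (name, calls, total, average) rows, one per function."""
    return [
        (name, s["calls"], s["total"], s["total"] / s["calls"])
        for name, s in sorted(STATS.items())
    ]


@aggregate
def b():
    return sum(range(10))


@aggregate
def a():
    for _ in range(100):
        b()


for _ in range(10):
    a()

for name, calls, total, avg in report():
    print(f"{name}: {calls} calls, total {total:.6f}s, avg {avg:.9f}s")
```

However long the loop runs, the table keeps exactly two rows. The times are inclusive, so the total for `a` also contains the time spent in `b`.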

-- Allan Pratt, apratt@us.ibm.com

Rational software division of IBM

----------------------------------------------------------------------

Sent by: 08/18/2005

Before we are able to do the binary format, we will probably try to reduce
the amount of data produced by using a simple compressed XML format (at
least for the Profiling agent), like the one described in
https://bugs.eclipse.org/bugs/show_bug.cgi?id=97886. I believe this approach
will have a lower impact on the existing infrastructure.

Improving the raw trace size is just the beginning of these optimizations;
we will also need to reduce the size of the EMF trace model instance and add
a paging mechanism to the trace views (we already have filtering, which can
be used to reduce the trace UI model instance). Any contributions in this
area would be appreciated.

In the Log model case (which uses CommonBaseEvents), we have started to put
in place a database-backed resource with paging support, and the log-related
views are also paged (in addition to the filtering support that is already
available).

Thanks!

Marius Slavescu

Test and Analysis Tool Enablement, Rational Automated Software Quality

phone:905-413-3610 mailto:slavescu@ca.ibm.com

fax: 905-413-4920

IBM Canada Limited.

8200 Warden Ave.

Markham, Ontario L6G 1C7

----------------------------------------------------------------------

Harm Sluiman <sluiman@ca.ibm.com>

Sent by: 08/18/2005 05:14 AM

There are a couple of reasons that you can run into data volume problems
when profiling. XML is actually not the biggest issue. We have studied
this and intend to eventually provide binary support in TPTP, but we have
not had enough resources to develop that feature yet.

The bigger issue is that piAgent is geared toward tracing rather than
sampling-style profiling. This was intentional, as the profiler in general
was initially targeted at that problem space; remote and distributed
tracing was a rather unique feature of this project when it started.
Tracing by its nature captures all the data in a single pass through the
code, which requires careful filter settings. Sampling requires paths to be
executed several times to get the data needed to identify a hot spot. As
with binary support, we simply have not had the resources to add
sampling-based profiling, although it is often requested.
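The trade-off between a single traced pass and repeated sampled passes can be shown with a toy, deterministic simulation. This is only a sketch under simplifying assumptions (time advances in integer ticks, one function runs per tick, the sampler looks every `interval` ticks) and is not how piAgent or a JVMTI agent actually works; `trace`, `sample` and the function names in the timeline are hypothetical.

```python
from collections import Counter


def trace(timeline):
    """Tracing: one record per execution, whatever its duration."""
    return Counter(name for name, _ in timeline)


def sample(timeline, interval):
    """Sampling: note which function is running at every `interval`-th tick."""
    hits = Counter()
    tick = 0
    for name, duration in timeline:
        for _ in range(duration):
            if tick % interval == 0:
                hits[name] += 1
            tick += 1
    return hits


# One pass through the code (101 ticks): a long "parse" and a short "lookup".
one_pass = [("parse", 92), ("lookup", 5), ("parse", 4)]

print(trace(one_pass))            # both functions are recorded
print(sample(one_pass, 10))       # "lookup" is never sampled in one pass
print(sample(one_pass * 20, 10))  # over repeated passes "lookup" starts to appear
```

A single pass is enough for tracing to record `lookup`, but the sampler only starts to see it once the path has run many times.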

Now, with the advent of JVMTI, we are assessing what the best approaches
are, given the resources we have.

As always, we welcome any help or contributions people want to give to get
some of these features into the code base.

Thanks for your time.

----------------------------------------------------------------------

Harm Sluiman, STSM,

phone:905-413-4032 fax: 4920

cell: 416-432-9754

mailto:sluiman@ca.ibm.com

Admin : Arlene Treanor atreanor@ca.ibm.com Tie: 969-2323 1-905-413-2323
Overhead of complete execution history can make a profile useless (Re: Binary trace support?) [message #27953 is a reply to message #26625]

Wed, 24 August 2005 09:59

Oliver Schoett

Eugene Chan wrote:

> I am reposting the discussion thread from the mailing list.
>
Thanks for reposting this - I was not aware of this discussion.

Allan Pratt writes:

> Our experience is that this kind of data collection scales very well. The
> data volume is proportional to the size of your program, not to the length
> of time you run it. And the volume is much, much smaller than a trace -
> and thus time spent to format, transmit, and read it is smaller too. In
> discussions of potential data collection systems that could scale better
> than the current trace-based data collection,

More strongly, I would argue that the current TPTP profiling method, which
collects the complete execution history, is useless for the following
reason: collecting data about each method call in a new data record each
time increases the runtime of the profiled application by at least a
factor of 5. This means that when your application runs under the
profiler, at least 80% of the running time is not application runtime but
profiling overhead. Your profiling data therefore essentially profiles the
profiler itself, not the application you want to profile.

The profiling overhead is essentially proportional to the number of
profiled method calls, which is not at all proportional to the true
running time of the methods, especially when the methods cause external
communication such as database access.
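A back-of-the-envelope model of this effect, assuming a fixed recording cost per profiled call (the 4 µs figure below is purely illustrative, not a measured TPTP number): two methods with the same true running time look very different once every call pays that cost.

```python
# Simple cost model: profiled time = true time + calls * per-call overhead.
# All figures are illustrative assumptions, in microseconds.
PER_CALL_OVERHEAD_US = 4


def profiled_time_us(calls, true_time_us):
    """Time a method appears to take when every call pays a fixed recording cost."""
    return true_time_us + calls * PER_CALL_OVERHEAD_US


# Both methods really take 1 second (1,000,000 us) in total.
cheap_calls = profiled_time_us(calls=1_000_000, true_time_us=1_000_000)  # 1 us per call
db_calls = profiled_time_us(calls=100, true_time_us=1_000_000)           # 10 ms per call

print(cheap_calls)  # 5000000: appears 5x slower than it really is
print(db_calls)     # 1000400: almost undistorted
```

Under this model the profile shows the cheap method as roughly five times as expensive as the database-bound one, although both account for the same real time.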

From my experience (over two years of profiling and optimizing
applications in C++ and Java), I would say that the profiling overhead
must never exceed 100% to obtain a meaningful profile, i.e. the runtime
under the profiler must not be more than 2x the runtime without the
profiler. Overhead of 10% to 50% is even better, as then the actual
running time dominates the profiler overhead. Collecting aggregate
statistics helps a lot, because each profiled method call needs only a few
system calls plus a few arithmetic operations, rather than the generation
and storage of a new data record for the complete execution history (my
guess is that this can reduce the profiling overhead by a factor of 10 to
100, making it much easier to reach the goal of low profiling overhead
compared to the normal running time of the application).
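The arithmetic behind these thresholds, as a small sketch: a slowdown factor f corresponds to an overhead of (f - 1) × 100% and means that a fraction 1 - 1/f of the profiled run is spent in the profiler. The function names are hypothetical.

```python
def overhead_percent(baseline_s, profiled_s):
    """Extra runtime caused by the profiler, as a percentage of the baseline."""
    return (profiled_s - baseline_s) / baseline_s * 100


def profiler_share(baseline_s, profiled_s):
    """Fraction of the profiled run that is profiler overhead, not application work."""
    return (profiled_s - baseline_s) / profiled_s


# A 5x slowdown: 400% overhead, 80% of the profiled run is the profiler.
print(overhead_percent(10.0, 50.0), profiler_share(10.0, 50.0))

# The suggested ceiling of 2x: 100% overhead, half of the run is the profiler.
print(overhead_percent(10.0, 20.0), profiler_share(10.0, 20.0))
```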

I have found the Hyades profiler 3.0.0 useless for this reason, and I
currently work with the Eclipse Profiler
(http://eclipsecolorer.sourceforge.net/index_profiler.html), which does
exactly the aggregate data collection that Pratt describes and works very
well for me too.

Note that there are several bug reports related to the extraordinary
memory overhead of the complete execution history, such as bugs 56645,
75266, and 88917. The following trivial example numbers make it clear why
approaches that improve only linearly, such as an improved storage format
or database storage of the execution history, will not solve the problem:
when method A is called 1000 times and calls method B 1000 times on each
call, the complete execution history will contain over a million call
records (and these numbers are still small!). The aggregate statistics
show only 2 records, namely

Method A: 1000 calls, total real time: ... total CPU time: ...
Method B: 1000000 calls, total real time: ... total CPU time: ...

There is no way the million-fold extra data storage overhead of the
complete execution history can be compensated for by improved data
storage technology; the only viable solution to reduce both storage and
runtime overhead of profiling is to collect aggregate statistics.
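The record counts in this example can be checked with a short simulation that counts records only (not their size); `run` is a hypothetical helper, and a `Counter` stands in for the aggregate table.

```python
from collections import Counter


def run(calls_a, calls_b_per_a):
    """Simulate A being called calls_a times, each call invoking B calls_b_per_a times.

    Returns the number of full-trace records and the aggregate table.
    """
    trace_records = 0
    aggregate = Counter()
    for _ in range(calls_a):
        trace_records += 1      # full trace: one record for this call of A
        aggregate["A"] += 1     # aggregate: just bump A's counter
        for _ in range(calls_b_per_a):
            trace_records += 1  # full trace: one record per call of B
            aggregate["B"] += 1
    return trace_records, aggregate


records, summary = run(1000, 1000)
print(records)       # 1001000 trace records
print(len(summary))  # 2 aggregate rows
print(summary)
```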

At the moment, I can only recommend the Eclipse Profiler for serious
performance work, although it seems to be unmaintained and can cause
deadlocks on application server startup occasionally.

Regards,

Oliver Schoett

Re: Overhead of complete execution history can make a profile useless (Re: Binary trace support?) [message #30326 is a reply to message #27953]

Wed, 07 September 2005 16:37

Marius Slavescu

Oliver,

Thank you for your feedback and comments.

The following features aim to improve the overall performance of the TPTP
trace model, agents and communication layer:

108938 Improve the performance of the trace model
108948 Support for data collection buffered mode (store-and-forward)
108950 Support for new profiling modes
108942 Support for enhanced profiling modes
108646 Create an "Aggregate Result" profiling set

Please add comments to these, as we will soon go through the approval
process for 4.2, and I think it would be great to have (at least some of)
them committed.

I think the most detailed mode is still valuable when appropriate filters
are set, and aside from performance (which I hope will be much improved in
4.2), it might be the only way to pinpoint the culprit in some cases.
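As an illustration of what "appropriate filters" can mean for detailed tracing, here is a sketch of ordered include/exclude rules where the first matching pattern wins and unmatched methods are not traced. The rule format, `FILTERS` and `should_trace` are hypothetical; this is not TPTP's actual filter syntax or evaluation order.

```python
from fnmatch import fnmatchcase

# Hypothetical ordered filter rules: (pattern, include?).
# First match wins; anything unmatched is not traced.
# This is NOT TPTP's filter format, just an illustration.
FILTERS = [
    ("java.*", False),
    ("com.example.orders.*", True),
]


def should_trace(method, rules=FILTERS):
    """Return True if a fully qualified method name passes the filter rules."""
    for pattern, include in rules:
        if fnmatchcase(method, pattern):
            return include
    return False


calls = [
    "com.example.orders.Order.total",
    "java.util.HashMap.get",
    "com.example.ui.Window.paint",
]
traced = [m for m in calls if should_trace(m)]
print(traced)  # ['com.example.orders.Order.total']
```

Only calls that pass the filter would produce trace records, which is what keeps a detailed trace of the interesting code manageable.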

Regards,
Marius