Re: [tracecompass-dev] Question about scaling of data driven analysis

Hi Robbert,

As Loic said, a large number of attributes in the state system causes scaling issues. The number of attributes per tree level does not matter; it's the total number that counts.
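A rough count of what "total number" means for the tree shape Robbert describes further down in the thread. The fan-outs (1000 thingids, up to 1000 subthingids each, one value leaf per subthingid) are assumptions taken loosely from that description, and `count_attributes` is a hypothetical helper, not a Trace Compass API.

```python
def count_attributes(fanouts):
    """Count every node below the root of a uniform attribute tree.

    fanouts[i] is the (assumed) number of children per node at depth i+1.
    The root itself is not counted.
    """
    total = 0
    nodes_at_level = 1  # start from the single root node
    for fanout in fanouts:
        nodes_at_level *= fanout  # nodes present at this depth
        total += nodes_at_level
    return total


# Hypothetical shapes: 1000 thingids x 1000 subthingids x 1 value each,
# versus the same per-level fan-out of 1000 under a single top-level entry.
wide = count_attributes([1000, 1000, 1])
narrow = count_attributes([1, 1000, 1])
print(wide, narrow)  # prints: 2001000 2001
```

Both shapes contain a level with 1000 entries, but the first has about a thousand times more attributes overall, which is the number that matters here.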

Some questions/paths to try:

* Limiting the number of attributes in the tree would be preferable. Do you need all of them for your analysis? If not, you should stick with the bare minimum needed.

* Maybe you can split your data into multiple analyses if there are distinct subsets of data for different views.

* As for grouping them: do many attribute values start and end at the same time? If so, they could be grouped into a custom state value for a single attribute. That is a new state value type that was added recently, but it is not used yet and not supported in data-driven analyses. Adding that support should be "fairly easy", though that doesn't help you today. Still, knowing that someone has an actual need for it may make it happen faster ;-)

* Do the thingids follow a pattern of start, something happening, then end? If so, that is a killer of state system performance. You could consider using a segment store instead. See [1] for instructions on writing an XML pattern that creates segments, and [2] for the path, in the Trace Compass source tree, of an example pattern using segments. If you do so, I should warn you that you may reach the limit of the segment stores (the available memory). Some prototypes of scalable segment stores are on Gerrit right now; we just need to review them and accept one of them.

* If the attributes change at a high frequency, using a type that takes less space may also help. Integers take less space than strings. If you have string attributes that could be an enum, use an enum and save ints instead (see the <definedValue> tag of the XML).

* You can see the size on disk of the state system by selecting the analysis in the Project Explorer and looking in the Properties view under "Analysis module properties". The smaller, the better: when you try something, the size may give you an indication of the effect it had on the state system.
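The grouping idea from the list above, sketched in Python under loose assumptions: the attribute names and intervals are hypothetical, and `group_by_interval` only illustrates merging attributes whose values start and end at the same timestamps into one composite value. It is not the new custom state value type in Trace Compass, whose API is not shown here.

```python
from collections import defaultdict


def group_by_interval(intervals_by_attr):
    """Merge attributes whose intervals share identical (start, end) bounds.

    intervals_by_attr maps a (hypothetical) attribute path to a list of
    (start, end, value) tuples. The result maps each (start, end) range to
    {attribute: value}, i.e. one composite value per time range instead of
    one interval per attribute.
    """
    grouped = defaultdict(dict)
    for attr, intervals in intervals_by_attr.items():
        for start, end, value in intervals:
            grouped[(start, end)][attr] = value
    return dict(grouped)


# Two hypothetical attributes that always change at the same timestamps.
intervals = {
    "subthing1/somevalue": [(0, 10, 3), (10, 20, 4)],
    "subthing2/somevalue": [(0, 10, 7), (10, 20, 8)],
}
merged = group_by_interval(intervals)
# Four per-attribute intervals collapse into two composite values.
print(len(merged))  # prints: 2
```

The saving only appears when values really do change together; attributes with unaligned intervals simply end up under separate keys.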
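A minimal sketch of the segment idea, assuming each thingid produces a "start" event and a matching "end" event (a hypothetical event shape). Instead of writing state changes for every thingid, each start/end pair becomes one segment kept in a list. This is plain Python, not the XML pattern syntax from [1], and keeping every segment in memory mirrors the memory limit of segment stores mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    thing_id: str
    start: int
    end: int


def build_segments(events):
    """Pair 'start'/'end' events per thing_id into segments.

    events: iterable of (timestamp, kind, thing_id), kind in {"start", "end"},
    assumed sorted by timestamp. Starts without a matching end are dropped.
    """
    open_starts = {}  # thing_id -> start timestamp of the open segment
    segments = []
    for ts, kind, thing_id in events:
        if kind == "start":
            open_starts[thing_id] = ts
        elif kind == "end" and thing_id in open_starts:
            segments.append(Segment(thing_id, open_starts.pop(thing_id), ts))
    return segments


def intersecting(segments, t):
    """Naive linear query: segments whose [start, end] contains t."""
    return [s for s in segments if s.start <= t <= s.end]


events = [(0, "start", "a"), (2, "start", "b"), (5, "end", "a"), (9, "end", "b")]
segs = build_segments(events)
print([s.thing_id for s in intersecting(segs, 7)])  # prints: ['b']
```

A real segment store would index segments for faster queries; the linear scan here only keeps the sketch short.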
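To illustrate why ints help when values change often, here is a rough size comparison. The state names, the code table (which in an XML analysis would come from <definedValue> entries), and the length-prefixed string layout are all assumptions; this is not the exact on-disk format of the state history.

```python
import struct

# Hypothetical state names mapped to small integer codes, playing the role
# of <definedValue> entries in an XML analysis.
STATE_CODES = {"IDLE": 0, "RUNNING": 1, "BLOCKED": 2}


def encode_string(value):
    """Assumed layout: 4-byte big-endian length followed by UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def encode_int(value):
    """Fixed 4-byte big-endian integer code for the state name."""
    return struct.pack(">i", STATE_CODES[value])


changes = ["RUNNING", "BLOCKED", "RUNNING", "IDLE"]
string_bytes = sum(len(encode_string(v)) for v in changes)
int_bytes = sum(len(encode_int(v)) for v in changes)
print(string_bytes, int_bytes)  # prints: 41 16
```

The gap grows with the length of the strings and with the number of state changes, which is why high-frequency attributes benefit most.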

I hope some of this helps. Also, Loic, who replied earlier, is doing his master's degree on improving state system performance. If possible, you could provide him with a trace, the parser, and the analysis so he can check whether the solutions he proposes improve a variety of cases or whether more needs to be done.

Regards,
Geneviève


[1] http://archive.eclipse.org/tracecompass/doc/org.eclipse.tracecompass.doc.user/Data-driven-analysis.html#Defining_an_XML_pattern_provider
[2] tmf/org.eclipse.tracecompass.analysis.xml.core.tests/test_xml_files/test_valid/test_valid_pattern.xml


On 06/22/2016 06:50 AM, Robbert Jongeling wrote:
Hi Loïc and Matthew,

Thank you for your responses.

I have indeed noticed that opening the trace without the analysis attached is much faster.
It takes about 50 seconds to index the 2.2 million events; however, when the analysis is attached, it takes too long to wait for (it seems to index only hundreds of events per second, compared to thousands or even tens of thousands when no analysis is attached).
After applying the patch you both suggested, creating the view based on my analysis still takes too long to wait for.
Indexing of the trace when the analysis is attached is also still a lot slower than when it is not attached.

(By attached I mean the following:
I currently have an analysis file in which I define a stateprovider and a timegraphview.
Indexing the trace without the timegraphview open is normal and quick, but indexing the trace while the timegraphview is already open takes very long.
In both cases, the view is imported into Trace Compass.)

The patch does improve the performance.
I have profiled two runs of TraceCompass, one without the patch and one with the patch.
In these runs I first opened the trace and, after indexing, opened my custom timegraphview.
At that point, I started the profiling.
This shows a significant improvement in the time spent in history tree related functions,
but not in the HTNode.readNode method, which takes up the majority of the time in both cases.
I think this is caused by the large number of attributes on a level in the state system.

When I run my analysis on a different version of the trace, in which the top-level attribute of the state system contains not thousands of attributes but just one, generation of the view is almost instantaneous.
Is it possible to somehow group the attributes in order to make the generation faster?


Kind regards,

Robbert


From: tracecompass-dev-bounces@xxxxxxxxxxx <tracecompass-dev-bounces@xxxxxxxxxxx> on behalf of Matthew Khouzam <matthew.khouzam@xxxxxxxxxxxx>
Sent: Tuesday, 21 June 2016 21:15:08
To: tracecompass-dev@xxxxxxxxxxx
Subject: Re: [tracecompass-dev] Question about scaling of data driven analysis
 
Hi Robbert!

Quick question: can you load the 200 MB trace easily if there is no
analysis? (Remove the analysis from TC.)

If that is the case, the problem lies in the state system; perhaps this
patch can fix it: https://git.eclipse.org/r/#/c/61062/

Please keep us posted,

Br,

Matthew

On 16-06-21 09:06 AM, Robbert Jongeling wrote:
> Dear Trace Compass developers,
>
> I am trying to apply Trace Compass to custom, text-based traces and to
> this end have created a parser and an xml-analysis.
> My question relates to the scaling of the application or the analysis
> to larger traces.
> In the case of a large trace (200 MB), my analysis takes about 2 hours to
> complete, whereas for small traces (10 MB) it takes seconds.
> What can I do to increase the performance of my xml analysis?
>
> I have noticed that the issue may be caused by the size of my
> state system structure, which is about 4 levels deep, but on each level
> there may be >1k different entries. E.g.:
>
> |- trace
> |  |- thingid1
> |  |  |- subthingid 1
> |  |  |  |- somevalue
> |  |  ...
> |  |- thingid1000
> |     |- subthingid 1
> |     ...
> |     |- subthingid 1000
> |        |- someOtherValue
>
>
> I have noticed that the getRelevantInterval method in
> org.eclipse.tracecompass.internal.statesystem.core.backend.historytree.HTNode
> takes up quite a lot of time.
> Also, I found this bug
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=492525, which discusses
> that CTF traces can be packetized.
> Is there a similar way to handle custom (text-based) traces?
>
> I would greatly appreciate any pointers you can give me.
>
>
> Kind regards,
>
> Robbert
>
>
>
> _______________________________________________
> tracecompass-dev mailing list
> tracecompass-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://dev.eclipse.org/mailman/listinfo/tracecompass-dev
