Re: [tracecompass-dev] CallGraphAnalysis optimization

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [tracecompass-dev] CallGraphAnalysis optimization

From: Rocky Dunlap - NOAA Affiliate <rocky.dunlap@xxxxxxxx>
Date: Fri, 15 Dec 2017 10:29:20 -0700
Delivered-to: tracecompass-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/tracecompass-dev>
List-help: <mailto:tracecompass-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/tracecompass-dev>, <mailto:tracecompass-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/options/tracecompass-dev>, <mailto:tracecompass-dev-request@eclipse.org?subject=unsubscribe>

Just wanted to follow up on this - I tried a couple of things to improve performance of the CallGraphAnalysis and I wanted to at least report on those findings, although I do not have code to contribute right now. ;(

My first approach was to add a worker thread pool to the CallGraphAnalysis. This code segment below will give you the picture. It basically changes CallGraphAnalysis.iterateOverStateSystem() into a multi-threaded thing, with a worker pool of 16 Java threads.

ExecutorService executor = Executors.newFixedThreadPool(16);

List<Future<Boolean>> results = new ArrayList<>();

for (int processQuark : processQuarks) {

int processId = getProcessId(ss, processQuark, ss.getCurrentEndTime());

for (int threadQuark : ss.getQuarks(processQuark, threadsPattern)) {

Future<Boolean> result = executor.submit(new Callable<Boolean>() {

@Override

public Boolean call() throws Exception {

return iterateOverQuarkConcurrent(ss, processId, threadQuark, callStackPath, monitor);

}

});

results.add(result);

}

executor.shutdown();

You get the idea. For my particular case the execution of the CallGraphAnalysis went from ~120 seconds to ~15 seconds - a nice improvement.

Then I tried an approach of performing the CallGraphAnalysis during the CallStackAnalysis as the relevant events are brought into memory. This required changing some of the statistics-related classes, such as AbstractCalledFunction and AggregatedCalledFunction to be built in two separate phases. For example, an instance of AbstractCalledFunction is created during a call stack "enter" event, but is not "completed" until the "exit" event. We cannot determine some statistics, such as duration until after the "exit" event is processed. So, you have to maintain a couple of stacks in the CallStackAnalysis with partial AbstractCalledFunction and AggregatedCalledFunctions. To control memory overhead, nodes are merged in the AggregatedCalledFunction tree as soon as possible since we are mainly just interested in the statistics.

Using this approach, total processing time for the CallGraphAnalysis stuff was brought down to ~1 second for my cas. So, as predicted by Matthew and Geneviève, this seems to be the best approach as far as performance goes.

My code is ugly right now and I did not have the luxury to think about how to do this in a clean, generic way. So I unfortunately cannot push any code right now, but I thought that passing on this knowledge would at least be helpful in the short term.

Rocky

On Thu, Nov 30, 2017 at 8:07 AM, Genevieve Bastien <gbastien+lttng@xxxxxxxxxxxx> wrote:

Rocky, yes 2d query is what I meant, but we would need to keep a
function call in memory until we can associate with its parent which may
come much later in the query, so for very large and deep callstack, this
may very well cause OutOfMemoryException. Unless I'm mistaken. Unless we
can make sorted 2d queries in which case the parent will always pop out
before the child.

But building it at the same time as the callstack (and saving it of
course!) would make more sense, though it would work only with analyses
that know they are callstack (ie _not_ for data-driven callstack, but we
can fallback to current method for that)

Did you plan to work on that? As for me, I'll start consolidating what
has been done in the Incubator's generic callstack, ie improve
performances, UX, etc. so hopefully I might get to that beginning of
2018. But that's only the incubator, I'll not be touching what is in
Trace Compass main repo.

But as you proposed, saving the callgraph in a naive way, maybe by
simple serialization, might be an easy low hanging fruit to improve
performances after the first calculation.

Cheers,

Geneviève

On 2017-11-30 09:14 AM, Matthew Khouzam wrote:
> @Rocky, that is an excellent proposal. It would require more complicated
> in memory operations but it will save a lot in terms of disk access.
> Another possibility is to make the call graph built _IN_ to the
> callstack analysis. This would make sense since we already have the
> interesting data in memory.
>
> I am very excited about these possibilities.
>
> Matthew
>
>
> On 17-11-29 06:27 PM, Rocky Dunlap - NOAA Affiliate wrote:
>> Geneviève,
>>
>> When you say "more powerful queries" - do you think we can use the
>> existing query2D on the StateSystem, e.g., to get a chunk at a time?
>>
>> Rocky
>>
>> On Tue, Nov 28, 2017 at 8:20 AM, Genevieve Bastien
>> <gbastien+lttng@xxxxxxxxxxxx <mailto:gbastien+lttng@versatic.net>> wrote:
>>
>> Hi Rocky,
>>
>>
>> Thanks for investigating this. You are right, the algorithm to
>> iterate the state system callstack is a very naive one, using
>> single queries to parse pretty much the whole state system. While
>> it works and is easy to verify and test, it can be greatly improved.
>>
>>
>> One other step would be to revisit this algorithm and use other
>> more powerful queries instead, but calls would not be visited
>> chronologically so each function will have an unknown parent for a
>> while... So this is easier said than done. But since with
>> callstacks the parent ends after the child, reading the state
>> system from end to beginning could help avoid unknown parents for
>> a while.
>>
>>
>> But if we can do this, then multithreading it would be much
>> easier! State Systems in read mode are [supposed to be] very
>> thread-safe.
>>
>>
>> But indeed the easiest would be to save the callgraph to file.
>> That's something I intended to experiment on in a near future for
>> the "Generic callstack" feature of the incubator. Save it in such
>> a way that it is also easy to produce statistics and flamegraphs
>> for a time selection as well. (Shameless plug, here's a post I
>> just wrote about the data-driven callstack in the incubator,
>> introducing the feature by the same occasion [1]). I was thinking
>> along the lines of saving it to a state system where the
>> attributes represent the symbols and their children are their
>> symbol callees and save the call count for each. Doing a full
>> query at 'begin' and 'end' would allow to easily rebuild the
>> callgraph. But it remains to be tested. If you have other ideas
>> for this, we'd gladly discuss and review!
>>
>>
>> I'm also working on benchmark for this analysis so we have a
>> baseline to compare the different approaches. Benchmarks are on
>> gerrit and for the incubator so far, but I plan to make their
>> equivalent for main Trace Compass.
>>
>>
>> Thanks,
>>
>> Geneviève
>>
>>
>> [1]
>> http://versatic.net/tracecompass/incubator/callstack/2017/11/27/tale-generic-callstack.html
>> <http://versatic.net/tracecompass/incubator/callstack/2017/11/27/tale-generic-callstack.html>
>>
>>
>> On 2017-11-27 10:36 PM, Rocky Dunlap - NOAA Affiliate wrote:
>>> Our traces tend to have a lot of threads and generate large call
>>> stacks. The CallGraphAnalysis is taking a bit longer to run that
>>> I would prefer, so I've done some very preliminary profiling.
>>> Looks like almost all of the time is spent querying the state system:
>>>
>>> Inline image 1
>>>
>>> I have some options in mind, but before diving into anything, I
>>> thought I would get insight from this group about what is likely
>>> going to be the best approach. Some ideas for your consideration:
>>>
>>> 1. Could the CallGraphAnalysis state be written out to file
>>> instead of rebuilding it each time?
>>>
>>> 2. It is currently single threaded. What is the concurrency
>>> policy for StateSystem? If more worker threads were used, what
>>> is the likelihood of speeding this up vs. just having a lot of
>>> contention on the back end? Are the StateSystems thread safe?
>>>
>>> 3. What is the indexing strategy, if any, and could that be
>>> improved for the kinds of queries issued by the CallGraphAnalysis?
>>>
>>> I think these are roughly in the order of easiest to hardest.
>>> There are probably other options as well. Doing 1 at a minimum
>>> seems not too bad. Thoughts?
>>>
>>> Thanks,
>>> Rocky
>>>
>>>
>>>
>>> _______________________________________________
>>> tracecompass-dev mailing list
>>> tracecompass-dev@xxxxxxxxxxx <mailto:tracecompass-dev@eclipse.org>
>>> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
>>> https://dev.eclipse.org/mailman/listinfo/tracecompass-dev
>>> <https://dev.eclipse.org/mailman/listinfo/tracecompass-dev>
>>
>> _______________________________________________
>> tracecompass-dev mailing list
>> tracecompass-dev@xxxxxxxxxxx <mailto:tracecompass-dev@eclipse.org>
>> To change your delivery options, retrieve your password, or
>> unsubscribe from this list, visit
>> https://dev.eclipse.org/mailman/listinfo/tracecompass-dev
>> <https://dev.eclipse.org/mailman/listinfo/tracecompass-dev>
>>
>>
>>
>>
>> _______________________________________________
>> tracecompass-dev mailing list
>> tracecompass-dev@xxxxxxxxxxx
>> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
>> https://dev.eclipse.org/mailman/listinfo/tracecompass-dev
> _______________________________________________
> tracecompass-dev mailing list
> tracecompass-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://dev.eclipse.org/mailman/listinfo/tracecompass-dev

_______________________________________________
tracecompass-dev mailing list
tracecompass-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/tracecompass-dev

References:
- [tracecompass-dev] CallGraphAnalysis optimization
  - From: Rocky Dunlap - NOAA Affiliate
- Re: [tracecompass-dev] CallGraphAnalysis optimization
  - From: Genevieve Bastien
- Re: [tracecompass-dev] CallGraphAnalysis optimization
  - From: Rocky Dunlap - NOAA Affiliate
- Re: [tracecompass-dev] CallGraphAnalysis optimization
  - From: Matthew Khouzam
- Re: [tracecompass-dev] CallGraphAnalysis optimization
  - From: Genevieve Bastien

Prev by Date: Re: [tracecompass-dev] Separating ITmfTrace from the UI
Next by Date: [tracecompass-dev] Trace Compass 3.2.0 is released
Previous by thread: Re: [tracecompass-dev] CallGraphAnalysis optimization
Next by thread: [tracecompass-dev] Trace Compass 3.2.0/Oxygen.2 RC3 posted
Index(es):
- Date
- Thread

Breadcrumbs