Just wanted to follow up on this. I tried a couple of things to improve the performance of the CallGraphAnalysis and wanted to at least report my findings, although I do not have code to contribute right now. ;(
My first approach was to add a worker thread pool to the CallGraphAnalysis. The code segment below gives the picture: it turns CallGraphAnalysis.iterateOverStateSystem() into a multi-threaded method that submits one task per thread quark to a worker pool of 16 Java threads.
    ExecutorService executor = Executors.newFixedThreadPool(16);
    List<Future<Boolean>> results = new ArrayList<>();
    for (int processQuark : processQuarks) {
        int processId = getProcessId(ss, processQuark, ss.getCurrentEndTime());
        for (int threadQuark : ss.getQuarks(processQuark, threadsPattern)) {
            // One task per thread quark
            Future<Boolean> result = executor.submit(new Callable<Boolean>() {
                @Override
                public Boolean call() throws Exception {
                    return iterateOverQuarkConcurrent(ss, processId, threadQuark, callStackPath, monitor);
                }
            });
            results.add(result);
        }
    }
    // No new tasks are accepted; already submitted tasks keep running
    executor.shutdown();
You get the idea. For my particular case, the execution time of the CallGraphAnalysis went from ~120 seconds to ~15 seconds, a nice improvement.
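The snippet above stops at executor.shutdown() and does not show how the collected futures are consumed. A minimal, self-contained sketch of that last step is below. It is not Trace Compass code: ParallelQuarkSketch, processAll and the IntPredicate worker are hypothetical stand-ins for the analysis and for iterateOverQuarkConcurrent(), and the state system is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

// Hypothetical illustration only; none of these names exist in Trace Compass.
public class ParallelQuarkSketch {

    /**
     * Submits one task per thread quark to a fixed-size pool, then blocks
     * until every task has finished. Returns true only if all tasks did.
     * The worker stands in for the per-quark iteration of the analysis.
     */
    public static boolean processAll(List<Integer> threadQuarks, int poolSize, IntPredicate worker)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int quark : threadQuarks) {
                results.add(executor.submit(() -> worker.test(quark)));
            }
            boolean allOk = true;
            for (Future<Boolean> result : results) {
                // get() blocks until the task completes and rethrows any
                // exception from the task wrapped in an ExecutionException
                allOk &= result.get();
            }
            return allOk;
        } finally {
            // Release the pool threads whether or not a task failed
            executor.shutdown();
        }
    }
}
```

Waiting on each Future is what makes the caller see both the boolean results and any exception thrown inside a worker. Without it the method could return while tasks are still running.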
Then I tried performing the CallGraphAnalysis during the CallStackAnalysis, as the relevant events are brought into memory. This required changing some of the statistics-related classes, such as AbstractCalledFunction and AggregatedCalledFunction, so that they are built in two separate phases. For example, an instance of AbstractCalledFunction is created on a call stack "enter" event, but is not "completed" until the matching "exit" event. Some statistics, such as duration, cannot be determined until the "exit" event has been processed. So you have to maintain a couple of stacks in the CallStackAnalysis holding the partial AbstractCalledFunction and AggregatedCalledFunction instances. To control memory overhead, nodes are merged into the AggregatedCalledFunction tree as soon as possible, since we are mainly interested in the statistics.
Using this approach, total processing time for the CallGraphAnalysis was brought down to ~1 second for my case. So, as predicted by Matthew and Geneviève, this seems to be the best approach as far as performance goes.
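The two-phase idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual change: IncrementalCallGraphSketch, AggregatedNode and PartialCall are hypothetical names standing in for the modified AbstractCalledFunction and AggregatedCalledFunction classes, it tracks a single thread with a single stack, and it keeps only call count and total duration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the Trace Compass classes.
public class IncrementalCallGraphSketch {

    /** Aggregated statistics for one function at one position in the call tree. */
    public static final class AggregatedNode {
        public final String name;
        public long calls;
        public long totalDuration;
        public final Map<String, AggregatedNode> children = new LinkedHashMap<>();

        AggregatedNode(String name) {
            this.name = name;
        }

        /** Returns the child for this function name, creating it on first use. */
        public AggregatedNode child(String childName) {
            return children.computeIfAbsent(childName, AggregatedNode::new);
        }
    }

    /** Phase one: a call that has been entered but not yet exited. */
    private static final class PartialCall {
        final AggregatedNode node;
        final long start;

        PartialCall(AggregatedNode node, long start) {
            this.node = node;
            this.start = start;
        }
    }

    private final AggregatedNode root = new AggregatedNode("root");
    private final Deque<PartialCall> open = new ArrayDeque<>();

    /** "enter" event: locate (or create) the merged node and remember the start time. */
    public void onEnter(String function, long timestamp) {
        AggregatedNode parent = open.isEmpty() ? root : open.peek().node;
        open.push(new PartialCall(parent.child(function), timestamp));
    }

    /** "exit" event: phase two, the duration is now known and folded into the merged node. */
    public void onExit(long timestamp) {
        // pop() throws NoSuchElementException on an exit without a matching enter
        PartialCall call = open.pop();
        call.node.calls++;
        call.node.totalDuration += timestamp - call.start;
    }

    public AggregatedNode getRoot() {
        return root;
    }
}
```

Because repeated calls to the same function from the same parent share one AggregatedNode, memory grows with the number of distinct call paths rather than with the number of events, which is the point of merging early.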
My code is ugly right now and I have not had the luxury of thinking about how to do this in a clean, generic way. So I unfortunately cannot push any code yet, but I thought that passing on this knowledge would at least be helpful in the short term.
Rocky