Hi Hana,
> Do you mean that if I use "_" the condition will be applied anyway, but the Aadl2AadlTraceSpec will not be enumerated, or what? Could you please explain what is happening on the engine side?
If you have two constraints like this:
find foo (x,y1,z1);
find foo (x,y2,z2);
find bar(y1,y2,z1,z2); // …
Then the query engine must ensure that the two matches of the called pattern foo agree on their first parameter.
On the other hand, if you write:
find foo (_,y1,z1);
find foo (_,y2,z2);
find bar(y1,y2,z1,z2); // …
Then it is equivalent to:
find foo (_x1, y1,z1);
find foo (_x2, y2,z2);
find bar(y1,y2,z1,z2); // …
In other words, the query engine does not need to check whether the matches of foo coincide on the first parameter.
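To make the difference concrete, here is a minimal Python sketch (not Viatra code, and not how the engine is implemented) that treats the matches of foo as a plain set of tuples. The contents of foo_matches and both function names are invented for illustration. With a shared x, the two foo calls must agree on the first parameter; with "_", every pair of matches qualifies.

```python
from itertools import product

# Hypothetical match set of pattern foo(x, y, z); the values are invented.
foo_matches = {("r1", "a", 1), ("r1", "b", 2), ("r2", "c", 3)}

def pairs_with_shared_x(matches):
    """find foo(x,y1,z1); find foo(x,y2,z2): both calls bind the same x."""
    return {
        (y1, z1, y2, z2)
        for (x1, y1, z1), (x2, y2, z2) in product(matches, repeat=2)
        if x1 == x2  # the extra equality check implied by reusing x
    }

def pairs_with_anonymous_x(matches):
    """find foo(_,y1,z1); find foo(_,y2,z2): no equality check at all."""
    return {
        (y1, z1, y2, z2)
        for (_x1, y1, z1), (_x2, y2, z2) in product(matches, repeat=2)
    }

shared = pairs_with_shared_x(foo_matches)
anonymous = pairs_with_anonymous_x(foo_matches)
print(len(shared), len(anonymous))
```

The shared-variable result is always a subset of the anonymous one, so the two forms only give the same answer when the first parameter cannot differ between the matches being combined.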
In your specific case, the pattern matcher establishes the traceability relationship between variables * and *ref in various cases, but it does not have to check whether all these traceability relationships come from the same Aadl2AadlTraceSpec root.
I hope this helps somewhat.
So where does the huge performance difference come from? Why is this extra check so expensive? Normally, it should not be.
But it is possible that in this specific situation, something misleads the query execution planner of Viatra into coming up with a really, exceptionally bad query evaluation plan.
Perhaps it starts with enumerating all Aadl2AadlTraceSpec traceability root nodes (so far so good, there is only one after all); but then continues by attempting to enumerate all the myriad combinations, for all variables *, of matches of find is_in_trace(aadl2aadlref, trace, *, *ref) incident on this root node; and finally, only after finding all these matches, double-checking whether the values of variables * so found just happen to satisfy all the other lines in the pattern. This is super inefficient, since there is a combinatorial explosion of is_in_trace matches joined together many times.
I cannot exactly tell you why the query planner stumbles upon this (or another similarly bad) plan, but generally if you tell the engine about domain knowledge you have regarding the functional dependencies, it has better chances of coming up with a good plan.
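The cost gap between such a bad plan and a better one can be illustrated with a small Python sketch. This is only a toy model of join ordering, not the actual Viatra planner; the trace contents, the wanted tuple and both function names are assumptions made up for the example, and the lookup variant additionally assumes each element has exactly one ref.

```python
from itertools import product

# Hypothetical trace model: one root with n (element, ref) entries under it.
n = 6
trace = [(f"e{i}", f"e{i}ref") for i in range(n)]

# Assume the other lines of the pattern pin down these three source elements.
wanted = ("e0", "e1", "e2")

def bad_plan(trace, wanted):
    """Enumerate every combination of trace entries first, filter afterwards."""
    visited = 0
    results = []
    for combo in product(trace, repeat=len(wanted)):
        visited += 1
        if tuple(src for src, _ref in combo) == wanted:
            results.append(tuple(ref for _src, ref in combo))
    return results, visited

def good_plan(trace, wanted):
    """Bind the source elements first, then look up each ref directly."""
    index = dict(trace)  # assumes one ref per element (a functional dependency)
    results = [tuple(index[src] for src in wanted)]
    return results, len(wanted)

bad_results, bad_visited = bad_plan(trace, wanted)
good_results, good_visited = good_plan(trace, wanted)
print(bad_visited, good_visited)
```

Both plans return the same single result, but the first one walks through n to the power of k combinations for k is_in_trace calls, while the second needs only k lookups.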
Cheers,
Gábor
Thank you for all these solutions, I think option A is the right option for me:
Question A: do you often have multiple Aadl2AadlTraceSpec instances (traceability model roots) existing at the same time? Specifically in a way that a given aadlElement appears as the leftInstance in multiple different traceability models? Because if not, if you can guarantee that a given element is only ever the left instance in one traceability model (it might be the right instance of another model, if you transform in multiple steps), then there is a lot of simplification you can do! In that case, your patterns do not need to include the extra condition that all the *ref elements must be reachable from the left instances via the *same* traceability model (because there is only ever one traceability model that has those elements as left instances), so instead of
find is_in_trace(aadl2aadlref, _, processsource, processsourceref)
you can simply say
find is_in_trace(_, _, processsource, processsourceref)
in all rows of the pattern body except one (if you want to keep the traceability root as an argument, you need to leave one constraint to determine this variable). I have a hunch that, if the assumption is true, then this might be a valid workaround to your problem!
Currently, I have one transformation step, so only one Aadl2AadlTraceSpec instance exists at a time. When using "_", my non-optimized specification works fine too.
In the Viatra documentation, I found "if you only use a variable once, it is OK not to name it at all; just use a single underscore instead of the variable reference. In fact, each occurrence of this anonymous variable will be treated as a separate, single-use variable that is distinguished from any other anonymous variable." So I thought that I could choose between the name and the "_", and that find is_in_trace(aadl2aadlref, trace, … and find is_in_trace(_, _, … are similar.
Do you mean that if I use "_" the condition will be applied anyway, but the Aadl2AadlTraceSpec will not be enumerated, or what? Could you please explain what is happening on the engine side?
Question B: is the pattern is_in_trace typically one-to-one? In the sense that for a single aadlElement, do you expect there to be more than one aadlrefElement, and the other way around?
If it is one-to-one, and additionally (per Question A) a given element is only ever the left instance in one traceability model, then altogether the value of aadlElement uniquely determines all of the other parameters of the pattern is_in_trace. This is information that you may have (based on external domain knowledge), but it cannot be automatically learned from the metamodel! Fortunately, there is a way to tell Viatra about this domain knowledge: you can add a @FunctionalDependency annotation to the definition of pattern is_in_trace, something like this:
@FunctionalDependency(forEach=aadlElement, unique=aadlrefElement, unique=trace, unique=aadl2aadlref)
pattern is_in_trace(…
See https://www.eclipse.org/viatra/documentation/query-language.html#_manually_specifying_dependencies_since_v1_5
(Note that if “traces” is a containment reference and “rightInstance” has multiplicity to-one, then Viatra is smart enough to figure out that “trace” uniquely determines aadlrefElement and aadl2aadlref, but it has no way to know that leftInstance uniquely determines “trace”; really this is the only functional dependency rule missing, the rest can be inferred.)
I have the hunch, although no full conviction, that with these annotations, the query plan optimizer will be smart enough to figure out a query plan that does not blow up, even without you having to do the changes (underscores instead of TraceSpec) indicated above in Question A.
In case the above functional dependencies do not always hold, just for certain element types, then you can perhaps subdivide is_in_trace into multiple patterns based on the type of the leftInstance, and have different sets of functional dependencies declared for them; be careful to always use the appropriate one from transformation rules.
Well, in my case I have these types of traces:
1) A-to-Aref
2) B-to-Bref0, B-to-Bref1, B-to-Bref2, etc.
3) A-to-Aref, B-to-Aref, C-to-Aref, etc.
So they are not all one-to-one traces.
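As a rough check of which of these three cases would break a dependency of the form forEach=aadlElement, unique=aadlrefElement, the following Python sketch tests the dependency on hand-written pairs. The pair lists only mirror the A/B/C naming used above; the function name is hypothetical and this is not part of Viatra.

```python
from collections import defaultdict

def violates_dependency(pairs):
    """Return the left elements that are traced to more than one right element."""
    targets = defaultdict(set)
    for left, right in pairs:
        targets[left].add(right)
    return {left for left, rights in targets.items() if len(rights) > 1}

# The three kinds of traces listed above, written as (element, ref) pairs.
case1 = [("A", "Aref")]
case2 = [("B", "Bref0"), ("B", "Bref1"), ("B", "Bref2")]
case3 = [("A", "Aref"), ("B", "Aref"), ("C", "Aref")]

for name, pairs in (("case 1", case1), ("case 2", case2), ("case 3", case3)):
    print(name, sorted(violates_dependency(pairs)))
```

Under these assumptions only case 2 violates the left-to-right dependency. Case 3 is many-to-one, so each element still determines its ref, although the reverse direction (ref determines element) does not hold.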
If none of the tricks above work:
Question C: could you please tell me how much heap memory you grant to the JVM process that runs the transformation? For instance, -Xmx8G?
Question D: could you please provide a query profile evaluated at a representative / typical state of the model?
Again, it would be very helpful if you could pinpoint which of the (pre-optimization) queries blew up in memory, so that I could take a look at that specific pattern.
This requires an execution of either the relative or the absolute query performance test (linked earlier), but in a saved snapshot of the EMF model where the source model, the target model and the traceability model already have contents of a representative size. When you think you have a triplet of instance models that satisfy these criteria, please either save them to a file (or three files) and then run the query profiling on that file, or run the query profiler on-the-fly from the same process where you run your transformation. If you cannot easily produce such a state when transforming your largest input model, perhaps a slightly smaller input model will allow the un-optimized transformation to run to completion, and still generate a reasonably large source+target+traceability model triplet that you can use for profiling.
I think we no longer need this solution as the transformation is scaling now.
Good luck with your project, and wish you all the best with the paper!
Thank you for your reply.
1. About your earlier question regarding the omission of parameters from the big pattern: while I am not familiar with your exact transformation needs, usually traceability connections between the source and target model are (almost) one-to-one. This means whether or not they are included in the pattern *should* make little difference in performance. At least that is my expectation. But ultimately, only actual measurements can tell the truth. (Note that if you have one source/destination element traced to many such “sourceref”/“destinationref” elements, then what I said before does not apply; they may really make a difference in that case.)
2. I have taken a look at the results.txt report that you have attached.
I see that you received an OutOfMemory error… how many gigabytes of Java heap space did you allow the JVM process to use (-Xmx command line switch)?
Remember that if you use the incremental pattern matcher of Viatra, it will maintain both the query results as well as internal auxiliary caches, both of which will take up additional memory on top of the memory required to store the model itself. You have to make sure that Viatra has plenty of heap space to go around; running it with whatever default the JVM or Eclipse chooses is often not enough.
I am using the is_in_trace calls in patterns to rapidly find the correspondences between an AADL element (input model) and an AADL refined element (output model). These calls add two conditions: (1) they ensure that the objects to be transformed do indeed exist in the AADL input model: since we have the same meta-model for input and output models, if I ask for a ComponentInstance to match, the engine does not differentiate between input and output models, so it can return a ComponentInstance from the output model; (2) they also ensure that some objects have already been transformed (since they are added to the trace model); for example, the source and destination features should be transformed before the transformation of a port connection.
I eliminated 8 is_in_trace calls, but not all of them, because I need some of them to ensure (1). Now the transformation works fine and I no longer have the OutOfMemory problem, even for 512 port connections (specifications are here [1]).
I had to carefully assign the priorities of the rules to ensure (2); otherwise I get exceptions (because, for example, the source and destination features are not already transformed when applying the rule of the port connection).
Do you propose other solutions to ensure (2)?
Could you please explain why the calls of is_in_trace are so expensive in memory when they are used in the queries?
5. Also, I took a *very* cursory look at your repo (specifically at [3]), and found that you seem to have two VQL files, one with patterns called find_*, the other with copy_*; however, these seem to be the same patterns, just under different names? Or at least I did not manage to find any difference at a single glance.
Why have two duplicate copies of the same pattern? Why not just use the same pattern twice? If you have two different pattern definitions with the same content, and you use both of them over the same model at the same time, you pay twice the memory cost!
The two pattern definitions with the same content are never used over the same model at the same time. I used two packages to work on the copying and refinement model transformations separately. They have some common patterns, but they are applied in separate tests on different models and not at the same time.
Sorry, but for clarification, I would like to ask whether you selected these patterns by intuition or by some kind of measurements. I am asking this as the Rete network structure used by VIATRA has some very unusual memory characteristics (e.g. a pattern with more constraints might be more efficient to evaluate than another one with a clear subset of parameters). The measurements in the QueryPerformanceTests (point 2 of my original mail) would provide that information.
OK, these are the performance test results:
pattern, sequence, matches count, heap before (kb), heap after (kb), used heap (kb), elapsed (ms)
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_component, 10, 0, 39227, 39354, 127, 17
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_connectionref, 9, 0, 39226, 39342, 116, 14
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_feature, 6, 0, 39224, 39631, 407, 75
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_otherconnection, 3, 0, 39214, 39478, 264, 74
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_otherconnection_system, 2, 0, 39207, 39496, 289, 100
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_portconnection_process, 1, 0, 39198, 39702, 504, 209
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_portconnection_system, 5, 0, 39224, 39976, 752, 281
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_subcomponent, 4, 0, 39223, 39415, 192, 29
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_system, 8, 1, 39226, 39272, 46, 5
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.is_in_trace, 7, 0, 39225, 39290, 65, 8
And you can find the total analysis in the attached file.
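For reading such a report, a small Python sketch like the following can rank the patterns by used heap. It only parses the rows shown above with the standard csv module; the variable names are arbitrary and it is not part of the Viatra QueryPerformanceTest tooling.

```python
import csv
import io

# The report rows above, copied verbatim.
report = """\
pattern, sequence, matches count, heap before (kb), heap after (kb), used heap (kb), elapsed (ms)
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_component, 10, 0, 39227, 39354, 127, 17
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_connectionref, 9, 0, 39226, 39342, 116, 14
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_feature, 6, 0, 39224, 39631, 407, 75
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_otherconnection, 3, 0, 39214, 39478, 264, 74
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_otherconnection_system, 2, 0, 39207, 39496, 289, 100
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_portconnection_process, 1, 0, 39198, 39702, 504, 209
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_portconnection_system, 5, 0, 39224, 39976, 752, 281
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_subcomponent, 4, 0, 39223, 39415, 192, 29
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.find_system, 8, 1, 39226, 39272, 46, 5
fr.tpt.mem4csd.mtbench.aadl2aadl.viatra.is_in_trace, 7, 0, 39225, 39290, 65, 8
"""

# skipinitialspace drops the blank that follows each comma in the report.
rows = list(csv.DictReader(io.StringIO(report), skipinitialspace=True))
ranked = sorted(rows, key=lambda r: int(r["used heap (kb)"]), reverse=True)

# Print the three most memory-hungry patterns without the package prefix.
for row in ranked[:3]:
    short_name = row["pattern"].rsplit(".", 1)[-1]
    print(short_name, row["used heap (kb)"], row["elapsed (ms)"])
```

In this run every pattern except find_system reports 0 matches, so the ranking may say little about how the queries behave on a model of representative size.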
Furthermore, it is unclear to me how the modified pattern relates to the issue (some more detail would be helpful).
Well, I thought that the problem is caused by the number of parameters, so my idea was to eliminate some parameters in the find_otherconnection_system/find_otherconnection_process patterns.
Hana.
_______________________________________________
viatra-dev mailing list
viatra-dev@xxxxxxxxxxx
To unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/viatra-dev
--
Hana MKAOUAR
PostDoc, Telecom Paris