I'm writing this post to share a pitfall with the performance of syntactic predicates that I experienced. Hopefully this will prove useful if somebody else runs into this issue.
The traditional example for syntactic predicates in Xtext is the dangling else-problem shown by the following ambiguous grammar:
Model:
statement+=IfStatement*;
IfStatement:
'if' condition=Expression 'then' then=Expression
('else' else=Expression)?;
Expression:
IfStatement | {ValueReference} name=ID;
This grammar is taken from a blog post on "Meinte's DSL Blog" that I'm unable to link to. This ambiguity can be resolved via a syntactic predicate on the 'else' clause:
IfStatement:
'if' condition=Expression 'then' then=Expression
(=>'else' else=Expression)?;
This tells Xtext that if it can parse an 'else' it should do so without considering any other alternatives. The thing to avoid is following:
IfStatement:
'if' condition=Expression 'then' then=Expression
=>('else' else=Expression)?;
This tells Xtext that if it can parse an 'else' followed by an expression it should do so without considering any other alternatives. The problem is that Xtext then parses the expression to the right of the 'else' twice, once for the syntactic predicate and once for the true parse. This results in an exponential blowup in run-time for nested if-then-else expressions.
It may seem trivial to avoid this pitfall, but it's less obvious when there is an action. This is where I got caught. Consider if the following was ambiguous:
Expression:
Atom ({BinaryExpression.left=current} op=('+'|'-') right=Atom)*
Now consider where to put the syntactic predicate:
1.
Expression:
Atom =>({BinaryExpression.left=current} op=('+'|'-') right=Atom)*
Bad. This creates exponential blowup.
2.
Expression:
Atom (=> {BinaryExpression.left=current} op=('+'|'-') right=Atom)*
Error. This is not well-formed.
3.
Expression:
Atom ({BinaryExpression.left=current} =>op=('+'|'-') right=Atom)*
Error. The syntactic predicate is masked by the action.
4.
Expression:
Atom (=>({BinaryExpression.left=current} op=('+'|'-')) right=Atom)*
Correct. The scope of the syntactic predicate is limited by the inner parenthesis.
From this I've learned that it's very important to consider what exactly a syntactic predicate applies to in order to avoid any limit double parsing. Fixing this issue dropped the parse time on one of our models from 150 seconds down to 4 seconds.
-Andrew