Regular Expression language [message #1190735] |
Sat, 16 November 2013 19:24 |
Alan DW Messages: 119 Registered: March 2012 |
Senior Member |
|
|
Hello everyone,
I'm trying to formulate an Xtext grammar which will accept very simple regular expressions.
There are in total 5 concepts which should be supported (listed in descending precedence order):
- Atom single character, a-Z or 0-9
- Braces of the form '(' expression ')'
- Stars in postfix notation: expression '*' (multiple stars allowed)
- Sequences by juxtaposition of the expressions above
- Disjunctions of the form expression '+' expression
So, the precedence order is:
+ < juxtaposition < * < ()
All operators are left-associative.
Examples for valid regular expressions:
abc (should transform into seq(seq(a,b),c) )
a+b* (should transform into or(a, star(b)) )
(a+b)* (should transform into star(or(a,b)) )
ab*c (should transform into seq(seq(a, star(b), c) )
a(b+c)**d (should transform into seq(star(or(b,c)),d) )
I've tried to formulate the grammar like this:
RegularExpression:
OrExpression
;
OrExpression:
SequenceExpression ({OrExpression.left = current} '+' right=OrExpression)*
;
SequenceExpression:
HighBindExpression({SequenceExpression.left = current} right = SequenceExpression)*
// note the abscence of a syntactic separating symbol -> juxtaposition!
;
HighBindExpression:
AtomicExpression | StarExpression
;
AtomicExpression:
BracedExpression | Atom
;
StarExpression:
innerExpression = AtomicExpression '*'('*')*
// note: we ignore all stars after the first, since a* = a**, but
// syntactically it should still be valid
;
BracedExpression:
'(' innerExpression = RegularExpression ')'
;
Atom:
value = CHAR
;
terminal CHAR returns ecore::EChar:
('a'..'z'|'A'..'Z'|'0'..'9')
;
The problem is that Xtext (or rather ANTLR in this case) keeps telling me that the grammar is left recursive.
I believe that I've paid attention in my compiler construction class and I know about LL* grammars, but I really can't spot the left recursion here - at least none that would not encounter a syntactic symbol (e.g. an opening brace) on the way, breaking the recursion. I've tried it on paper to start with the root expression and replacing the leftmost rule with its body, but I don't see any left recursions here.
I think the problem is the juxtaposition of expressions which confuses ANTLR, but I'm not sure...
I'd be grateful for any help!
Thanks,
Alan
EDIT: I've verified that the juxtaposition is indeed the source of the problem - if a syntactic symbol (e.g. an exclamation mark) is inserted between the two arguments of the sequence, then it works. But is there a way to get the juxtaposition working?
[Updated on: Sat, 16 November 2013 19:34] Report message to a moderator
|
|
|
Re: Regular Expression language [message #1190817 is a reply to message #1190735] |
Sat, 16 November 2013 20:16 |
|
Hi,
the following grammar seems to work
grammar org.xtext.example.mydsl4.MyDsl hidden(WS)
generate myDsl "http://www.xtext.org/example/mydsl4/MyDsl"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
RegularExpression:
OrExpression
;
OrExpression:
SequenceExpression ({OrExpression.left = current} '+' right=SequenceExpression)*
;
SequenceExpression:
HighBindExpression({SequenceExpression.left = current} right = HighBindExpression)*
// note the abscence of a syntactic separating symbol -> juxtaposition!
;
HighBindExpression:
AtomicExpression ('*'{StarExpression.innerExpression=current} '*'* )?
;
AtomicExpression:
BracedExpression | Atom
;
BracedExpression:
'(' innerExpression = RegularExpression ')'
;
Atom:
value = CHAR
;
terminal CHAR returns ecore::EChar:
('a'..'z'|'A'..'Z'|'0'..'9')
;
terminal WS : (' '|'\t'|'\r'|'\n')+;
terminal ANY_OTHER: .;
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
|
|
|
|
Powered by
FUDForum. Page generated in 0.02061 seconds