Eclipse Community Forums
Forum Search:

Search      Help    Register    Login    Home
Home » Modeling » TMF (Xtext) » Regular Expression language
Regular Expression language [message #1190735] Sat, 16 November 2013 19:24 Go to next message
Alan DW is currently offline Alan DWFriend
Messages: 108
Registered: March 2012
Senior Member
Hello everyone,

I'm trying to formulate an Xtext grammar which will accept very simple regular expressions.

There are in total 5 concepts which should be supported (listed in descending precedence order):



  • Atom single character, a-Z or 0-9
  • Braces of the form '(' expression ')'
  • Stars in postfix notation: expression '*' (multiple stars allowed)
  • Sequences by juxtaposition of the expressions above
  • Disjunctions of the form expression '+' expression


So, the precedence order is:

+ < juxtaposition < * < ()

All operators are left-associative.


Examples for valid regular expressions:

abc (should transform into seq(seq(a,b),c) )
a+b* (should transform into or(a, star(b)) )
(a+b)* (should transform into star(or(a,b)) )
ab*c (should transform into seq(seq(a, star(b), c) )
a(b+c)**d (should transform into seq(star(or(b,c)),d) )


I've tried to formulate the grammar like this:


RegularExpression:
	OrExpression
;

OrExpression:
	SequenceExpression ({OrExpression.left = current} '+' right=OrExpression)*
;

SequenceExpression:
	HighBindExpression({SequenceExpression.left = current} right = SequenceExpression)*
        // note the abscence of a syntactic separating symbol -> juxtaposition!
;

HighBindExpression:
        AtomicExpression | StarExpression
;

AtomicExpression:
	BracedExpression | Atom
;

StarExpression:
	innerExpression = AtomicExpression '*'('*')*
        // note: we ignore all stars after the first, since a* = a**, but
        // syntactically it should still be valid
;

BracedExpression:
	'(' innerExpression = RegularExpression ')'
;

Atom:
	value = CHAR
;

terminal CHAR returns ecore::EChar:
	('a'..'z'|'A'..'Z'|'0'..'9')
;



The problem is that Xtext (or rather ANTLR in this case) keeps telling me that the grammar is left recursive.

I believe that I've paid attention in my compiler construction class and I know about LL* grammars, but I really can't spot the left recursion here - at least none that would not encounter a syntactic symbol (e.g. an opening brace) on the way, breaking the recursion. I've tried it on paper to start with the root expression and replacing the leftmost rule with its body, but I don't see any left recursions here.

I think the problem is the juxtaposition of expressions which confuses ANTLR, but I'm not sure...

I'd be grateful for any help!


Thanks,


Alan



EDIT: I've verified that the juxtaposition is indeed the source of the problem - if a syntactic symbol (e.g. an exclamation mark) is inserted between the two arguments of the sequence, then it works. But is there a way to get the juxtaposition working?

[Updated on: Sat, 16 November 2013 19:34]

Report message to a moderator

Re: Regular Expression language [message #1190817 is a reply to message #1190735] Sat, 16 November 2013 20:16 Go to previous messageGo to next message
Christian Dietrich is currently offline Christian DietrichFriend
Messages: 6483
Registered: July 2009
Senior Member
Hi,


the following grammar seems to work

grammar org.xtext.example.mydsl4.MyDsl hidden(WS)

generate myDsl "http://www.xtext.org/example/mydsl4/MyDsl"

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

RegularExpression:
	OrExpression
;

OrExpression:
	SequenceExpression ({OrExpression.left = current} '+' right=SequenceExpression)*
;

SequenceExpression:
	HighBindExpression({SequenceExpression.left = current} right = HighBindExpression)*
        // note the abscence of a syntactic separating symbol -> juxtaposition!
;

HighBindExpression:
        AtomicExpression ('*'{StarExpression.innerExpression=current} '*'* )?
;

AtomicExpression:
	BracedExpression | Atom
;

BracedExpression:
	'(' innerExpression = RegularExpression ')'
;

Atom:
	value = CHAR
;

terminal CHAR returns ecore::EChar:
	('a'..'z'|'A'..'Z'|'0'..'9')
;

terminal WS			: (' '|'\t'|'\r'|'\n')+;

terminal ANY_OTHER: .;
Re: Regular Expression language [message #1190846 is a reply to message #1190817] Sat, 16 November 2013 20:36 Go to previous message
Alan DW is currently offline Alan DWFriend
Messages: 108
Registered: March 2012
Senior Member
Hi Christian,

thanks for your response! Your version of the "BracedExpression" is quite a beast. I see how it works, but I probably would never have come up with this myself Smile Guess that I still have a lot to learn.

Thank you,


Alan

[Updated on: Sat, 16 November 2013 23:09]

Report message to a moderator

Previous Topic:Context sensitive lexing
Next Topic:End of line grammar difficulty [SOLVED]
Goto Forum:
  


Current Time: Sat Nov 22 10:32:12 GMT 2014

Powered by FUDForum. Page generated in 0.06041 seconds
.:: Contact :: Home ::.

Powered by: FUDforum 3.0.2.
Copyright ©2001-2010 FUDforum Bulletin Board Software