Eclipse Community Forums
Help parsing 4.1+3., with a '.' at the end [message #1104595] Sun, 08 September 2013 17:00
Scott Hendrickson
I have a DSL where expressions have to end with a '.'. Among other things, it is supposed to allow simple mathematical expressions such as "4.1+3." which should be interpreted as "4.1 + 3.0 .". However, I cannot seem to parse this correctly. I've created a simple xtext file to illustrate, as follows:

grammar xtext.e.EDsl hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate eDsl "http://www.e.xtext/EDsl"

// based on example at: http://blog.efftinge.de/2010/08/parsing-expressions-with-xtext.html
Model:
	expression=Multiply '.';

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;

terminal VARIABLE:
	'a'..'z' ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*;

terminal NUMBER:
	('0'..'9')+ ('.' ('0'..'9')+)?;

Multiply returns Expression:
	Add ({Expression.left=current} op='*' right=Add)*;

Add returns Expression:
	Root ({Expression.left=current} op='+' right=Root)*;

Root returns Expression:
	{Variable} name=VARIABLE
	| {Literal} value=NUMBER;


The error I get when trying to parse "4.1+3." is:

required (...)+ loop did not match anything at character '<EOF>' at offset: 4 for: 4.1+<<<HERE>>>3.


I think it's trying to parse "3." as a number with a decimal, requiring the ('.' ('0'..'9')+)? part, even though it is optional. Any ideas of how I might parse something like "4.1+3." correctly? Any help is greatly appreciated.

(FYI: the actual language I'm trying to parse is Prolog. All Prolog statements must end in a '.', and it allows decimals in its numbers. I've attached the full xtext file, if anyone is interested.)

Thank you in advance for any help,
-- Scott
Re: Help parsing 4.1+3., with a '.' at the end [message #1104633 is a reply to message #1104595] Sun, 08 September 2013 18:21
Alexander Nittka
Hi,

Lexing is greedy, i.e. tokens are made as long as possible. "3." matches the start of the NUMBER token, but then additional digits are expected, hence the error. In this phase the parser rules are completely irrelevant, i.e. it does not matter that the dot could be the one the Model rule expects.
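To make the greedy behavior concrete, here is a minimal hand-written Java sketch, assuming a simplified lexer for NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)? that decides to enter the optional fraction as soon as it sees a '.' and never backs out of it. The class name CommittingLexer is hypothetical and this is not the lexer Xtext/ANTLR generates; it only reproduces the commit-then-fail pattern described above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a lexer for
//   terminal NUMBER: ('0'..'9')+ ('.' ('0'..'9')+)?;
// It enters the optional fraction after seeing ONE character ('.') and then
// commits: there is no backtracking to a shorter NUMBER such as "3".
class CommittingLexer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                if (i < input.length() && input.charAt(i) == '.') {
                    i++; // the '.' is consumed here; from now on digits are required
                    int fractionStart = i;
                    while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                    if (i == fractionStart) {
                        // Message loosely modeled on the Xtext error in the post.
                        throw new IllegalStateException(
                                "required (...)+ loop did not match anything, NUMBER started at offset " + start);
                    }
                }
                tokens.add("NUMBER:" + input.substring(start, i));
            } else {
                tokens.add(String.valueOf(c)); // '+', '*', '.' etc. as single-character tokens
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("4.1+3.0.")); // [NUMBER:4.1, +, NUMBER:3.0, .]
        try {
            tokenize("4.1+3.");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

With "4.1+3.0." the statement-ending '.' is left over as its own token, but with "4.1+3." the final '.' has already been swallowed by the NUMBER attempt starting at offset 4 when the missing digits are noticed.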

You might try whether turning NUMBER into a datatype rule solves your problem immediately.

terminal INT: ('0'..'9')+;
NUMBER: INT ('.' INT)?;
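The datatype rule helps because the decision about the '.' moves from the lexer to the parser, which can look at the next two tokens before committing. A minimal Java sketch of that idea, assuming a lexer that only emits INT and single-character tokens and a hand-written parser for NUMBER ('+' NUMBER)* '.' (the class DatatypeRuleDemo is hypothetical; the generated Xtext parser makes this decision with ANTLR lookahead instead):

```java
import java.util.ArrayList;
import java.util.List;

// Step 1 (lexer): only  terminal INT: ('0'..'9')+;  plus single-character
// tokens such as '.', '+'. The lexer never has to guess what a '.' means.
// Step 2 (parser): the datatype rule  NUMBER: INT ('.' INT)?;  takes the
// optional part only if the next TWO tokens are '.' and INT.
class DatatypeRuleDemo {

    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            if (Character.isDigit(input.charAt(i))) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            } else {
                i++;
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    static boolean isInt(String token) {
        return Character.isDigit(token.charAt(0));
    }

    // Parses  NUMBER ('+' NUMBER)* '.'  and returns the text of each NUMBER.
    static List<String> parse(String input) {
        List<String> t = lex(input);
        List<String> numbers = new ArrayList<>();
        int pos = 0;
        while (true) {
            if (pos >= t.size() || !isInt(t.get(pos))) {
                throw new IllegalStateException("expected INT at token " + pos);
            }
            String number = t.get(pos++);
            // Two-token lookahead decides the optional ('.' INT)? part.
            if (pos + 1 < t.size() && t.get(pos).equals(".") && isInt(t.get(pos + 1))) {
                number += "." + t.get(pos + 1);
                pos += 2;
            }
            numbers.add(number);
            if (pos < t.size() && t.get(pos).equals("+")) {
                pos++;
            } else {
                break;
            }
        }
        if (pos != t.size() - 1 || !t.get(pos).equals(".")) {
            throw new IllegalStateException("expected the final '.' at token " + pos);
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parse("4.1+3."));   // [4.1, 3]
        System.out.println(parse("4.1+3.0.")); // [4.1, 3.0]
    }
}
```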

Alex


Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext@itemis.de
Re: Help parsing 4.1+3., with a '.' at the end [message #1104786 is a reply to message #1104595] Mon, 09 September 2013 00:09
Scott Hendrickson
Thanks Alex,

That works in the example I posted, but the real Xtext file is a bit more complex. With the real terminal NUMBER (see below for the full Xtext file), the following inputs parse as shown:

terminal NUMBER:
	'-'? ('0'..'9')+ ('.' ('0'..'9')+)? (('e' | 'E') ('-' | '+')? ('0'..'9')+)?;


"X." --> VariableExpression, name=X
"x." --> AtomExpression, atom=x
"3.0." --> NumberExpression, value=3.0
"3.0e1." --> NumberExpression, value=3.0e1

"3." --> Parse exception for: <<<HERE>>>3.
required (...)+ loop did not match anything at character '<EOF>' at offset: 0
"E." --> Parse exception for: <<<HERE>>>E.
required (...)+ loop did not match anything at input 'E' at offset: 0
"e." --> Parse exception for: <<<HERE>>>e.
required (...)+ loop did not match anything at input 'e' at offset: 0

The "E." and "e." should be parsing like the "X." and "x.". But, somehow, the terminal NUMBER is getting in the way, even though there are no digits in the input. I can verify this by changing the "('e' | 'E')" in NUMBER to "('x' | 'X')" and observing that the corresponding "X." and "x." inputs break and the "E." and "e." inputs succeed.

With the modified NUMBER rule, I get the following:

terminal INT:
	'0'..'9';

NUMBER:
	'-'? INT+ ('.' INT+)? (('e' | 'E') ('-' | '+')? INT+)?;


"X." --> VariableExpression, name=X
"x." --> AtomExpression, atom=x
"3.0." --> NumberExpression, value=3.0
"3." --> NumberExpression, value=3 (now this works!)

"3.0e1." --> Parse exception for: 3.0<<<HERE>>>e1. (now this is broken!)
extraneous input 'e1' expecting '.' at offset: 3 for: 3.0<<<HERE>>>e1.
"E." --> Parse exception for: <<<HERE>>>E.
required (...)+ loop did not match anything at input 'E' at offset: 0
"e." --> Parse exception for: <<<HERE>>>e.
required (...)+ loop did not match anything at input 'e' at offset: 0

The terminal NUMBER is still getting in the way of "E." and "e."; "3." is fixed, but now "3.0e1." is broken.
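One plausible reading of the "3.0e1." and "e." failures with the datatype rule, sketched in Java under stated assumptions: the 'e' and 'E' used in the datatype rule become keyword tokens, the lexer takes the longest match, and keywords win ties. Under that simplified model, "e1" lexes as a single ATOM (longer than the keyword 'e'), while a lone "e" lexes as the keyword rather than as an ATOM. The reduced rule set and the class KeywordShadowingDemo are illustrative, not the generated Xtext lexer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model: keywords from the datatype rule ('e', 'E', '.') are listed
// before the terminals; the longest match wins and, on a tie, the earlier rule wins.
class KeywordShadowingDemo {

    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'e'", Pattern.compile("e"));
        RULES.put("'E'", Pattern.compile("E"));
        RULES.put("'.'", Pattern.compile("\\."));
        RULES.put("INT", Pattern.compile("[0-9]")); // terminal INT: '0'..'9';
        RULES.put("VARIABLE", Pattern.compile("[A-Z_][A-Za-z0-9_]*"));
        RULES.put("ATOM", Pattern.compile("[a-z][A-Za-z0-9_]*"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only strictly longer matches replace earlier ones, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalStateException("no token matches at offset " + pos);
            }
            tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3.0e1.")); // [INT:3, '.':., INT:0, ATOM:e1, '.':.]
        System.out.println(tokenize("e."));     // ['e':e, '.':.]
        System.out.println(tokenize("x."));     // [ATOM:x, '.':.]
    }
}
```

Under these assumptions the ATOM "e1" at offset 3 lines up with the "extraneous input 'e1'" error, and "e." delivers the keyword 'e' where an atom was expected.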

I suppose that I'm confused in general because either the "?" in the terminal NUMBER rule isn't working like I understand, or there's a bug somewhere.

The full xtext file is below.

Thank you,
-- Scott

grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE, SINGLE_LINE_COMMENT)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

Model:
	(exps+=ExpressionInfinity '.')+;

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;

terminal SINGLE_LINE_COMMENT:
	'%' !('\n' | '\r')* ('\r'? '\n')?;

terminal STRING:
	'\''->'\'';

terminal NUMBER:
	'-'? ('0'..'9')+ ('.' ('0'..'9')+)? (('e' | 'E') ('-' | '+')? ('0'..'9')+)?;

terminal VARIABLE:
	('A'..'Z' | '_') ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')*;

	// http://www.cse.unsw.edu.au/~billw/cs9414/notes/prolog/op.html
terminal OP1200XFX:
	'-->' | ':-';

terminal OP1200FX:
	':-' | '?-';

terminal OP1150FX:
	'dynamic' | 'discontiguous' | 'initialization' | 'module_transparent' | 'multifile' | 'thread_local' | 'volatile';

terminal OP1100XFY:
	';' | '|';

terminal OP1050XFY:
	'->' | '*->';

terminal OP1000XFY:
	',';

terminal OP954XFY:
	'\\';

terminal OP900FY:
	'\\+';

terminal OP900FX:
	'~';

terminal OP700XFX:
	'<' | '=' | '=..' | '=@=' | '=:=' | '=<' | '==' | '=\\=' | '>' | '>=' | '@<' | '@=<' | '@>' | '@>=' | '\\=' | '\\=='
	| 'is';

terminal OP600XFY:
	':';

terminal OP500YFX:
	'+' | '-' | '/\\' | '\\/' | 'xor';

terminal OP500FX:
	'+' | '-' | '?' | '\\';

terminal OP400YFX:
	'*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem';

terminal OP200XFX:
	'**';

terminal OP200XFY:
	'^';

terminal ATOM:
	'a'..'z' ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')*;

ATOMS:
	'.' | '!' | ATOM | OP1200XFX | OP1200FX | OP1150FX | OP1100XFY | OP1050XFY | OP1000XFY | OP954XFY | OP900FY | OP900FX
	| OP700XFX | OP600XFY | OP500YFX | OP500FX | OP400YFX | OP200XFX | OP200XFY;

	// http://www.csupomona.edu/~jrfisher/www/prolog_tutorial/4.html
// xfx infix nonassociative 
// xfy infix right-associative 
// yfx infix left-associative 
// fx prefix nonassociative 
// fy prefix right-associative 
// xf postfix nonassociative 
// yf postfix left-associative
ExpressionInfinity returns Expression:
	Expression1200xfx;

Expression1200xfx returns Expression:
	Expression1200fx ({Expression.left=current} op=OP1200XFX right=Expression1200fx)?;

Expression1200fx returns UnaryExpression:
	(op=OP1200FX)? right=Expression1150fx;

Expression1150fx returns UnaryExpression:
	(op=OP1150FX)? right=Expression1100xfy;

Expression1100xfy returns Expression:
	Expression1050xfy ({Expression.left=current} op=OP1100XFY right=Expression1100xfy)?;

Expression1050xfy returns Expression:
	Expression1000xfy ({Expression.left=current} op=OP1050XFY right=Expression1050xfy)?;

Expression1000xfy returns Expression:
	Expression954xfy ({Expression.left=current} op=OP1000XFY right=Expression1000xfy)?;

Expression954xfy returns Expression:
	Expression900fy ({Expression.left=current} op=OP954XFY right=Expression954xfy)?;

	// TODO: Determine how to do right-associative UnaryExpressions, if necessary
Expression900fy returns UnaryExpression:
	(op=OP900FY)? right=Expression900fx;

Expression900fx returns UnaryExpression:
	(op=OP900FX)? right=Expression700xfx;

Expression700xfx returns Expression:
	Expression600xfy ({Expression.left=current} op=OP700XFX right=Expression600xfy)?;

Expression600xfy returns Expression:
	Expression500yfx ({Expression.left=current} op=OP600XFY right=Expression600xfy)?;

Expression500yfx returns Expression:
	Expression500fx ({Expression.left=current} op=OP500YFX right=Expression500fx)*;

Expression500fx returns UnaryExpression:
	(op=OP500FX)? right=Expression400yfx;

Expression400yfx returns Expression:
	Expression200xfx ({Expression.left=current} op=OP400YFX right=Expression200xfx)*;

Expression200xfx returns Expression:
	Expression200xfy ({Expression.left=current} op=OP200XFX right=Expression200xfy)?;

Expression200xfy returns Expression:
	Expression0 ({Expression.left=current} op=OP200XFY right=Expression200xfy)?;

Expression0 returns Expression:
	{AtomExpression} atom=ATOMS ('(' terms=ExpressionInfinity ')')?
	| {VariableExpression} name=VARIABLE
	| {StringExpression} value=STRING
	| {NumberExpression} value=NUMBER
	| {ListExpression} '[' (head=ExpressionInfinity ('|' tail=ExpressionInfinity)?)? ']'
	| '(' ExpressionInfinity ')';

Re: Help parsing 4.1+3., with a '.' at the end [message #1104867 is a reply to message #1104786] Mon, 09 September 2013 03:23
Henrik Lindberg
FWiW, this is what I used in one Xtext language.

terminal HEX : '0' ('x'|'X')(('0'..'9')|('a'..'f')|('A'..'F'))+ ;
terminal INT : ('0'..'9')+;
REAL hidden(): INT '.' (EXT_INT | INT);
terminal EXT_INT: INT ('e'|'E')('-'|'+') INT;

It is not perfect (it does not allow a decimal number that starts or ends
with '.'), but otherwise it works.

Note that REAL is a data rule, not a terminal. Also note that negative
numbers are handled with an unary '-' operator, not as part of the terminal.

The language also has data rules with terminal conversion, because
the terminals are also included in other rules where the text is wanted.

Hope that is of some value even if not solving all of your issues.

- henrik

Re: Help parsing 4.1+3., with a '.' at the end [message #1105193 is a reply to message #1104867] Mon, 09 September 2013 13:42
Scott Hendrickson
Thanks henrik,

For now, I'm just using the following rule...

terminal DIGIT:
	('0'..'9')+;

NUMBER:
	'-'? DIGIT ('.' DIGIT)?;


... and I cannot accept numbers in the scientific notation form. That seems to be a reasonable compromise for the time being.

But, what does "hidden()" in "REAL hidden(): INT '.' (EXT_INT | INT);" mean?

-- Scott
Re: Help parsing 4.1+3., with a '.' at the end [message #1105201 is a reply to message #1104867] Mon, 09 September 2013 13:58
Alexander Nittka
Hi,

you can define hidden terminals which are not passed to the parser (see the first line of your grammar definition). Typically white spaces, comments, etc. are hidden. Otherwise you would have to explicitly allow each of them at every position in the rule where they may appear. But you can define hidden terminals per rule as well, not just globally.

hidden() means that nothing is hidden. Your number rule would work for the following input
-<tab>3<newline>.<space><space>24
which you might not want.

Alex


Re: Help parsing 4.1+3., with a '.' at the end [message #1105224 is a reply to message #1105201] Mon, 09 September 2013 14:38
Henrik Lindberg
On 2013-09-09 15:58, Alexander Nittka wrote:
> hidden() means that nothing is hidden. Your number rule would work for
> the following input
> -<tab>3<newline>.<space><space>24
> which you might not want.
>

Sorry, but it is actually the other way around ;-) Using hidden()
means that nothing is hidden from the rule, so space, tab, etc. are
NOT allowed between the digits and the punctuation in the REAL
number: these tokens are now delivered to the parser, and the rule
itself does not mention WS etc.
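Henrik's correction can be checked with a small Java simulation, assuming hidden tokens are simply filtered out of the stream before a rule sees them (a simplification of how Xtext actually handles hidden tokens). With whitespace hidden, Alex's example "-<tab>3<newline>.<space><space>24" matches NUMBER: '-'? INT ('.' INT)?; with hidden(), the whitespace tokens reach the rule and it no longer matches. The class HiddenTokensDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Compares  NUMBER: '-'? INT ('.' INT)?;  with whitespace hidden vs. hidden().
class HiddenTokensDemo {

    // Splits the input into whitespace runs, INT runs and single characters.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            } else {
                i++;
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    static boolean isInt(String t) {
        return Character.isDigit(t.charAt(0));
    }

    // True if the whole input is exactly one NUMBER.
    static boolean matchesNumber(String input, boolean whitespaceHidden) {
        List<String> t = new ArrayList<>();
        for (String token : lex(input)) {
            // A hidden token never reaches the rule; otherwise it is ordinary input.
            if (whitespaceHidden && Character.isWhitespace(token.charAt(0))) continue;
            t.add(token);
        }
        int pos = 0;
        if (pos < t.size() && t.get(pos).equals("-")) pos++;
        if (pos >= t.size() || !isInt(t.get(pos))) return false;
        pos++;
        if (pos + 1 < t.size() && t.get(pos).equals(".") && isInt(t.get(pos + 1))) pos += 2;
        return pos == t.size();
    }

    public static void main(String[] args) {
        String alexExample = "-\t3\n.  24";
        System.out.println(matchesNumber(alexExample, true));  // true: whitespace hidden
        System.out.println(matchesNumber(alexExample, false)); // false: hidden()
        System.out.println(matchesNumber("-3.24", false));     // true
    }
}
```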

- henrik
Re: Help parsing 4.1+3., with a '.' at the end [message #1105574 is a reply to message #1104595] Tue, 10 September 2013 02:58
Scott Hendrickson
I suppose I have one last question. Taking the original terminal NUMBER rule of:

terminal NUMBER:
	'-'? ('0'..'9')+ ('.' ('0'..'9')+)? (('e' | 'E') ('-' | '+')? ('0'..'9')+)?;


Am I just misunderstanding how the '?' works in terminal definitions? I was assuming that "('.' ('0'..'9')+)?" indicates that the whole part inside the parentheses is optional for a match, just like "'-'?" indicates that the dash is optional for a match.

I understand the idea that terminal rules are "greedy" in that they match the most text that they can. But, it seems to me that, logically, the rule should only match the "4" part of "4." rather than failing because it requires a digit after the '.'. This seems incorrect in that it is ultimately forcing a match of an optional part -- which makes me wonder what the '?' really means in the first place.

In other words, shouldn't a "(...)?" ultimately be optional, and thus, a partial match of anything inside it not cause a terminal rule to fail?

Isn't this really a bug with terminal rules?

Or, am I misunderstanding something?

-- Scott
Re: Help parsing 4.1+3., with a '.' at the end [message #1105674 is a reply to message #1105574] Tue, 10 September 2013 06:31
Alexander Nittka
Hi,

greedy here means that because "." was present, the rest is expected to follow as well. This is why you have to be extremely careful with defining terminal rules.

Alex


Re: Help parsing 4.1+3., with a '.' at the end [message #1105764 is a reply to message #1105574] Tue, 10 September 2013 08:46
Ed Willink
Hi Scott

I completely agree with you, but you are not dealing with LALR parsing
here. It is LL and greedy so it carries on as fast as it can on the
first possible track until it succeeds or gets stuck. At this point you
have a choice between immediate failure or backtracking. Backtracking
works in the main Xtext grammar, but can be exponentially costly so it
is well worth avoiding. Although Xtext does a nice job, mostly, of
hiding the distinction between lexing and parsing, your problem is a
lexer problem and I have not had any success with lexer backtracking.

To solve exactly your problem in
http://git.eclipse.org/c/ocl/org.eclipse.ocl.git/tree/examples/org.eclipse.ocl.examples.xtext.essentialocl/src/org/eclipse/ocl/examples/xtext/essentialocl/services/RetokenizingTokenSource.java,
I intercept the token feed to Xtext to merge the INT DOT INT ID PLUS
INT token sequence (ID may only be 'e' or 'E') into rather more than a
simple INT. This also avoids problems whereby 'e' and 'E' fail to be
recognised by Xtext as identifiers without a smarter definition of ID.

terminal INT:
('0'..'9')+;

NUMBER_LITERAL returns BigNumber:
INT; // May actually be ('.' INT)? (('e' | 'E') ('+' | '-')? INT)?;

[RetokenizingTokenSource enabled me to defer my experiments on using the
LPG lexer as an Xtext lexer.]
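The token-merging idea can be sketched in isolation in Java. This is not Ed's RetokenizingTokenSource and it does not use ANTLR's TokenSource API: the Tok class and the token kind names (INT, DOT, ID, PLUS, MINUS) are assumptions, and the sign is treated as optional (following the commented grammar) rather than always PLUS. It merges INT DOT INT, optionally followed by an ID that is exactly "e" or "E", an optional sign and an INT, into one NUMBER token, and only when the tokens are adjacent (no whitespace between them).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token type (not ANTLR's Token): kind, text and start offset.
final class Tok {
    final String kind;
    final String text;
    final int start;

    Tok(String kind, String text, int start) {
        this.kind = kind;
        this.text = text;
        this.start = start;
    }

    int end() {
        return start + text.length();
    }

    @Override
    public String toString() {
        return kind + ":" + text;
    }
}

class Retokenizer {

    static boolean is(List<Tok> in, int i, String kind) {
        return i < in.size() && in.get(i).kind.equals(kind);
    }

    // True if token i starts exactly where token i-1 ends (no whitespace between).
    static boolean adjacent(List<Tok> in, int i) {
        return i > 0 && i < in.size() && in.get(i - 1).end() == in.get(i).start;
    }

    static List<Tok> merge(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            if (is(in, i, "INT") && is(in, i + 1, "DOT") && adjacent(in, i + 1)
                    && is(in, i + 2, "INT") && adjacent(in, i + 2)) {
                int last = i + 2;
                // Optional exponent: ID("e"|"E") (PLUS|MINUS)? INT, all adjacent.
                int j = i + 3;
                if (is(in, j, "ID") && adjacent(in, j)
                        && (in.get(j).text.equals("e") || in.get(j).text.equals("E"))) {
                    int k = j + 1;
                    if ((is(in, k, "PLUS") || is(in, k, "MINUS")) && adjacent(in, k)) k++;
                    if (is(in, k, "INT") && adjacent(in, k)) last = k;
                }
                StringBuilder text = new StringBuilder();
                for (int t = i; t <= last; t++) text.append(in.get(t).text);
                out.add(new Tok("NUMBER", text.toString(), in.get(i).start));
                i = last + 1;
            } else {
                out.add(in.get(i++)); // anything else passes through unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens for "1.5e+3." as a plain lexer with INT, DOT, ID, PLUS might emit them.
        List<Tok> in = List.of(new Tok("INT", "1", 0), new Tok("DOT", ".", 1), new Tok("INT", "5", 2),
                new Tok("ID", "e", 3), new Tok("PLUS", "+", 4), new Tok("INT", "3", 5), new Tok("DOT", ".", 6));
        System.out.println(merge(in)); // [NUMBER:1.5e+3, DOT:.]
    }
}
```

Because the lexer only ever produces short, unambiguous tokens, a trailing "3." simply stays INT DOT and the statement terminator is never lost.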

Regards

Ed Willink

Re: Help parsing 4.1+3., with a '.' at the end [message #1105877 is a reply to message #1104595] Tue, 10 September 2013 11:42
Scott Hendrickson
Thank you all for your responses and for being so patient with me. I think I understand it now.

I ended up modifying Henrik's solution, which seems to work for my situation:

terminal INT:
	('0'..'9')+;

terminal EXT_INT:
	INT ('e' | 'E') ('-' | '+')? INT;

NUMBER hidden():
	'-'? INT ('.' (EXT_INT | INT))?;
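To see why this version works, here is a Java sketch that tokenizes with these terminals (INT, EXT_INT) plus ATOM and VARIABLE, assuming longest match with earlier rules winning ties, and then applies the NUMBER datatype rule followed by the terminating '.'. It does not handle whitespace and is a simplified model rather than the generated ANTLR lexer, so unusual inputs such as "1e." may behave differently in Xtext. The class FinalGrammarDemo is hypothetical. Because 'e'/'E' are no longer keywords, "e." now lexes as an ATOM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FinalGrammarDemo {

    // Token rules in priority order; longest match wins, earlier rule wins ties.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'.'", Pattern.compile("\\."));
        RULES.put("'-'", Pattern.compile("-"));
        RULES.put("'+'", Pattern.compile("\\+"));
        RULES.put("INT", Pattern.compile("[0-9]+"));
        RULES.put("EXT_INT", Pattern.compile("[0-9]+[eE][-+]?[0-9]+"));
        RULES.put("VARIABLE", Pattern.compile("[A-Z_][A-Za-z0-9_]*"));
        RULES.put("ATOM", Pattern.compile("[a-z][A-Za-z0-9_]*"));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) throw new IllegalStateException("no token at offset " + pos);
            tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    static String text(String token) {
        return token.substring(token.indexOf(':') + 1);
    }

    // NUMBER hidden(): '-'? INT ('.' (EXT_INT | INT))?;  followed by the final '.'
    static String parseNumberStatement(String input) {
        List<String> t = tokenize(input);
        StringBuilder number = new StringBuilder();
        int pos = 0;
        if (pos < t.size() && t.get(pos).startsWith("'-':")) number.append(text(t.get(pos++)));
        if (pos >= t.size() || !t.get(pos).startsWith("INT:")) throw new IllegalStateException("expected INT");
        number.append(text(t.get(pos++)));
        if (pos + 1 < t.size() && t.get(pos).startsWith("'.':")
                && (t.get(pos + 1).startsWith("INT:") || t.get(pos + 1).startsWith("EXT_INT:"))) {
            number.append('.').append(text(t.get(pos + 1)));
            pos += 2;
        }
        if (pos != t.size() - 1 || !t.get(pos).startsWith("'.':")) {
            throw new IllegalStateException("expected the final '.'");
        }
        return number.toString();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("4.1+3."));                 // [INT:4, '.':., INT:1, '+':+, INT:3, '.':.]
        System.out.println(parseNumberStatement("3."));        // 3
        System.out.println(parseNumberStatement("3.0e1."));    // 3.0e1
        System.out.println(parseNumberStatement("-2.5E-3."));  // -2.5E-3
    }
}
```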


I'm going to look at Ed's solution too (thanks Ed), but mainly to learn a little more about some alternatives that could be tried.

Thank you again!
-- Scott
Re: Help parsing 4.1+3., with a '.' at the end [message #1106330 is a reply to message #1105877] Wed, 11 September 2013 01:07
Henrik Lindberg

To get what you actually want :-), you can also write an external lexer.
It gives you full control over lexing (replace tokens, use predicates,
call Java logic, etc.). It is supported by Xtext (the process is
integrated into the workflow), but there are some hoops to jump through
to make it work. IMO there is much to gain from an external lexer
when the language is not a vanilla DSL. The more you can reduce the
overall number of tokens and simplify the decisions the grammar has to
make, the better: you get better performance and better "out of the box"
behavior with respect to code completion etc. when the grammar rules are
straightforward. Hey, formatting may even be OK, since it becomes
humanly possible to write rules that cover all the corner cases simply
because there are fewer corners :-)

cloudsmith / geppetto @ github uses an external lexer.
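As an illustration of the kind of control a hand-written lexer gives, here is a standalone Java sketch (not wired into Xtext; the class PeekingNumberLexer is hypothetical). It peeks ahead before committing: a '.' becomes part of a number only if a digit follows, and 'e'/'E' starts an exponent only if digits (optionally signed) follow. Negative numbers are left to a unary '-' operator, as Henrik suggested earlier.

```java
import java.util.ArrayList;
import java.util.List;

class PeekingNumberLexer {
    private final String in;
    private int pos;

    PeekingNumberLexer(String in) {
        this.in = in;
    }

    // Character at pos + ahead, or '\0' past the end of the input.
    private char peek(int ahead) {
        int i = pos + ahead;
        return i < in.length() ? in.charAt(i) : '\0';
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private void skipDigits() {
        while (isDigit(peek(0))) pos++;
    }

    List<String> tokenize() {
        List<String> tokens = new ArrayList<>();
        while (pos < in.length()) {
            int start = pos;
            char c = peek(0);
            if (Character.isWhitespace(c)) {
                pos++; // whitespace is skipped, i.e. treated as hidden
            } else if (isDigit(c)) {
                skipDigits();
                // Fraction only if '.' is followed by a digit; otherwise the '.'
                // is left alone, e.g. as the end of a clause.
                if (peek(0) == '.' && isDigit(peek(1))) {
                    pos++;
                    skipDigits();
                }
                // Exponent only if e/E is followed by digits, optionally signed.
                boolean exponentMarker = peek(0) == 'e' || peek(0) == 'E';
                boolean signed = (peek(1) == '+' || peek(1) == '-') && isDigit(peek(2));
                if (exponentMarker && (isDigit(peek(1)) || signed)) {
                    pos += signed ? 2 : 1;
                    skipDigits();
                }
                tokens.add("NUMBER:" + in.substring(start, pos));
            } else if (Character.isLetter(c) || c == '_') {
                while (Character.isLetterOrDigit(peek(0)) || peek(0) == '_') pos++;
                boolean variable = Character.isUpperCase(c) || c == '_';
                tokens.add((variable ? "VARIABLE:" : "ATOM:") + in.substring(start, pos));
            } else {
                pos++;
                tokens.add(String.valueOf(c)); // operators and '.' as single characters
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"4.1+3.", "3.0e1.", "3.", "E.", "e."}) {
            System.out.println(s + " -> " + new PeekingNumberLexer(s).tokenize());
        }
    }
}
```

The two-character peek is exactly the decision the generated lexer could not be talked into making: "3." ends the number at "3", while "3.0e1." still produces a single NUMBER token.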

Regards
- henrik