Eclipse Community Forums
Modeling » TMF (Xtext)
Unexpected error when parsing input [message #991430] Wed, 19 December 2012 04:23
Scott Hendrickson
I have the following grammar:

grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

TopExpression returns Expression:
	exps+=BottomExpression (ops+=Operations exps+=BottomExpression)*;

Operations:
	'*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem' ;

BottomExpression returns Expression:
	(value=VARIABLE) | '(' exps+=TopExpression ')';

terminal DIGIT:
	'0'..'9';

terminal LOWER_CASE_LETTER:
	'a'..'z';

terminal UPPER_CASE_LETTER:
	'A'..'Z';

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;

VARIABLE:
	LOWER_CASE_LETTER (DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;


However, it cannot parse the input "compatible_directions mod variable2". The parse error I get is "mismatched character 'c' expecting 'm' at offset: 13". As far as I can tell, it's trying to parse "rem" from the "re" in the middle of "directions".

If I make VARIABLE a terminal rule in the grammar above, it works. In the real grammar, however, I cannot make VARIABLE a terminal rule.

Any idea why Xtext is trying to parse "rem" rather than "mod" in the input? Is there a way to fix that?

Any help is greatly appreciated.

Thank you,
-- Scott
Re: Unexpected error when parsing input [message #991434 is a reply to message #991430] Wed, 19 December 2012 06:43
Alexander Nittka
Hi,

the lexer is greedy and tries to make tokens as long as possible. This is why the "re" in "directions" is not tokenized as two individual characters, as you expect. The lexer knows a token ("rem") that starts with "re" and is longer than a single character, so it commits to that token, and it does not backtrack when the next character turns out to be 'c'. Hence the error.

Why can't you have a terminal that accepts something like ID from the default grammar? You could still make VARIABLE a datatype rule with a value converter that enforces the correct format. Not every language detail has to be dealt with in the grammar; often it is better to make the grammar more forgiving and use validation to produce user-friendly error messages.
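A minimal sketch of that suggestion, assuming a single broad terminal named ID (modeled on the ID rule of the standard terminals, without its optional '^' escape). The check that a variable starts with a lowercase letter moves out of the grammar into a value converter or validator, which is not shown:

```xtext
grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

TopExpression returns Expression:
	exps+=BottomExpression (ops+=Operations exps+=BottomExpression)*;

Operations:
	'*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem';

BottomExpression returns Expression:
	value=VARIABLE | '(' exps+=TopExpression ')';

// Datatype rule: the parser consumes exactly one ID token here.
// "Must start with a lowercase letter" is no longer encoded in the grammar;
// enforce it in a value converter or a validation check instead.
VARIABLE:
	ID;

// One token per identifier, so "compatible_directions" is lexed as a single
// ID rather than a run of single-letter tokens. The input "mod" on its own is
// still lexed as the keyword, because keywords win over ID for matches of
// the same length.
terminal ID:
	('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;
```

With this shape the lexer only ever has to choose between a keyword and a whole identifier, which is the situation it handles well.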

Alex

P.S.: Your digit, lower case and upper case character terminal rules look more like terminal fragments.
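For illustration, keeping the original names and assuming a single ID terminal as the only rule that uses them. Only terminal rules can refer to fragments, so a parser rule such as VARIABLE would call ID rather than the fragments:

```xtext
// Fragments are inlined into the terminals that use them;
// the lexer never emits a DIGIT or LOWER_CASE_LETTER token by itself.
terminal fragment DIGIT:
	'0'..'9';

terminal fragment LOWER_CASE_LETTER:
	'a'..'z';

terminal fragment UPPER_CASE_LETTER:
	'A'..'Z';

// The only token built from them: one token for the whole identifier.
terminal ID:
	(LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')
	(DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;
```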


Re: Unexpected error when parsing input [message #991499 is a reply to message #991430] Wed, 19 December 2012 14:27
Henrik Lindberg
It is not a good idea to have tokens that are individual characters when
there are going to be many of them; your LOWER_CASE_LETTER and
UPPER_CASE_LETTER will cause serious bloat to the resulting parse tree
(each node will have quite a lot of extra information).

I recommend using a longer token, like ID in the standard terminals,
and doing the same for DIGIT.
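Applied to digits, this means one token for a whole number instead of one token per digit. A minimal sketch modeled on the INT rule of the standard terminals; the `number` feature is hypothetical, and the sketch assumes identifiers are lexed as one ID token so that nothing else in the grammar refers to a DIGIT token:

```xtext
// One INT token for "42" instead of two DIGIT tokens.
// Uses the ecore alias already imported by the grammar.
terminal INT returns ecore::EInt:
	('0'..'9')+;

// Hypothetical use: a number literal as an alternative operand.
BottomExpression returns Expression:
	value=VARIABLE | number=INT | '(' exps+=TopExpression ')';
```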

Small note: if you use WS as the name of your whitespace rule, I think
you will need less customization (IIRC, otherwise you have to tell the
framework which rule is your whitespace rule).
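Renaming is a small change; the sketch below assumes the grammar keeps declaring its own terminals, so the hidden() clause in the header must name the renamed rule:

```xtext
grammar org.archstudio.prolog.xtext.Prolog hidden(WS)

// Same definition as WHITESPACE, under the conventional name
// used by the standard terminals.
terminal WS:
	(' ' | '\t' | '\r' | '\n')+;
```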

- henrik

Re: Unexpected error when parsing input [message #991979 is a reply to message #991499] Thu, 20 December 2012 15:34
Scott Hendrickson
Thank you Alex and Henrik! I incorporated your suggestions and it works now. I guess I need to use more terminal rules.
Re: Unexpected error when parsing input [message #992111 is a reply to message #991979] Thu, 20 December 2012 22:29
Henrik Lindberg

Not sure what you mean, but in general I would say that you want as few
terminal rules as possible, and that they return tokens that are as long
as possible :)

The more terminal rules you add the greater the risk that they overlap.

The set provided by the standard terminals is a pretty good start.

If you think you need a new terminal, you are probably better off with a
datatype rule.

OTOH, if you find that you need to write really complex datatype rules
(and many of them), then you may be better off with a custom lexer.
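To illustrate the trade-off with a generic example: a dotted name could be written as a new terminal, which would overlap with ID, or as a datatype rule built on the existing ID token. A sketch of the latter; QualifiedName is a hypothetical rule that is not part of the original grammar:

```xtext
// Datatype rule: yields a plain string assembled from several tokens,
// so the lexer only ever sees ID and the '.' keyword.
// hidden() with no arguments disables hidden tokens inside this rule,
// so "a.b" is accepted but "a . b" is not.
QualifiedName hidden():
	ID ('.' ID)*;
```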

Hope that helps.

Regards
- henrik