Eclipse Community Forums: TMF (Xtext) » Unexpected error when parsing input

Unexpected error when parsing input [message #991430]

Wed, 19 December 2012 04:23

Scott Hendrickson

Messages: 22
Registered: December 2009

Junior Member

I have the following grammar:

grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

TopExpression returns Expression:
	exps+=BottomExpression (ops+=Operations exps+=BottomExpression)*;

Operations:
	'*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem' ;

BottomExpression returns Expression:
	(value=VARIABLE) | '(' exps+=TopExpression ')';

terminal DIGIT:
	'0'..'9';

terminal LOWER_CASE_LETTER:
	'a'..'z';

terminal UPPER_CASE_LETTER:
	'A'..'Z';

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;

VARIABLE:
	LOWER_CASE_LETTER (DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;

However, it cannot parse the input "compatible_directions mod variable2". The parse error that I get is "mismatched character 'c' expecting 'm' at offset: 13". As far as I can tell it's trying to parse "rem" from the "re" in the middle of "directions".

In the grammar above, I can make VARIABLE terminal and it works. But, in the real grammar, I cannot make the VARIABLE rule terminal.

Any idea why Xtext is trying to parse "rem" rather than "mod" in the input? Is there a way to fix that?

Any help is greatly appreciated.

Thank you,
-- Scott

Re: Unexpected error when parsing input [message #991434 is a reply to message #991430]

Wed, 19 December 2012 06:43

Alexander Nittka

Messages: 1193
Registered: July 2009

Senior Member

Hi,

the lexer is greedy and tries to make tokens as long as possible. This is why the "re" in "directions" is not tokenized as two individual characters, as you expected. The lexer knows a token ("rem") that starts with "re" and is longer than a single character, and it does not backtrack once it has started matching that token. Hence the error.

Why can't you have a terminal that accepts something like ID from the default grammar? You could still make Variable a datatype rule with a value converter that enforces the correct format. Not every language detail has to be dealt with in the grammar. Often it is better to make the grammar more forgiving and use validation to produce user-friendly error messages.

Alex

P.S.: Your digit, lower case and upper case character terminal rules look more like terminal fragments.
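
One way to read the suggestion above, as an untested sketch rather than a confirmed fix: turn the single-character rules into terminal fragments, let one terminal produce the whole identifier, and keep VARIABLE as a datatype rule that delegates to it. The terminal name IDENT is made up for this illustration, and the sketch assumes the real grammar can tolerate one identifier-shaped terminal. The stricter "must start with a lower-case letter" check would then move into a value converter or validator, which is not shown here.

```xtext
grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

TopExpression returns Expression:
	exps+=BottomExpression (ops+=Operations exps+=BottomExpression)*;

Operations:
	'*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem';

BottomExpression returns Expression:
	(value=VARIABLE) | '(' exps+=TopExpression ')';

// Datatype rule: still a parser rule, but it now consumes one whole token
// instead of a sequence of single-character tokens.
VARIABLE:
	IDENT;

// IDENT is a hypothetical name. It is deliberately more forgiving than the
// original VARIABLE rule (it also allows an upper-case or '_' first character).
terminal IDENT:
	(LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')
	(DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;

// Fragments are only building blocks for other terminal rules;
// the lexer does not emit them as tokens of their own.
terminal fragment DIGIT:
	'0'..'9';

terminal fragment LOWER_CASE_LETTER:
	'a'..'z';

terminal fragment UPPER_CASE_LETTER:
	'A'..'Z';

terminal WHITESPACE:
	(' ' | '\t' | '\r' | '\n')+;
```

With a grammar of this shape, "compatible_directions" should reach the parser as a single IDENT token, so the lexer no longer starts matching 'rem' in the middle of it. A word that is exactly a keyword, such as "mod", would still be lexed as that keyword, so VARIABLE would not accept it unless the keywords are added to the rule as alternatives.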

Re: Unexpected error when parsing input [message #991499 is a reply to message #991430]

Wed, 19 December 2012 14:27

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

It is not a good idea to have tokens that are individual characters when
there are going to be many of them; your LOWER_CASE_LETTER and
UPPER_CASE_LETTER will cause serious bloat to the resulting parse tree
(each node will have quite a lot of extra information).

I recommend using a longer token, like the ID in the standard terminals.
Also do the same for DIGIT.

Small note, if you use WS as the name for Whitespace I think you will
need to do less customization (you need to tell the framework what your
whitespace rule is otherwise IIRC).
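
As a sketch of the renaming described above, only the grammar header and the whitespace rule would change; this assumes the rest of the grammar stays as posted. WS is the name the standard terminals (org.eclipse.xtext.common.Terminals) use for whitespace. Inheriting that whole grammar may not suit this language, though, because its SL_COMMENT terminal begins with '//', which this grammar uses as an operator.

```xtext
grammar org.archstudio.prolog.xtext.Prolog hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

// ... parser rules and the other terminals as in the original post ...

// Same definition as the original WHITESPACE rule, under the conventional name.
terminal WS:
	(' ' | '\t' | '\r' | '\n')+;
```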

- henrik

Re: Unexpected error when parsing input [message #991979 is a reply to message #991499]

Thu, 20 December 2012 15:34

Scott Hendrickson

Messages: 22
Registered: December 2009

Junior Member

Thank you Alex and Henrik! I incorporated your suggestions and it works now. I guess I need to use more terminal rules.

Re: Unexpected error when parsing input [message #992111 is a reply to message #991979]

Thu, 20 December 2012 22:29

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

On 2012-20-12 16:34, Scott Hendrickson wrote:
> Thank you Alex and Henrik! I incorporated your suggestions and it works
> now. I guess I need to use more terminal rules.

Not sure what you mean, but in general I would say that you want as few
terminal rules as possible, and that they return tokens that are as long
as possible :)

The more terminal rules you add the greater the risk that they overlap.

The set provided by the standard terminals is a pretty good start.

If you think you need a new terminal, you are probably better off with a
Data rule.
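
A sketch of what such a data rule might look like, using a made-up decimal literal rather than anything from the grammar in this thread. The names Decimal and DIGITS are hypothetical. Instead of adding a second numeric terminal that would overlap with the digit token, the rule composes tokens the lexer already produces.

```xtext
// Datatype rule: no assignments and no actions, so it yields a plain value
// (an EString by default) rather than an EObject.
// The empty hidden() clause means no whitespace is allowed between the parts.
Decimal hidden():
	DIGITS '.' DIGITS;

// One long token per run of digits, instead of one token per digit.
terminal DIGITS:
	('0'..'9')+;
```

Because Decimal is handled by the parser, the lexer only ever has to distinguish DIGITS and '.', which keeps the set of terminals small and non-overlapping.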

OTOH - if you find that you need to write really complex Data rules (and
many of them), then you may be better off with a custom lexer.

Hope that helps.

Regards
- henrik
