| Unexpected error when parsing input [message #991430] |
Tue, 18 December 2012 23:23  |
Scott Hendrickson Messages: 15 Registered: December 2009 |
Junior Member |
I have the following grammar:
grammar org.archstudio.prolog.xtext.Prolog hidden(WHITESPACE)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate prolog "http://www.archstudio.org/prolog/xtext/Prolog"

TopExpression returns Expression:
    exps+=BottomExpression (ops+=Operations exps+=BottomExpression)*;

Operations:
    '*' | '/' | '//' | 'rdiv' | '<<' | '>>' | 'mod' | 'rem';

BottomExpression returns Expression:
    (value=VARIABLE) | '(' exps+=TopExpression ')';

terminal DIGIT:
    '0'..'9';

terminal LOWER_CASE_LETTER:
    'a'..'z';

terminal UPPER_CASE_LETTER:
    'A'..'Z';

terminal WHITESPACE:
    (' ' | '\t' | '\r' | '\n')+;

VARIABLE:
    LOWER_CASE_LETTER (DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;
However, it cannot parse the input "compatible_directions mod variable2". The parse error that I get is "mismatched character 'c' expecting 'm' at offset: 13". As far as I can tell, it is trying to parse "rem" from the "re" in the middle of "directions".
In the grammar above, I can make VARIABLE a terminal rule and it works. But in the real grammar, I cannot make the VARIABLE rule a terminal.
Any idea why Xtext is trying to parse "rem" rather than "mod" in the input? Is there a way to fix that?
Any help is greatly appreciated.
Thank you,
-- Scott
| Re: Unexpected error when parsing input [message #991434 is a reply to message #991430] |
Wed, 19 December 2012 01:43   |
Alexander Nittka Messages: 1075 Registered: July 2009 |
Senior Member |
Hi,
The lexer is greedy and tries to make tokens as long as possible. This is why the "re" in "directions" is not tokenized as two individual characters, as you expect. The lexer knows a token ('rem') that starts with "re" and is longer than a single character, so it commits to that token and does not backtrack when the next character turns out to be 'c' rather than 'm'. Hence the error.
Why can't you have a terminal that accepts something like ID from the default grammar? You could still make VARIABLE a datatype rule with a value converter that enforces the correct format. Not every language detail has to be dealt with in the grammar. Often it is better to make the grammar more forgiving and use validation to produce user-friendly error messages.
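As a rough sketch of that suggestion (not necessarily what the real grammar needs), the rules below replace the single-character terminals with one ID-like terminal and keep VARIABLE as a datatype rule on top of it. The name ID and its character set are assumptions modelled on the standard terminals. All other rules from the original grammar stay as they are.

```xtext
// Datatype rule: still a parser rule, but it now consumes one whole token
// instead of assembling the name from single-character tokens.
VARIABLE:
    ID;

// Assumed identifier terminal. With it, "compatible_directions" is lexed
// as a single token, so the lexer never tries to start 'rem' inside it.
terminal ID:
    ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;

terminal WHITESPACE:
    (' ' | '\t' | '\r' | '\n')+;
```

Two consequences of this sketch: the terminal also accepts names that start with an upper-case letter or an underscore, so the lower-case-first-letter rule would have to be enforced in a value converter or a validation check; and the keywords 'mod', 'rem' and 'rdiv' are lexed as keywords, so they cannot be used as variable names unless VARIABLE lists them as alternatives.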
Alex
P.S.: Your digit, lower case and upper case character terminal rules look more like terminal fragments.
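A minimal sketch of what the P.S. points at, assuming the three character classes are only ever used inside other terminal rules: declared as fragments, they produce no tokens of their own and serve purely as building blocks. The ID terminal here is a hypothetical example of such a use.

```xtext
// Fragments never become tokens themselves, so they cannot compete with
// keywords or other terminals in the lexer.
terminal fragment DIGIT:
    '0'..'9';

terminal fragment LOWER_CASE_LETTER:
    'a'..'z';

terminal fragment UPPER_CASE_LETTER:
    'A'..'Z';

// Hypothetical terminal assembled from the fragments above.
terminal ID:
    (LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')
    (DIGIT | LOWER_CASE_LETTER | UPPER_CASE_LETTER | '_')*;
```

Fragments can only be referenced from terminal rules, so a datatype rule such as VARIABLE would refer to ID rather than to the fragments directly.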
| Re: Unexpected error when parsing input [message #992111 is a reply to message #991979] |
Thu, 20 December 2012 17:29  |
Henrik Lindberg Messages: 2428 Registered: July 2009 |
Senior Member |
On 2012-20-12 16:34, Scott Hendrickson wrote:
> Thank you Alex and Henrik! I incorporated your suggestions and it works
> now. I guess I need to use more terminal rules.
Not sure what you mean, but in general I would say that you want as few
terminal rules as possible, and that they should return tokens that are
as long as possible :)
The more terminal rules you add, the greater the risk that they overlap.
The set provided by the standard terminals is a pretty good start.
If you think you need a new terminal, you are probably better off with a
data type rule.
OTOH, if you find that you need to write really complex data type rules
(and many of them), then you may be better off with a custom lexer.
Hope that helps.
Regards
- henrik