Re: whitespaces not allowed in grammar [message #870799 is a reply to message #870778]
Wed, 09 May 2012 18:03
Henrik Lindberg
Registered: July 2009
On 2012-09-05 17:39, Tim Student wrote:
> Thanks, the approach is clear so far, but can you tell me some more?
> My problem is that I cannot handle whitespace characters. For example, I
> would like to realise something like a variable assignment that is
> restricted to:
> //only this should be valid:
> //and not
> var= "anystring";
> //and not
> var ="anystring";
> //and not
> var = "anystring";
> When my grammar says:
> variable: name=ID '=' varcontent=STRING;
> ... all three of the above-mentioned entries will be valid, because Xtext
> allows an arbitrary number of whitespaces between the single elements.
> What can I do there?
What I meant, but did not say clearly: I wrote hidden(WS), but it should
be the other way around. You specify what should be hidden, i.e.
hidden(SL_COMMENT, ML_COMMENT, ...) if you want to allow comments, or
hidden() if nothing should be hidden. You then specify WS explicitly
where it is allowed. For your rule:
variable hidden(): name=ID '=' varcontent=STRING;
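For illustration, a minimal grammar sketch of this approach could look as follows. The grammar name org.example.Assignments, the namespace URI and the Model rule are hypothetical and not from the original posts. The sketch assumes the grammar inherits the default terminals (ID, STRING, WS, ML_COMMENT, SL_COMMENT) from org.eclipse.xtext.common.Terminals, and it adds a ';' keyword to match the examples in the question.

```xtext
grammar org.example.Assignments with org.eclipse.xtext.common.Terminals

generate assignments "http://www.example.org/Assignments"

// Hypothetical root rule: whitespace and comments stay hidden here,
// so they should still be allowed between two assignments.
Model hidden(WS, ML_COMMENT, SL_COMMENT):
    variables+=Variable*;

// hidden() with an empty list: nothing is skipped inside this rule,
// so var="anystring"; is accepted while var = "anystring"; is not.
Variable hidden():
    name=ID '=' varcontent=STRING ';';
```

The hidden clause of a parser rule applies to that rule and to the rules it calls, unless a called rule declares its own hidden clause. That is why Variable can switch whitespace off while Model keeps it hidden.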
Re: whitespaces not allowed in grammar [message #875369 is a reply to message #870725]
Tue, 22 May 2012 15:59
Felix Feisst
Registered: February 2012
Tim Student wrote on Wed, 09 May 2012 15:21
I'm trying to build a Xtext
system echo //OK
system -i echo //OK
system -i echo -n "Hello World!" //OK
system -i echo -i "Hello World!" //NOT OK -> "no viable alternative at input '-i'" //(every other character as an option except -i would be ok)
Can somebody tell me what is wrong with my grammar?
The problem is your lexer tokens. Let me explain.
Before the actual parser parses your input, the input is preprocessed by the so-called lexer. The lexer converts the input, which is just a stream of characters, into a so-called token stream. Let's identify the tokens in your grammar:
'system', '-i', '-', ID, STRING
These tokens follow directly from your grammar, where ID and STRING are predefined tokens (also called terminals).
Let's look at how the lexer converts two of your examples into token streams:
system -i echo -n "Hello World!" ==> 'system' '-i' ID '-' ID STRING
system -i echo -i "Hello World!" ==> 'system' '-i' ID '-i' STRING
Note that in the second example the "-i" is converted to the single token '-i' instead of '-' ID. This is because the lexer prefers the keyword token '-i' over '-' followed by ID.
What your parser expects is the following stream of tokens:
'system' '-i'? ID ('-' ID STRING?)*
In the second example the parser cannot recognize the input because it sees the '-i' token but expects the '-' token.
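The original grammar is not quoted in this thread. Inferred from the expected token stream above, the rule presumably looked roughly like the following sketch. This is an assumption, not Tim's actual code. The comments mark which lexer token each element produces.

```xtext
// Assumed reconstruction; the real rule may differ in details.
system:
    'system'        // keyword token 'system'
    '-i'?           // keyword token '-i'
    prog=ID         // predefined terminal ID
    ('-' ID         // keyword token '-' followed by terminal ID
     STRING?)*;     // predefined terminal STRING

// Input -n is lexed as '-' ID, which the loop accepts.
// Input -i is lexed as the single keyword token '-i', which the loop
// does not accept, hence "no viable alternative at input '-i'".
```

Because '-' and ID are separate tokens and whitespace is hidden by default, such a rule would also accept "- n" with a space in between.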
I hope the problem is clear to you now.
There are several ways to fix this problem:
One solution would be to allow '-i' as an alternative to '-'ID:
system: 'system' '-i'? prog=ID (('-'ID | '-i') STRING?)* ;
This solution feels a little dirty to me.
Another solution, which I would prefer, is to allow any flag after system and use the validator to ensure that only "-i" is used there. I would also define a dedicated token (terminal) for the flags:
system: 'system' FLAG? prog=ID (FLAG STRING?)*;
terminal FLAG: '-' ('a'..'z'|'A'..'Z'|'0'..'9')+;
The dedicated FLAG token also solves your problem with the spaces between the '-' and the flag characters.
Don't forget to implement the validation that ensures that the first FLAG is exactly "-i".
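For the validator to see the flags at all, they have to end up in the model, which means assigning them to features. A sketch under that assumption follows. The rule names SystemCall and Option and the feature names firstFlag, options, flag and value are hypothetical and not from the original posts.

```xtext
SystemCall:
    'system' firstFlag=FLAG? prog=ID options+=Option*;

Option:
    flag=FLAG value=STRING?;

// -i, -n and -42 are each lexed as a single FLAG token. Since FLAG is
// a terminal, "- i" with a space in between no longer matches.
terminal FLAG:
    '-' ('a'..'z' | 'A'..'Z' | '0'..'9')+;
```

A check in the validator can then compare firstFlag against "-i" and report an error for anything else. Note that this should only work if the '-i' and '-' keywords no longer appear anywhere in the grammar, because a keyword would otherwise still win over the FLAG terminal for the same input.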
I hope that was helpful,
[Updated on: Tue, 22 May 2012 16:00]