Eclipse Community Forums
Possible different handling of 'reserved' words [message #642129] Tue, 30 November 2010 00:56
Aleks Mising name
Messages: 8
Registered: November 2010
Junior Member
Hi all,

I'm new to the forum and fairly new to Xtext and have run into an issue for which I have no answer at the moment.
I am interested in how 'reserved' words in a grammar are handled.

It seems that from the parser's point of view every reserved word becomes a globally reserved word, regardless of the context.

Not sure this is making much sense, so I'll use an example (this is from the xtext example project - fowler state machine).

If in the editor I create an empty state machine, I get the following:

events
startEvent START1
stopEvent STOP2
end
commands
end
state start
startEvent => start
end
state stop
stopEvent => stop
end


This obviously works fine.

However, suppose that in my events definition I try to give startEvent the code 'state', so that I have something like this:


events
startEvent state
stopEvent STOP2
end



the editor will flag an error because state is a reserved word, even though the word 'state' in that context (i.e. within 'events') shouldn't be a reserved word.

So in the end, I'm wondering: is there some parser option that can change this behaviour, or is this the required behaviour and there is something about the parse process that I am not understanding correctly?
Re: Possible different handling of 'reserved' words [message #642155 is a reply to message #642129] Tue, 30 November 2010 07:07
Mirko Raner
Messages: 125
Registered: July 2009
Location: New York City, NY
Senior Member
I would say that it's generally the expected behavior that you cannot use reserved words as identifiers. For example, in Java, you cannot have an identifier called 'default', 'class', or 'synchronized', exactly because those are reserved words (even if their syntactic position would unambiguously determine them to be identifiers). For most applications, this is actually the desired behavior.

However, I also have come across situations where you explicitly want to allow keywords as identifiers. The basic problem here is that the lexer will always process a keyword as a keyword, and not as an identifier (at the lexer level it's really either/or; the same token cannot be sometimes interpreted as a keyword and sometimes as an identifier). So, during lexing, keywords will always come out as keywords.

The solution is to (1) extract all keywords as explicit terminal rules, and (2) introduce a new rule that can be either an ID or one of the keywords, for example: Name: ID|Keywords; (with Keywords being all possible keywords).

You could refactor Fowler's state machine example as follows:

Statemachine: {Statemachine}
  _EVENTS
     (events+=Event)*
  _END
  _COMMANDS
     (commands+=Command)*
  _END
  (states+=State)*;
 
Event :
  (resetting?=_RESETTING)? name=ID code=Name;
 
Command :
  name=ID code=Name;

State :
  _STATE name=ID
     (_ACTIONS '{' (actions+=[Command])+ '}')?
     (transitions+=Transition)*
  _END;
 
Transition :
  event=[Event] '=>' state=[State];

Name: ID|Keywords;
Keywords: _EVENTS|_COMMANDS|_ACTIONS|_STATE|_RESETTING|_END;

terminal _EVENTS: 'events';
terminal _COMMANDS: 'commands';
terminal _ACTIONS: 'actions';
terminal _STATE: 'state';
terminal _RESETTING: 'resetting';
terminal _END: 'end';


Only in the 'code' attributes was ID replaced with the new Name rule. You cannot generally replace all ID instances with Name, because the grammar would become ambiguous. For example, if the Event rule used name=Name, then on receiving an 'end' token the parser would not be able to determine whether to go into the (events+=Event)* branch or to go straight to the _END after it.
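To make the ambiguity concrete, here is a small sketch of the variant described above. The changed Event rule is hypothetical and shown only for illustration; the other rules are assumed to be exactly as in the refactored grammar.

```xtext
// Hypothetical variant, NOT recommended: Name is also used for the event name.
Event :
  (resetting?=_RESETTING)? name=Name code=Name;

// Given input that starts with
//
//   events
//   end
//
// the 'end' token after 'events' fits two paths in Statemachine:
//   1. leave the (events+=Event)* loop and match _END, or
//   2. enter Event and read 'end' as name=Name (via Keywords -> _END).
```

Keeping name=ID avoids this, since an _END token can then never start an Event.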

Does this answer your question?

Re: Possible different handling of 'reserved' words [message #642183 is a reply to message #642155] Tue, 30 November 2010 09:58
Meinte Boersma
Messages: 434
Registered: July 2009
Location: Leiden, Netherlands
Senior Member
The simple solution is to "fix" your DSL text by prefixing "keywords-which-are-not-keywords" with a '^'. It's somewhat ugly, obviously, but it does work and keeps your grammar the way it is.

The problem is that keywords are already recognized by the lexer, which is not aware of any context; that is the role of the parser. So, any keyword is always recognized as such. (And that's why the standard terminal rule ID has an optional '^' in front of it, which is chopped off by the corresponding default value converter.)
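As a short illustration of the escape mechanism, the fragment below shows the ID terminal as declared in the common Terminals grammar (org.eclipse.xtext.common.Terminals), followed by the escaped DSL text in a comment. It assumes the unmodified state machine example, where the event code is assigned with code=ID.

```xtext
// The standard ID terminal: note the optional leading '^'.
terminal ID:
  '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

// DSL text using the escape. The lexer sees '^state' as an ID token rather
// than the keyword 'state', and the default value converter strips the '^',
// so the stored code is "state".
//
//   events
//   startEvent ^state
//   stopEvent STOP2
//   end
```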


Re: Possible different handling of 'reserved' words [message #642218 is a reply to message #642129] Tue, 30 November 2010 12:37
Sebastian Zarnekow
Messages: 3118
Registered: July 2009
Senior Member
Hi Aleks,

you are always free to use a data type rule for your IDs and list all
keywords which may be IDs there:

ValidID:
ID | 'state' | 'start';
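
A minimal sketch of how such a rule might be used in the state machine example. It assumes the Event and Command rules have the shape shown earlier in the thread, with 'resetting' as an inline keyword; only the code feature is switched from ID to ValidID.

```xtext
// Data type rule: it contains no assignments or actions, so it yields a
// plain string. Which keywords to list is a choice; 'state' is the one
// from the original question.
ValidID:
  ID | 'state';

// Names stay plain IDs; only the code may be a listed keyword.
Event:
  (resetting?='resetting')? name=ID code=ValidID;

Command:
  name=ID code=ValidID;
```

Unlike the explicit terminal rules approach, the keywords stay inline in the grammar; only the words that should double as identifiers need to be listed once.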

Regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com

Re: Possible different handling of 'reserved' words [message #642600 is a reply to message #642129] Wed, 01 December 2010 22:45
Aleks Mising name
Messages: 8
Registered: November 2010
Junior Member
Thanks for your replies. Very helpful and much appreciated!


Cheers!
Re: Possible different handling of 'reserved' words [message #643632 is a reply to message #642600] Tue, 07 December 2010 17:04
Jonathan
Messages: 6
Registered: December 2010
Junior Member
Hi everybody,

I have read your answers with interest.
I have been working on an EcmaScript grammar for two weeks. It was difficult to distinguish between a division expression and a regular expression when the lexer scanned the character '/' (slash).
As there are too many keywords in the language to create a terminal rule for each one, I decided to customize the lexer by hand, turning off regexp recognition where no regexp can occur (e.g. after an identifier). This has worked fine so far.