Eclipse Community Forums: TMF (Xtext) » Possible different handling of 'reserved' words


Possible different handling of 'reserved' words [message #642129]

Tue, 30 November 2010 00:56

Aleks Mising name

Messages: 8
Registered: November 2010

Junior Member

Hi all,

I'm new to the forum and fairly new to Xtext and have run into an issue for which I have no answer at the moment.
I am interested in how 'reserved' words in a grammar are handled.

It seems that from the parser's point of view every reserved word becomes a globally reserved word, regardless of the context.

I'm not sure this makes much sense in the abstract, so I'll use an example (from the Xtext example project, the Fowler state machine).

If in the editor I create an empty state machine, I get the following:

events
startEvent START1
stopEvent STOP2
end
commands
end
state start
startEvent => start
end
state stop
stopEvent => stop
end

This obviously works fine.

However, if in my events definition I give startEvent the code 'state', so that I have something like this:

events
startEvent state
stopEvent STOP2
end

the editor will flag an error because state is a reserved word, even though the word 'state' in that context (i.e. within 'events') shouldn't be a reserved word.

So I'm wondering: is there some parser option that changes this behaviour, or is this the intended behaviour and I'm misunderstanding something about the parse process?


Re: Possible different handling of 'reserved' words [message #642155 is a reply to message #642129]

Tue, 30 November 2010 07:07

Mirko Raner

Messages: 125
Registered: July 2009
Location: New York City, NY

Senior Member

I would say that it's generally the expected behavior that you cannot use reserved words as identifiers. For example, in Java, you cannot have an identifier called 'default', 'class', or 'synchronized', exactly because those are reserved words (even if their syntactic position would unambiguously determine them to be identifiers). For most applications, this is actually the desired behavior.

However, I also have come across situations where you explicitly want to allow keywords as identifiers. The basic problem here is that the lexer will always process a keyword as a keyword, and not as an identifier (at the lexer level it's really either/or; the same token cannot be sometimes interpreted as a keyword and sometimes as an identifier). So, during lexing, keywords will always come out as keywords.

The solution is to (1) extract all keywords as explicit terminal rules, and (2) introduce a new rule that can be either an ID or one of the keywords, for example: Name: ID|Keywords; (with Keywords being all possible keywords).

You could refactor Fowler's state machine example as follows:

Statemachine: {Statemachine}
  _EVENTS
     (events+=Event)*
  _END
  _COMMANDS
     (commands+=Command)*
  _END
  (states+=State)*;
 
Event :
  (resetting?=_RESETTING)? name=ID code=Name;
 
Command :
  name=ID code=Name;

State :
  _STATE name=ID
     (_ACTIONS '{' (actions+=[Command])+ '}')?
     (transitions+=Transition)*
  _END;
 
Transition :
  event=[Event] '=>' state=[State];

Name: ID|Keywords;
Keywords: _EVENTS|_COMMANDS|_ACTIONS|_STATE|_RESETTING|_END;

terminal _EVENTS: 'events';
terminal _COMMANDS: 'commands';
terminal _ACTIONS: 'actions';
terminal _STATE: 'state';
terminal _RESETTING: 'resetting';
terminal _END: 'end';

Only in the 'code' attributes was ID replaced with the new Name rule. You cannot generally replace all ID instances with Name, because the grammar would become ambiguous. For example, if the Event rule used name=Name, then on receiving an 'end' token the parser could not determine whether to enter the (events+=Event)* branch or go straight to the _END that follows it.
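The problem case can be sketched as follows. This is an illustration of what not to do, reusing the rule and terminal names from the refactored grammar above; the only change is that name=ID has been swapped for name=Name.

```xtext
// Problem variant (illustration only): the event name may now be a keyword.
Event :
  (resetting?=_RESETTING)? name=Name code=Name;

// Consider the input:
//
//   events
//   end
//   ...
//
// After 'events', the token 'end' could be the name of a further Event
// (via Name -> Keywords -> _END) or the _END that closes the events block.
// From that token alone, the choice between (events+=Event)* and _END
// can no longer be made.
```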

Does this answer your question?


Re: Possible different handling of 'reserved' words [message #642183 is a reply to message #642155]

Tue, 30 November 2010 09:58

Meinte Boersma

Messages: 434
Registered: July 2009
Location: Leiden, Netherlands

Senior Member

The simple solution is to "fix" your DSL text by prefixing "keywords-which-are-not-keywords" with a '^'. It's somewhat ugly, obviously, but it does work and keeps your grammar the way it is.

The problem is that keywords are already recognized by the lexer, which is not aware of any context; that is the role of the parser. So any keyword is always recognized as such. (That is also why the standard terminal rule ID has an optional '^' in front of it, which is chopped off by the default corresponding value converter.)
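To make the '^' escape concrete, here is a small sketch. The ID rule is quoted from memory of the standard org.eclipse.xtext.common.Terminals grammar, so it is worth checking against the Xtext version in use; the DSL text in the comments is the failing input from the first post with the escape applied.

```xtext
// ID as declared in org.eclipse.xtext.common.Terminals (quoted from memory;
// verify against your Xtext version). The optional leading '^' lets an
// identifier spell the same text as a keyword.
terminal ID:
  '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

// With that rule, the events block can be written as:
//
//   events
//     startEvent ^state
//     stopEvent STOP2
//   end
//
// The lexer produces an ID token for '^state' rather than the keyword
// 'state', and the default ID value converter drops the '^', so the
// event's code ends up as the string "state".
```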



Re: Possible different handling of 'reserved' words [message #642218 is a reply to message #642129]

Tue, 30 November 2010 12:37

Sebastian Zarnekow

Messages: 3118
Registered: July 2009

Senior Member

Hi Aleks,

you are always free to use a data type rule for your IDs and list all
keywords which may be IDs there:

ValidID:
ID | 'state' | 'start';
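As a rough sketch of how such a rule might be wired into the state machine grammar: the Event and Command rules below are assumed to match the Fowler example (with code=ID in the original), and the list of keywords in ValidID is only an illustration.

```xtext
// Sketch only: Event and Command are assumed to look like the rules in the
// Fowler example; the single change is code=ValidID instead of code=ID.
Event:
  (resetting?='resetting')? name=ID code=ValidID;

Command:
  name=ID code=ValidID;

// Data type rule: it yields a plain string, so 'code' keeps its string type.
// List only those keywords that should also be legal as codes.
ValidID:
  ID | 'state' | 'events' | 'commands' | 'actions';
```

Names are left as plain ID here, so only the code position accepts keywords. Whether the generated parser accepts every listed alternative without conflicts is something to check when regenerating the language.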

Regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com



Re: Possible different handling of 'reserved' words [message #642600 is a reply to message #642129]

Wed, 01 December 2010 22:45

Aleks Mising name

Messages: 8
Registered: November 2010

Junior Member

Thanks for your replies. Very helpful and much appreciated!

Cheers!


Re: Possible different handling of 'reserved' words [message #643632 is a reply to message #642600]

Tue, 07 December 2010 17:04

Jonathan

Messages: 6
Registered: December 2010

Junior Member

Hi everybody,

I have read your answers with interest.
I have been working on an EcmaScript grammar for two weeks. It was difficult to distinguish between a division expression and a regular expression when the character '/' (slash) was scanned by the lexer.
Since the language has too many keywords to create a terminal rule for each one, I decided to extend the lexer by hand, turning off regexp recognition wherever a regexp cannot occur (e.g. after an identifier). This has worked fine so far.

