Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759644 is a reply to message #759308]
Tue, 29 November 2011 08:03
Eclipse User
In general, terminals are cookie cutters that chop the input up into
tokens. They are tried from highest to lowest precedence (in the order in
which they are declared), so if your first terminal is:
terminal ANY : . ;
no terminal declared after it will ever be triggered.
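To make the ordering effect concrete, here is a small Python simulation. It is a simplified model, not Xtext or ANTLR code: terminal rules are tried in declaration order, and the first rule that matches at the current position produces the token. The rule names ID, WS and ANY mirror the discussion; their regular expressions are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple

# (token name, compiled pattern); list order models declaration order.
Rule = Tuple[str, re.Pattern]


def tokenize(text: str, rules: List[Rule]) -> List[Tuple[str, str]]:
    """Simplified lexer: at each position, try the rules in declaration
    order and let the first non-empty match produce the token."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > pos:
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:  # no rule matched at this position
            raise ValueError(f"no terminal matches at position {pos}")
    return tokens


ID = ("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*"))
WS = ("WS", re.compile(r"[ \t\r\n]+"))
ANY = ("ANY", re.compile(r".", re.DOTALL))

# ANY declared last: ID and WS get their chance first.
print(tokenize("foo bar", [ID, WS, ANY]))
# [('ID', 'foo'), ('WS', ' '), ('ID', 'bar')]

# ANY declared first: it matches every character, so ID and WS never fire.
print(tokenize("foo bar", [ANY, ID, WS]))
# seven ('ANY', ...) tokens, one per character
```

The usual consequence in a real grammar is the same: a catch-all terminal belongs at the end of the terminal list.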
If you want to create reusable terminal "parts", look at terminal
fragments, which are now supported in Xtext. Each reusable part is
declared with "terminal fragment". Here is an example:
-------- (From Cloudsmith/Geppetto @ github), pp.xtext -------
terminal REGULAR_EXPRESSION
// Special rules in the lexer must prevent the RE from being recognized
// except after ',', 'node', '{', '}', '=~', '!~'
: '/' RE_BODY '/' RE_FLAGS?
;
terminal fragment RE_BODY
: RE_FIRST_CHAR
RE_FOLLOW_CHAR*
;
terminal fragment RE_FIRST_CHAR
// a regexp cannot start with:
// - a '*' (illegal regexp, and it makes it look like a MLCOMMENT start)
// - a '/' since that makes it empty (which is an invalid regexp)
// - a NL since all of the regexp must be on one line
: (!('\n' | '*' | '/' | '\\') | RE_BACKSLASH_SEQUENCE)
;
terminal fragment RE_FOLLOW_CHAR
// subsequent regexp chars include '*'
: (RE_FIRST_CHAR | '*')
;
terminal fragment RE_BACKSLASH_SEQUENCE:
// Any character can be escaped except NL since all of the regexp must
// be on one line.
('\\' !'\n')
;
terminal fragment RE_FLAGS:
// Ruby regex flags: i o x m u e s n (optional, in any order, but each
// used only once).
// Puppet does not support these (currently); they are recognized to
// enable a warning that they are not supported (no other meaning can be
// applied to letters appearing after the closing '/' in a regexp).
// Checking for supported flags can be done in validation if they
// become available.
('a'..'z')+
;
---------
Note that terminal fragments do not become tokens themselves; the
grammar will never see RE_FLAGS, RE_BACKSLASH_SEQUENCE, etc. The
grammar only gets the real terminal REGULAR_EXPRESSION. (If you try to
use the example, note that it does not show everything needed to
handle regular expressions; the snippet only illustrates how fragments
are used.)
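As a rough analogy (Python regular expressions, not Xtext), fragments behave like named sub-patterns that are spliced into the one real terminal. Each pattern below is my own approximate translation of the corresponding fragment in the snippet above; only the composed REGULAR_EXPRESSION produces a token, so the fragment names never appear in the output.

```python
import re

# Fragments: reusable pieces that are never emitted as tokens on their own.
# Each is an approximate Python-regex translation of the Xtext fragment.
RE_BACKSLASH_SEQUENCE = r"\\[^\n]"                          # '\\' !'\n'
RE_FIRST_CHAR = rf"(?:[^\n*/\\]|{RE_BACKSLASH_SEQUENCE})"   # !('\n'|'*'|'/'|'\\') | escape
RE_FOLLOW_CHAR = rf"(?:{RE_FIRST_CHAR}|\*)"                 # RE_FIRST_CHAR | '*'
RE_BODY = rf"{RE_FIRST_CHAR}{RE_FOLLOW_CHAR}*"
RE_FLAGS = r"[a-z]+"                                        # ('a'..'z')+

# The only real terminal, assembled from the fragments.
REGULAR_EXPRESSION = re.compile(rf"/{RE_BODY}/(?:{RE_FLAGS})?")


def lex_regex(text):
    """Return a single ('REGULAR_EXPRESSION', text) token if the whole
    input is one regexp literal, else None. No fragment names show up."""
    m = REGULAR_EXPRESSION.fullmatch(text)
    return ("REGULAR_EXPRESSION", m.group()) if m else None


print(lex_regex(r"/a\/b*/i"))  # one REGULAR_EXPRESSION token
print(lex_regex("/*a/"))       # None: a regexp cannot start with '*'
print(lex_regex("//"))         # None: the body cannot be empty
```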
I have not looked at your grammar/terminals in great detail, but you
probably have overlapping or ambiguous terminals. When you changed the
terminal rule into a non-terminal (i.e. a data type rule), you moved
recognition of that input from the lexer to the parser.
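The difference can be sketched in Python for a hypothetical dotted-name rule (the names QNAME and QualifiedName and their patterns are assumptions; Robin's grammar is not shown here). As a terminal, the lexer hands the parser one QNAME token; as a data type rule like `QualifiedName: ID ('.' ID)*;`, the lexer only produces ID and '.' tokens, and the parser checks the structure and builds the string value. This is a simulation, not Xtext's generated code.

```python
import re

# Lexer-level alternative: the whole dotted name is one terminal.
QNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")
# Lexer used with the data type rule: only ID and '.' are tokens.
TOKEN = re.compile(r"(?P<ID>[A-Za-z_][A-Za-z0-9_]*)|(?P<DOT>\.)")


def lex_as_terminal(text):
    """Terminal QNAME: the lexer recognizes the dotted name as ONE token."""
    m = QNAME.fullmatch(text)
    if not m:
        raise ValueError("input is not a single QNAME token")
    return [("QNAME", m.group())]


def lex_pieces(text):
    """Tokens for the data type rule variant: ID and DOT, one at a time."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_qualified_name(tokens):
    """Data type rule ID ('.' ID)*: the *parser* checks that the kinds
    alternate ID, DOT, ID, ... and joins the token texts into one string."""
    kinds = [kind for kind, _ in tokens]
    if not kinds or len(kinds) % 2 == 0 or any(
        k != ("ID" if i % 2 == 0 else "DOT") for i, k in enumerate(kinds)
    ):
        raise SyntaxError(f"not a qualified name: {kinds}")
    return "".join(text for _, text in tokens)


print(lex_as_terminal("a.b"))                   # [('QNAME', 'a.b')]
print(lex_pieces("a.b"))                        # [('ID', 'a'), ('DOT', '.'), ('ID', 'b')]
print(parse_qualified_name(lex_pieces("a.b")))  # a.b
```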
Also note that there is a difference between terminals and keywords. In
simple terms: the lexer matches a token, and if that token's text matches
a keyword, the keyword token is delivered instead.
As an example, suppose 'if' is specified in the grammar as a keyword,
and there is an ID terminal that matches identifiers. If the input is
"ifif" you get an ID token, and if the input is "if" you get the token
IF. Contrast this with specifying 'if' as a terminal. You would then
have to give it higher precedence than the ID terminal (otherwise you
would never see it, as ID would eat all matching characters). Since it
now has higher precedence than the ID rule, the input "ifif" will be
lexed as the two tokens IF IF.
I hope that explains the relationship between terminals, terminal
fragments and keywords.
Regards
- henrik
On 2011-28-11 11:45, Robin wrote:
> Hi,
>
> the parser works the same as far as I can tell. How would you quantify
> "better"?
>
> But I am still trying to figure out the original questions: Is there a
> rule regarding sub-structuring terminals?? And why do I get this
> EOF-error??
>
> Is there anyone here, who can maybe explain that? Or push me in the
> right direction? :)
> Thanks, Robin
Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #760214 is a reply to message #760111]
Thu, 01 December 2011 10:51
Eclipse User
On 2011-01-12 12:00, Robin wrote:
> Weird thing is that I don't need the WS when I am not substructuring
> into terminals. I guess that is again the keyword - terminal distinction.
>
WS is hidden by default (since you are using the default terminals
grammar), so in a data type rule, whitespace and comments may appear
between any of its tokens. If you want to forbid that, you must declare
the rule with 'hidden()' and then explicitly specify where whitespace
may appear (if anywhere).
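Here is a Python simulation of that effect for a hypothetical data type rule `QualifiedName: ID ('.' ID)*;` (the rule and the `hidden` parameter are illustrative assumptions, not Xtext API). With the default, WS tokens are filtered out before the rule sees them, so "a . b" is accepted; `hidden=()` models declaring the rule with hidden(), something like `QualifiedName hidden(): ID ('.' ID)*;`, where the same input is rejected.

```python
import re

# Lexer tokens: WS, ID and '.' (DOT).
TOKEN = re.compile(r"(?P<WS>\s+)|(?P<ID>[A-Za-z_][A-Za-z0-9_]*)|(?P<DOT>\.)")


def lex(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_qualified_name(tokens, hidden=("WS",)):
    """Simulated rule ID ('.' ID)*. Token kinds listed in `hidden` are
    dropped before the rule sees them (models the default hidden WS);
    hidden=() models a rule declared with hidden()."""
    visible = [t for t in tokens if t[0] not in hidden]
    expected, parts = "ID", []
    for kind, text in visible:
        if kind != expected:
            raise SyntaxError(f"expected {expected}, got {kind} {text!r}")
        parts.append(text)
        expected = "DOT" if kind == "ID" else "ID"
    if expected != "DOT":  # the name must end with an ID
        raise SyntaxError("incomplete qualified name")
    # This simulation joins only the visible token texts.
    return "".join(parts)


print(parse_qualified_name(lex("a . b")))             # a.b (WS is hidden)
print(parse_qualified_name(lex("a.b"), hidden=()))    # a.b
# parse_qualified_name(lex("a . b"), hidden=())       # raises SyntaxError
```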
Regards
- henrik