Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759346 is a reply to message #759337]
Mon, 28 November 2011 12:59
Meinte Boersma Messages: 434 Registered: July 2009 Location: Leiden, Netherlands
Senior Member
Well, that "rule" might be nothing more than a chain of consequences of how an Xtext grammar is mapped to an ANTLR grammar and how ANTLR turns that into a generated parser. Heck, it might even be a bug. Understanding said rule will probably not bring you any closer to a grammar that can parse the class of documents you're aiming for. (In fact, given the current grammar, I'd say that using a regexp is probably much easier.)
The right direction would be to learn how to write grammars the Xtext way, rather than mapping a grammar from some other parsing technology to Xtext as literally as possible. The tutorials and examples go a long way in helping. The first thing to do would probably be to understand what feature assignments are and why they are necessary (and how they differ from unassigned rule calls).
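To make the difference between assignments and unassigned rule calls concrete, here is a minimal sketch. The grammar name, namespace URI and all rule names are made up for illustration and have nothing to do with the grammar under discussion.

```xtext
grammar org.example.Demo with org.eclipse.xtext.common.Terminals

generate demo "http://www.example.org/demo"

// Assignment with '+=': every parsed Element is added to the
// list feature 'elements' of the Model object.
Model : elements+=Element* ;

// Unassigned rule calls: Element creates nothing itself, it simply
// returns the object produced by Entity or DataType.
Element : Entity | DataType ;

// Assignment with '=': the matched ID text is stored in the
// feature 'name' of the created object.
Entity : 'entity' name=ID ;
DataType : 'datatype' name=ID ;
```

Without the assignments, the parser would still recognize the input, but nothing of what it matched would end up in the resulting model.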
[Updated on: Mon, 28 November 2011 13:00]
Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759644 is a reply to message #759308]
Tue, 29 November 2011 13:03
Henrik Lindberg Messages: 2509 Registered: July 2009
Senior Member
In general - the terminals are cookie cutters that chop the input
into tokens. They are tried in the order they are declared, from
highest precedence to lowest, i.e. if your first terminal is:
terminal ANY : . ;
no terminal after it will ever be triggered.
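For comparison, the stock grammar org.eclipse.xtext.common.Terminals that ships with Xtext declares its catch-all terminal last. The excerpt below is abridged from memory (STRING and the comment terminals are left out), so check it against the version you actually have installed.

```xtext
// Abridged excerpt of org.eclipse.xtext.common.Terminals
terminal ID : '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;
terminal INT returns ecore::EInt : ('0'..'9')+ ;
terminal WS : (' '|'\t'|'\r'|'\n')+ ;

// The catch-all comes last, after all the more specific terminals.
terminal ANY_OTHER : . ;
```

If you add your own catch-all, putting it at the end in the same way keeps it from shadowing the more specific terminals.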
If you want to create reusable terminal "parts" you should look at
'terminal fragments' - they are now supported in Xtext. All reusable
parts are declared with "terminal fragment". Here is an example:
-------- (From Cloudsmith/Geppetto @ github), pp.xtext -------
terminal REGULAR_EXPRESSION
// Special rules in the lexer must prevent the RE from being recognized
// except after ',', 'node', '{', '}', '=~', '!~'
: '/' RE_BODY '/' RE_FLAGS?
;
terminal fragment RE_BODY
: RE_FIRST_CHAR
RE_FOLLOW_CHAR*
;
terminal fragment RE_FIRST_CHAR
// regexp can not start with:
// - a '*' (illegal regexp, and makes it look like a MLCOMMENT start)
// - a '/' since that makes it empty (which is an invalid regexp)
// - a NL since all of the regexp must be on one line
// (a '\' is only accepted as the start of a RE_BACKSLASH_SEQUENCE)
: (!('\n' | '*' | '/' | '\\') | RE_BACKSLASH_SEQUENCE)
;
terminal fragment RE_FOLLOW_CHAR
// subsequent regexp chars include '*'
: (RE_FIRST_CHAR | '*')
;
terminal fragment RE_BACKSLASH_SEQUENCE:
// Any character can be escaped except NL since all of the regexp must
// be on one line.
('\\' !'\n')
;
terminal fragment RE_FLAGS:
// RUBY REGEX flags: i o x m u e s n (optional, in any order, but
// use each only once).
// Puppet does not (currently) support these. They are recognized to
// enable a warning that they are not supported (no other meaning can
// be applied to letters appearing after the end '/' in a regexp).
// A check for supported flags can be done in validation if they
// become available.
('a'..'z')+
;
---------
Note that the terminal fragments do not become tokens themselves - e.g.
the grammar will never see RE_FLAGS, RE_BACKSLASH_SEQUENCE etc. The
grammar only gets the true terminal REGULAR_EXPRESSION. (Note that if
you actually try to use the example, it does not show everything needed
to handle regular expressions - the snippet only illustrates how
fragments are used.)
I have not looked at your grammar/terminals in great detail, but you
probably have overlapping / ambiguous terminals. When you changed the
terminal rule into a non-terminal (i.e. a datatype rule), you moved the
recognition of that input from the lexer to the parser.
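As an illustration of that lexer/parser difference, here is a small sketch using a qualified name. The grammar name, namespace URI and the rules Import, QualifiedName and QUALIFIED_NAME are invented for the example and are not taken from your grammar.

```xtext
grammar org.example.Names with org.eclipse.xtext.common.Terminals

generate names "http://www.example.org/names"

Import : 'import' importedName=QualifiedName ;

// Datatype rule: it has no assignments and is built from ID tokens.
// The lexer only ever produces ID and '.', and the parser puts them
// together into one string value.
QualifiedName : ID ('.' ID)* ;

// The terminal variant (left commented out) would do the same work in
// the lexer. Every plain identifier matches both ID and
// QUALIFIED_NAME, so the two terminals overlap:
// terminal QUALIFIED_NAME : ID ('.' ID)* ;
```

With the datatype rule there is no second terminal competing with ID in the lexer, which is presumably why your grammar started to behave once you made the change.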
Also note that there is a difference between terminals and keywords. In
simple terms - the lexer matches a token, and if the matched text is
exactly a keyword, the keyword token is delivered instead.
As an example - the keyword 'if' is specified in the grammar (as a
keyword) and there is an ID terminal that matches identifiers. If the
input is "ifif" you get an ID token, and if input is "if", you get the
token IF. Contrast this with having specified the keyword 'if' as a
terminal. You would now have to specify it with higher precedence than
the ID terminal (or you would never see it, as ID would eat all matching
characters). Since it now has higher precedence than the ID rule, the
input "ifif" will be lexed as the two tokens IF IF.
I hope that explains the relationship between terminals, terminal
fragments and keywords.
Regards
- henrik
On 2011-28-11 11:45, Robin wrote:
> Hi,
>
> the parser works the same as far as I can tell. How would you quantify
> "better"?
>
> But I am still trying to figure out the original questions: Is there a
> rule regarding sub-structuring terminals?? And why do I get this
> EOF-error??
>
> Is there anyone here, who can maybe explain that? Or push me in the
> right direction? :)
> Thanks, Robin