Eclipse Community Forums: TMF (Xtext) » Why can't I use terminals like this? ('missing EOF at' - error)

Why can't I use terminals like this? ('missing EOF at' - error) [message #758731]

Thu, 24 November 2011 07:25

Eclipse User

Hi there!

I have a question regarding the use of terminals.

I am using this grammar:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl" 

Model:
	(entries+=Entry)*;
	
Entry :
	INFO_TOKEN_START
	info=Info 
	TOKEN_END;
	
Info:
	name=ID;

terminal TOKEN_START:
	'<TOKEN';
	
terminal TOKEN_START_CLOSE:
	'>';
	
terminal INFO_TOKEN_START:
	TOKEN_START MARKER Y_COORDINATE NON_ITALIC TOKEN_START_CLOSE;

terminal TOKEN_END:
	'</TOKEN>';
	
terminal MARKER:
	('startX=114.24' | 'startX=311.24');

terminal Y_COORDINATE:
	'startY=' INT '.' INT;
	
terminal NON_ITALIC:
	'italicAngle=0.0';

I get a "missing EOF at '<TOKEN'" error when I run this grammar in a new Eclipse instance with the following input:

<TOKEN startX=311.24 startY=431.84 italicAngle=0.0>asd</TOKEN>

But when I inline the body of INFO_TOKEN_START into the Entry rule, it parses without problems:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl"

Model:
	(entries+=Entry)*;
	
Entry :
	TOKEN_START MARKER Y_COORDINATE NON_ITALIC TOKEN_START_CLOSE
	info=Info 
	TOKEN_END
	;
	
Info:
	name=ID
;

terminal TOKEN_START:
	'<TOKEN';
	
terminal TOKEN_START_CLOSE:
	'>';
	
//terminal INFO_TOKEN_START:
//	TOKEN_START MARKER Y_COORDINATE NON_ITALIC TOKEN_START_CLOSE
//;

terminal TOKEN_END:
	'</TOKEN>';
	
terminal MARKER:
	('startX=114.24' | 'startX=311.24');

terminal Y_COORDINATE:
	'startY=' INT '.' INT;
	
terminal NON_ITALIC:
	'italicAngle=0.0';

Hmmmm, why??

If I turn INFO_TOKEN_START into a parser (data type) rule instead of a terminal, it also works:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl"

Model:
	(entries+=Entry)*;
	
Entry :
	INFO_TOKEN_START
	info=Info 
	TOKEN_END
	;
	
Info:
	name=ID
;

terminal TOKEN_START:
	'<TOKEN';
	
terminal TOKEN_START_CLOSE:
	'>';
	
INFO_TOKEN_START:
	TOKEN_START MARKER Y_COORDINATE NON_ITALIC TOKEN_START_CLOSE
;

terminal TOKEN_END:
	'</TOKEN>';
	
terminal MARKER:
	('startX=114.24' | 'startX=311.24');

terminal Y_COORDINATE:
	'startY=' INT '.' INT;
	
terminal NON_ITALIC:
	'italicAngle=0.0';

So it seems I don't understand terminals correctly. Is there a rule about sub-structuring terminals (building them from other terminals)? And why do I get this EOF error? Could someone explain that to me, please? :)

Thanks!

Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #758735 is a reply to message #758731]

Thu, 24 November 2011 07:35

Eclipse User

You are structuring your grammar in a "non-Xtext-y" way, more like you would with ANTLR or a traditional parser generator. Xtext pretty much derives the terminals for keywords itself, so in general you only need terminal rules for the regexp-like tokens. By inlining the "constant" terminal rules as keywords, your parser will probably already perform better.

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #758781 is a reply to message #758735]

Thu, 24 November 2011 09:27

Eclipse User

Thanks for your answer! Hmmmm...

So with the terminals inlined, my grammar becomes:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl"

Model:
	(entries+=Entry)*;
	
Entry :
	'<TOKEN' ('startX=114.24' | 'startX=311.24') 'startY=' INT '.' INT 'italicAngle=0.0' '>'
	info=Info 
	'</TOKEN>';
	
Info:
	name=ID;

First, this is harder to read. Second, if you have similar structures, you want to reuse them without copy-pasting (or is my view too object-oriented here?).

A simple example would be this:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl"

Model:
	(entries+=Entry)*;
	
Entry :
	'<TOKEN' ('startX=114.24' | 'startX=311.24') 'startY=' INT '.' INT 'italicAngle=0.0' '>'
	info=Info 
	'</TOKEN>'
	
	'<TOKEN' ('startX=114.24' | 'startX=311.24') 'startY=' INT '.' INT 'italicAngle=1.0' '>'
	info2=Info 
	'</TOKEN>'
	;
	
Info:
	name=ID;

So why is it better to inline the terminals? I don't see why I cannot put recurring strings into terminals to reuse them and keep the grammar more readable. Can someone please explain?

Thanks!

Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #758782 is a reply to message #758781]

Thu, 24 November 2011 09:37

Eclipse User

Apart from readability/composition/etc. issues: does the parser work better now?

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759308 is a reply to message #758782]

Mon, 28 November 2011 05:45

Eclipse User

Hi,

the parser works the same as far as I can tell. How would you quantify "better"?

But I am still trying to figure out the original questions: Is there a rule regarding sub-structuring terminals?? And why do I get this EOF-error??

Can anyone here explain that, or point me in the right direction? :)

Thanks, Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759337 is a reply to message #759308]

Mon, 28 November 2011 07:24

Eclipse User

Hmmm... I played around with the grammar a bit more, and when I try to apply the inlining to my real solution I get lots of errors.

I reproduced the problem with a simplified version of my earlier example:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

// changed to hxxp for posting - links not allowed
generate myDsl "hxxp://www.xtext.org/example/mydsl/MyDsl"

Model:
	(entries+=Entry)*;
	
Entry :
	'<TOKEN' ('startX=114.24' | 'startX=311.24') 'startY=' INT '.' INT 'italicAngle=0.0' '>'
	info=Info 
	'</TOKEN>'
	
	'<TOKEN' 'startX=' INT '.' INT 'startY=' INT '.' INT 'italicAngle=1.0' '>'
	info2=Info
	'</TOKEN>'
	;
	
Info:
	name=ID
;

Using that grammar with the following input:

<TOKEN startX=311.24 startY=431.84 italicAngle=0.0>asd</TOKEN>
<TOKEN startX=333.24 startY=333.33 italicAngle=1.0>asdasd</TOKEN>

I get these errors (both on line 2, at "startX"):
mismatched character '3' expecting '1'
missing 'startX=' at '3'

But if I put the strings into terminals, it works. There must be some rule about when to use terminals and when not to.

Thanks, Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759346 is a reply to message #759337]

Mon, 28 November 2011 07:59

Eclipse User

Well, that "rule" might be nothing more than a chain of consequences resulting from how an Xtext grammar is mapped to an ANTLR grammar and how ANTLR maps that to a generated parser - heck, it might even be a bug. Understanding said rule will probably not bring you much, or rather, any closer to a grammar that's able to parse the class of documents you're aiming for. (In fact, given the current grammar, I'd say that using a regexp is probably much easier.)

The right direction would be understanding how to really write grammars in Xtext (not trying to map a grammar from some other parsing tech as verbatim as possible to Xtext). The tutorials and examples go a long way in helping. The first thing to do would probably be to understand what feature assignments are and why they are necessary (and different from unassigned rule calls).

[Updated on: Mon, 28 November 2011 08:00] by Moderator

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759616 is a reply to message #759346]

Tue, 29 November 2011 06:11

Eclipse User

Since this is a university project, I have to be able to answer the obvious questions about how I am proceeding. If the grammar is highly redundant or has obvious readability problems, that will definitely need some explanation.
I am already considering filing this as a bug.

Thanks, I did all the tutorials and examples. And I do know what feature assignments are. What is your point (besides trolling)?

Does anyone else want to help? :(

Thanks, Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759618 is a reply to message #759616]

Tue, 29 November 2011 06:27

Eclipse User

One could very easily consider you to be trolling. Besides, dismissing someone who's obviously trying to help like this is not really motivating for "anyone else", is it? I'll leave it at that.

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759621 is a reply to message #759618]

Tue, 29 November 2011 06:41

Eclipse User

*sigh* Yes, I am trolling by writing elaborate examples to illustrate the problem. Suggestions like "read the tutorials" are not helping me at all, sorry.

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #759644 is a reply to message #759308]

Tue, 29 November 2011 08:03

Eclipse User

In general, the terminals are cookie cutters that hack up the input
into tokens. They are tried from highest to lowest precedence (in the
order they are stated), i.e. if your first terminal is:

terminal ANY : . ;

no terminal after it will ever be triggered.
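To make the ordering point concrete, here is a small Python model of a lexer that tries rules strictly in declaration order and lets the first match win. It is an assumption for illustration only: ANTLR-generated lexers choose between rules using lookahead rather than a literal first-match loop, so real edge cases can differ. (Xtext's default Terminals grammar likewise declares its catch-all rule, ANY_OTHER, last.)

```python
import re

def lex(rules, text):
    # Simplified model: rules are tried in declaration order and the first
    # one that matches at the current position produces the next token.
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in rules:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            if m and m.end() > pos:  # skip empty matches so the loop always advances
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no rule matches at offset {pos}")
    return tokens

ID = ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*")
ANY = ("ANY", r".")  # like `terminal ANY : . ;`

print(lex([ANY, ID], "ab"))  # [('ANY', 'a'), ('ANY', 'b')]: ANY shadows ID
print(lex([ID, ANY], "ab"))  # [('ID', 'ab')]
```

Declared last, ANY only picks up characters that no other rule could handle.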

If you want to create reusable terminal "parts" you should look at
'terminal fragments' - they are now supported in Xtext. All reusable
parts are declared with "terminal fragment". Here is an example:

-------- (From Cloudsmith/Geppetto @ github), pp.xtext -------
terminal REGULAR_EXPRESSION
// Special rules in the lexer must prevent the RE from being recognized
// except after ',' 'node', '{', '}', '=~', '!~'
: '/' RE_BODY '/' RE_FLAGS?
;

terminal fragment RE_BODY
: RE_FIRST_CHAR
RE_FOLLOW_CHAR*
;

terminal fragment RE_FIRST_CHAR
// regexp can not start with:
// - a '*' (illegal regexp, and makes it look like a MLCOMMENT start)
// - a '/' since that makes it empty (which is an invalid regexp)
// - a NL since all of the regexp must be on one line
: (!('\n' | '*' | '/' | '\\') | RE_BACKSLASH_SEQUENCE)
;

terminal fragment RE_FOLLOW_CHAR
// subsequent regexp chars include '*'
: (RE_FIRST_CHAR | '*')
;

terminal fragment RE_BACKSLASH_SEQUENCE:
// Any character can be escaped except NL since all of the regexp must
// be on one line.
('\\' !'\n')
;

terminal fragment RE_FLAGS:
// Ruby regex flags: i o x m u e s n (optional, in any order, but each
// used only once).
// Puppet does not (currently) support these; they are recognized so
// that a warning can be issued that they are not supported (no other
// meaning can be applied to letters appearing after the closing '/' of
// a regexp). A check for supported flags can be done in validation if
// they become available.
('a'..'z')+
;

---------

Note that the terminal fragments do not become tokens themselves, e.g.
the grammar will never see RE_FLAGS, RE_BACKSLASH_SEQUENCE etc. The
grammar only gets the true terminal REGULAR_EXPRESSION. (Note that if you
try to actually use the example, it does not show everything needed to
handle regular expressions; the snippet only illustrates how fragments
are used.)
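Since fragments are only pattern pieces, they behave much like string snippets spliced into one larger regular expression. The sketch below is a hand translation of the Geppetto snippet into Python's re syntax, an assumption for illustration (Xtext itself maps terminal fragments to ANTLR lexer fragment rules): the fragment names are ordinary strings, and only the composed REGULAR_EXPRESSION pattern is ever used to match input.

```python
import re

# Fragments: plain pattern strings, never used on their own.
RE_BACKSLASH_SEQUENCE = r"\\[^\n]"                         # '\\' !'\n'
RE_FIRST_CHAR = rf"(?:[^\n*/\\]|{RE_BACKSLASH_SEQUENCE})"  # not NL, '*', '/', '\\'
RE_FOLLOW_CHAR = rf"(?:{RE_FIRST_CHAR}|\*)"                # '*' allowed after the first char
RE_BODY = rf"{RE_FIRST_CHAR}{RE_FOLLOW_CHAR}*"
RE_FLAGS = r"[a-z]+"

# The only "real" terminal: the fragments spliced together.
REGULAR_EXPRESSION = re.compile(rf"/{RE_BODY}/(?:{RE_FLAGS})?")

def is_regex_token(text):
    return REGULAR_EXPRESSION.fullmatch(text) is not None

print(is_regex_token("/ab*c/i"))  # True
print(is_regex_token("/*x/"))     # False: '*' may not be the first char
print(is_regex_token("//"))       # False: the body may not be empty
```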

I have not looked at your grammar/terminals in great detail, but you
probably have overlapping / ambiguous terminals. When you change the
terminal rule to be non-terminal (i.e. a datatype rule), you moved the
recognition of the input from the lexer to the parser.

Also note that there is a difference between terminals and keywords. In
simple terms: the lexer matches a token, and if the matched text is a
keyword, the keyword token is delivered instead.

As an example - the keyword 'if' is specified in the grammar (as a
keyword) and there is an ID terminal that matches identifiers. If the
input is "ifif", you get an ID token, and if the input is "if", you get the
token IF. Contrast this with having specified the keyword 'if' as a
terminal. You would now have to specify it with higher precedence than
the ID terminal (or you would never see it, as ID would eat all matching
characters). Since it now has higher precedence than the ID rule, the
input "ifif" will be lexed as the two tokens IF IF.
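Henrik's "ifif" example can be reproduced with two toy Python lexers. Both are simplified models (assumptions, not Xtext's generated lexer): the first treats 'if' as a keyword, taking the longest identifier and reporting IF only when the whole match is the keyword; the second treats 'if' as a terminal declared before ID, so it is always tried first.

```python
import re

ID = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
KEYWORDS = {"if"}

def next_id(text, pos):
    # These toy lexers only handle identifier characters.
    m = ID.match(text, pos)
    if m is None:
        raise ValueError(f"unexpected character at offset {pos}")
    return m.group()

def lex_keyword_style(text):
    # 'if' as a keyword: longest identifier match, reported as IF only
    # when the whole match is exactly the keyword.
    tokens, pos = [], 0
    while pos < len(text):
        word = next_id(text, pos)
        tokens.append(word.upper() if word in KEYWORDS else "ID")
        pos += len(word)
    return tokens

def lex_terminal_style(text):
    # 'if' as a terminal declared before ID: whenever the remaining input
    # starts with "if", it is split off as IF, even inside "ifif".
    tokens, pos = [], 0
    while pos < len(text):
        if text.startswith("if", pos):
            tokens.append("IF")
            pos += 2
        else:
            word = next_id(text, pos)
            tokens.append("ID")
            pos += len(word)
    return tokens

print(lex_keyword_style("ifif"), lex_keyword_style("if"))    # ['ID'] ['IF']
print(lex_terminal_style("ifif"), lex_terminal_style("if"))  # ['IF', 'IF'] ['IF']
```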

I hope that explains the relationship between terminals, terminal
fragments and keywords.

Regards
- henrik

On 2011-28-11 11:45, Robin wrote:
> Hi,
>
> the parser works the same as far as I can tell. How would you quantify
> "better"?
>
> But I am still trying to figure out the original questions: Is there a
> rule regarding sub-structuring terminals?? And why do I get this
> EOF-error??
>
> Is there anyone here, who can maybe explain that? Or push me in the
> right direction? :)
> Thanks, Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #760111 is a reply to message #758731]

Thu, 01 December 2011 06:00

Eclipse User

Hi,

thanks so much for the detailed answer!

I think the order of terminals is the most important part, but the keyword / terminal distinction also helped me a lot.

I also filed that as a bug and got the following response:
"Please note that terminal rules are order dependent (see the docs). I bet that
INFO_TOKEN_START is never matched - it does not even state that a space (' ')
is allowed. You should consider to use data type rules instead and refer to the
docs for details."
The weird thing is that I don't need WS when I am not sub-structuring into terminals. I guess that is again the keyword / terminal distinction.
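The bug response can be checked with a quick Python sketch. It is a hypothetical regex translation of the terminals in this thread (WS and INT written as in Xtext's default Terminals grammar), not what Xtext generates. The original INFO_TOKEN_START glues its parts together with no whitespace, so it can never match the sample input; most likely the lexer therefore falls back to the standalone TOKEN_START terminal ('<TOKEN'), which the Model rule does not accept where an Entry or the end of input is expected, hence "missing EOF at '<TOKEN'". Splicing WS between the parts, as in the working grammar, lets the same line match.

```python
import re

# Hypothetical regex versions of the terminals; WS and INT as in Xtext's
# default Terminals grammar.
WS = r"[ \t\r\n]+"
INT = r"[0-9]+"
TOKEN_START, TOKEN_START_CLOSE = r"<TOKEN", r">"
MARKER = r"startX=(?:114\.24|311\.24)"
Y_COORDINATE = rf"startY={INT}\.{INT}"
NON_ITALIC = r"italicAngle=0\.0"

# Original INFO_TOKEN_START: parts directly concatenated, no WS anywhere.
original = re.compile(TOKEN_START + MARKER + Y_COORDINATE + NON_ITALIC
                      + TOKEN_START_CLOSE)
# Working version: WS between the fragments (but not before '>').
fixed = re.compile(TOKEN_START + WS + MARKER + WS + Y_COORDINATE + WS
                   + NON_ITALIC + TOKEN_START_CLOSE)

line = "<TOKEN startX=311.24 startY=431.84 italicAngle=0.0>asd</TOKEN>"
print(original.match(line) is None)  # True
print(fixed.match(line).group())     # <TOKEN startX=311.24 startY=431.84 italicAngle=0.0>
```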

So here's the working grammar:

grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals

generate myDsl "http://www.xtext.org/example/mydsl/MyDsl" 

Model:
    (entries+=Entry)*;

Entry :
    INFO_TOKEN_START
    info=Info 
    TOKEN_END;

Info:
    name=ID;
    
terminal INFO_TOKEN_START:
    TOKEN_START WS MARKER WS Y_COORDINATE WS NON_ITALIC TOKEN_START_CLOSE;
    
terminal TOKEN_END:
    '</TOKEN>';

terminal fragment TOKEN_START:
    '<TOKEN';

terminal fragment TOKEN_START_CLOSE:
    '>';

terminal fragment MARKER:
    ('startX=114.24' | 'startX=311.24');

terminal fragment Y_COORDINATE:
    'startY=' INT '.' INT;

terminal fragment NON_ITALIC:
    'italicAngle=0.0';

Regards, Robin

Re: Why can't I use terminals like this? ('missing EOF at' - error) [message #760214 is a reply to message #760111]

Thu, 01 December 2011 10:51

Eclipse User

On 2011-01-12 12:00, Robin wrote:

> Weird thing is that I don't need the WS when I am not substructuring
> into terminals. I guess that is again the keyword - terminal distinction.
>

WS is hidden by default (since you are using the default terminals
grammar) - so when using a data type rule, WS and comments may appear
between any tokens. If you want to forbid that, you must use 'hidden()'
and then specify where any whitespace may appear (if any).
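A rough Python model of this, based on Robin's third grammar (where INFO_TOKEN_START is a data type rule): the lexer turns the header into tokens, including WS, and tokens whose type is in the hidden set are dropped before the rule body is compared. The first-match `tokenize` and the `matches_rule` helper are hypothetical simplifications, not Xtext API. With WS hidden (the default inherited from org.eclipse.xtext.common.Terminals) the spaces are irrelevant; with nothing hidden, roughly what an empty hidden() on the rule means, the same input no longer matches.

```python
import re

# Token rules from Robin's third grammar plus WS from the default Terminals;
# tried in order, first match wins (simplified lexer model).
TOKEN_RULES = [
    ("WS", r"[ \t\r\n]+"),
    ("TOKEN_START", r"<TOKEN"),
    ("TOKEN_START_CLOSE", r">"),
    ("MARKER", r"startX=(?:114\.24|311\.24)"),
    ("Y_COORDINATE", r"startY=[0-9]+\.[0-9]+"),
    ("NON_ITALIC", r"italicAngle=0\.0"),
]

# Body of the data type rule INFO_TOKEN_START as a sequence of token types.
INFO_TOKEN_START = ["TOKEN_START", "MARKER", "Y_COORDINATE", "NON_ITALIC",
                    "TOKEN_START_CLOSE"]

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        for name, pattern in TOKEN_RULES:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append(name)
                pos = m.end()
                break
        else:
            raise ValueError(f"no token rule matches at offset {pos}")
    return tokens

def matches_rule(text, hidden):
    # Hidden tokens are filtered out before the parser rule is applied.
    visible = [t for t in tokenize(text) if t not in hidden]
    return visible == INFO_TOKEN_START

header = "<TOKEN startX=311.24 startY=431.84 italicAngle=0.0>"
print(matches_rule(header, hidden={"WS"}))  # True: WS hidden by default
print(matches_rule(header, hidden=set()))   # False: nothing hidden
```

This is also why turning INFO_TOKEN_START into a data type rule made the original error go away: whitespace handling moved from the lexer to the parser.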

Regards
- henrik