Eclipse Community Forums: TMF (Xtext) » Extraneous / mismatched input

Extraneous / mismatched input [message #1078805]

Sat, 03 August 2013 15:58

Uli B

Messages: 36
Registered: January 2012

Member

Hello,

With this grammar:

grammar tg.TG hidden (WS) // with org.eclipse.xtext.common.Terminals

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate tG "http://www.TG.tg"

TG:
  '<?xml version="1.0" encoding="UTF-8"?>' Tag;
	
Tag:
  {Tag}  ('<' name=TagName (('>' content+=Content* ('</' nameEnd=[Tag|TagName] '>')) | '/>'));

Content:
  {Content} (tag=Tag | charData=CHARDATA);

TagName:   
  ID (':' ID)*;
  
terminal ID:
  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

terminal WS:
  (' '|'\t'|'\r'|'\n')+;

terminal CHARDATA: 
   (!('<'|'&'|']]>'|'>'))*;

and this sample:

<?xml version="1.0" encoding="UTF-8"?>
<t>
  <property>somecontent</property>
  <property/>
</t>

For the first property I get: extraneous input 'somecontent' expecting '</'. It seems this is lexed as an ID and not as CHARDATA (when I insert a non-ID character, there is no error).

The second property produces: mismatched input 'property/' expecting RULE_ID.

What can I do to fix these two errors? Any ideas?

Re: Extraneous / mismatched input [message #1080037 is a reply to message #1078805]

Mon, 05 August 2013 11:50

Claudio Heeg

Messages: 75
Registered: April 2013

Member

From what I see, the main problem is that the terminals CHARDATA and ID overlap.
"somecontent" is both a valid ID and valid CHARDATA. Both terminals match the same number of characters here, and in that case the terminal defined first wins, so it is lexed as an ID. Note the difference in behaviour when CHARDATA is defined first.
See also: http://zarnekow.blogspot.de/2012/11/xtext-corner-6-data-types-terminals-why.html

One way around it is a data type rule that accepts either token:

Content:
  {Content} (tag=Tag | charData=CharDataType);

CharDataType:
  	ID|CHARDATA
;
  
terminal WS:
  (' '|'\t'|'\r'|'\n')+;
  
terminal ID:
  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

terminal CHARDATA: 
   (!('<'|'&'|']]>'|'>'))*;

That works, but hell if I know whether that's best or even good practice.
Also, I'm not sure: is "/" allowed in CharData? If it isn't, simply disallow it in the terminal to fix the second error.

[Updated on: Mon, 05 August 2013 11:59]

Re: Extraneous / mismatched input [message #1082264 is a reply to message #1080037]

Thu, 08 August 2013 10:07

Uli B

Messages: 36
Registered: January 2012

Member

Thank you very much, Claudio. Indeed, the CharDataType solves the first issue. (BTW: I came across that blog while googling before posting here. It states the problem, but does not provide a solution, does it?).

For the second issue: simply disallowing "/" will not help. First, it is allowed in CHARDATA, and second, the grammar will be slightly extended as follows:

Tag:
  {Tag}  ('<' name=TagName attributes+=Attribute* ('/>' | ('>' content+=Content* ('</' nameEnd=[Tag|TagName] '>'))));

Attribute:
  name=ID '=' value=STRING;

I wonder why the lexer doesn't simply recognize the sequence '<' ID '/>', since ID is defined before CHARDATA and does not allow "/". I would expect it to stop at the "/" and take the ID, and likewise to stop at anything not allowed in ID, such as the WS between the ID and any following attributes. I tried introducing terminals for '<', '/>', etc. That does not help.

What can I do?

[Updated on: Thu, 08 August 2013 10:12]

Re: Extraneous / mismatched input [message #1084891 is a reply to message #1082264]

Mon, 12 August 2013 07:26

Claudio Heeg

Messages: 75
Registered: April 2013

Member

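The problem is that the lexer is greedy, iirc.
That means it consumes as many characters as possible for a single token: CHARDATA can continue past the "/" while ID has to stop there, so CHARDATA wins. The declaration order only matters when two terminals match the same length.
I also don't know whether that behaviour can be changed directly, I'm afraid. Hope someone more knowledgeable comes along in a while.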

Something along the lines of disallowing "/" at the end of CHARDATA might help?
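
A simplified model of the behaviour described above: at each position the lexer picks the terminal with the longest match, and declaration order only breaks ties. The Java below is a hypothetical stand-alone sketch, not the generated ANTLR lexer; the names `LongestMatchDemo`, `Terminal` and `firstToken` are made up, keywords such as '<' and '/>' are left out, and CHARDATA ignores its ']]>' alternative.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Simplified model (not the generated ANTLR lexer) of how a lexer picks
// between overlapping terminals: the longest match wins, and only a tie is
// decided by declaration order. Keywords such as '<' and '/>' are omitted.
class LongestMatchDemo {

    // A terminal given as "one char matching 'first', then any chars matching 'rest'".
    static final class Terminal {
        final String name;
        final IntPredicate first;
        final IntPredicate rest;

        Terminal(String name, IntPredicate first, IntPredicate rest) {
            this.name = name;
            this.first = first;
            this.rest = rest;
        }

        // Number of characters this terminal can consume starting at 'pos' (0 = no match).
        int matchLength(String s, int pos) {
            if (pos >= s.length() || !first.test(s.charAt(pos))) return 0;
            int end = pos + 1;
            while (end < s.length() && rest.test(s.charAt(end))) end++;
            return end - pos;
        }
    }

    static boolean isIdStart(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdPart(int c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }

    // CHARDATA from the question, ignoring the ']]>' alternative: any char but '<', '&', '>'.
    static boolean isCharDataChar(int c) {
        return c != '<' && c != '&' && c != '>';
    }

    static final Terminal ID = new Terminal("ID", LongestMatchDemo::isIdStart, LongestMatchDemo::isIdPart);
    static final Terminal CHARDATA =
            new Terminal("CHARDATA", LongestMatchDemo::isCharDataChar, LongestMatchDemo::isCharDataChar);

    // Lexes one token at the start of 'input', trying terminals in declaration order.
    static String firstToken(String input, List<Terminal> declarationOrder) {
        Terminal best = null;
        int bestLength = 0;
        for (Terminal t : declarationOrder) {
            int length = t.matchLength(input, 0);
            if (length > bestLength) { // strictly longer only, so ties keep the earlier terminal
                best = t;
                bestLength = length;
            }
        }
        return best == null ? "no match" : best.name + " '" + input.substring(0, bestLength) + "'";
    }

    public static void main(String[] args) {
        List<Terminal> idFirst = List.of(ID, CHARDATA);
        List<Terminal> charDataFirst = List.of(CHARDATA, ID);
        // Text after "<property>": both terminals match 11 chars, so order decides.
        System.out.println(firstToken("somecontent</property>", idFirst));       // ID 'somecontent'
        System.out.println(firstToken("somecontent</property>", charDataFirst)); // CHARDATA 'somecontent'
        // Text after "<": CHARDATA also takes the '/', so it is longer whatever the order.
        System.out.println(firstToken("property/>", idFirst));                   // CHARDATA 'property/'
    }
}
```

Under this model the first error comes from a tie that ID wins by being declared first, while the second comes from CHARDATA being one character longer, which is why declaring ID before CHARDATA does not make the lexer stop at the '/'.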

[Updated on: Mon, 12 August 2013 08:00]

Re: Extraneous / mismatched input [message #1085035 is a reply to message #1084891]

Mon, 12 August 2013 11:12

Ian McDevitt

Messages: 70
Registered: December 2012
Location: Belfast

Member

Some things to try, looking at your original grammar:

1. be more specific about what CDATA is, rather than what it isn't

terminal CHARDATA: "<![CDATA[" -> "]]>";

2. define it before ID so the lexer prefers to consider it first and only matches ID where it is not CDATA.

3. don't put ID in charData (that is, avoid ID|CHARDATA: CDATA doesn't really contain IDs, so don't model it that way)

In your code you may need to strip the delimiters off charData to get the CDATA text.
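
Stripping the delimiters is plain string handling. In Xtext such a conversion would usually be registered as a value converter for the terminal; the sketch below shows only the string part, in Java, and the class and method names (`CDataText`, `strip`) are hypothetical.

```java
// Hypothetical helper: recovers the text inside a token matched by
//   terminal CHARDATA: "<![CDATA[" -> "]]>";
// i.e. removes the "<![CDATA[" prefix and the "]]>" suffix.
class CDataText {
    static final String OPEN = "<![CDATA[";
    static final String CLOSE = "]]>";

    static String strip(String tokenText) {
        if (tokenText.length() < OPEN.length() + CLOSE.length()
                || !tokenText.startsWith(OPEN) || !tokenText.endsWith(CLOSE)) {
            throw new IllegalArgumentException("not a CDATA section: " + tokenText);
        }
        return tokenText.substring(OPEN.length(), tokenText.length() - CLOSE.length());
    }

    public static void main(String[] args) {
        System.out.println(strip("<![CDATA[a < b && c]]>")); // a < b && c
    }
}
```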

Re: Extraneous / mismatched input [message #1085228 is a reply to message #1085035]

Mon, 12 August 2013 16:12

Uli B

Messages: 36
Registered: January 2012

Member

Hm, unfortunately CharData is not CDATA. The original definition from the XML spec is:

CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
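
The production reads: any run of characters other than '<' and '&' that does not contain the string ']]>'. As a hypothetical Java check (the name `XmlCharData.isCharData` is made up), which also settles the earlier question about "/":

```java
import java.util.regex.Pattern;

// Hypothetical check of a string against the XML 1.0 production
//   CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
// i.e. no '<' or '&' anywhere, and no occurrence of "]]>".
class XmlCharData {
    private static final Pattern NO_LT_OR_AMP = Pattern.compile("[^<&]*");

    static boolean isCharData(String s) {
        return NO_LT_OR_AMP.matcher(s).matches() && !s.contains("]]>");
    }

    public static void main(String[] args) {
        System.out.println(isCharData("somecontent")); // true
        System.out.println(isCharData("property/"));   // true: '/' is allowed
        System.out.println(isCharData("a > b"));       // true: a lone '>' is allowed
        System.out.println(isCharData("a ]]> b"));     // false
        System.out.println(isCharData("a < b"));       // false
    }
}
```

Note that the spec allows a lone '>' in CharData (only ']]>' is excluded), while the CHARDATA terminal in the original grammar excludes '>' entirely.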

To simplify things, I could define it as everything between a right angle bracket '>' and a left angle bracket '<', excluding the brackets. But as far as I can see, this is not possible with Xtext/ANTLR either, since an until token ('->') in a terminal always includes its start and end delimiters, right?

I'm currently trying to subclass the Lexer class, following an idea described in http://www.eclipse.org/forums/index.php/t/200863/, to take the context into account (basically: allow CharData only when the last token was a '>', otherwise match an ID). So far, though, this ends up with a lot of "mismatched character '>' expecting set null" errors... sigh!
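
The context rule described above (CharData only directly after '>', ID everywhere else) can be sketched as a small hand-written tokenizer. This is a hypothetical, stand-alone Java example and not a subclass of the generated Xtext/ANTLR lexer; it skips the XML declaration, attributes, ':' in names and entity references, and treats whitespace-only text as hidden WS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone tokenizer, not the Xtext/ANTLR Lexer API.
// Text is lexed as CHARDATA only when the previous token was '>';
// everywhere else names are lexed as ID. Not handled: the XML declaration,
// attributes, ':' in names, entity references and CDATA sections.
class ContextLexer {

    static boolean isIdStart(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static boolean isIdPart(int c) {
        return isIdStart(c) || (c >= '0' && c <= '9');
    }

    static List<String> tokenize(String in) {
        List<String> tokens = new ArrayList<>();
        String last = ""; // kind of the previous token: the only context kept
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (last.equals(">") && c != '<') {
                // Directly after '>': everything up to the next '<' is character data.
                int end = in.indexOf('<', i);
                if (end < 0) end = in.length();
                String text = in.substring(i, end);
                if (!text.isBlank()) { // whitespace-only text is treated as hidden WS
                    tokens.add("CHARDATA(" + text + ")");
                    last = "CHARDATA";
                }
                i = end;
            } else if (Character.isWhitespace(c)) {
                i++; // hidden WS between markup tokens
            } else if (in.startsWith("</", i) || in.startsWith("/>", i)) {
                last = in.substring(i, i + 2);
                tokens.add(last);
                i += 2;
            } else if (c == '<' || c == '>') {
                last = String.valueOf(c);
                tokens.add(last);
                i++;
            } else if (isIdStart(c)) {
                int end = i + 1;
                while (end < in.length() && isIdPart(in.charAt(end))) end++;
                tokens.add("ID(" + in.substring(i, end) + ")");
                last = "ID";
                i = end;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at offset " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "<t>\n  <property>somecontent</property>\n  <property/>\n</t>";
        System.out.println(tokenize(sample));
        // [<, ID(t), >, <, ID(property), >, CHARDATA(somecontent), </, ID(property), >, <, ID(property), />, </, ID(t), >]
    }
}
```

With the context check, 'property' before '/>' comes out as an ID, since the previous token is '<' rather than '>', while 'somecontent' after '>' becomes CHARDATA.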

But maybe I'm on a completely wrong path anyway. My original intention was not to write an Xtext grammar for XML. Rather, I want to be able to reference XML files from an Xtext-based DSL. That DSL already works fine. Now I want to pick some names (for content assist) from an XML file, with something like 'include abc.xml' in such DSL files. Is there a way to achieve this, possibly by creating a model using EMF and an .xsd schema? (I'm not asking how to do the include, but how to create a model that can be imported. I read somewhere that this is only possible when the included files are .xmi files, and that is how I ended up here.)
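
If the goal is only to offer names from an XML file for content assist, one option (not discussed in the thread) is to read the file with a standard XML parser instead of an Xtext grammar. The sketch below uses the JDK's DOM API; taking the names from an attribute such as `name` is an assumption, `XmlNameCollector` is a hypothetical class, and hooking the result into Xtext scoping or content assist is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Collects candidate names from an XML document with the JDK's DOM parser.
// Which attribute carries the "names" is an assumption passed in by the caller.
class XmlNameCollector {

    static List<String> attributeValues(String xml, String attribute) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList elements = doc.getElementsByTagName("*"); // all elements, document order
        List<String> values = new ArrayList<>();
        for (int i = 0; i < elements.getLength(); i++) {
            Element e = (Element) elements.item(i);
            if (e.hasAttribute(attribute)) {
                values.add(e.getAttribute(attribute));
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<t><property name=\"alpha\"/><property name=\"beta\">x</property></t>";
        System.out.println(attributeValues(xml, "name")); // [alpha, beta]
    }
}
```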
