Extraneous / mismatched input [message #1078805] Sat, 03 August 2013 15:58
Uli B
Messages: 36
Registered: January 2012
Member
Hello,

With this grammar:

grammar tg.TG hidden (WS) // with org.eclipse.xtext.common.Terminals

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

generate tG "http://www.TG.tg"

TG:
  '<?xml version="1.0" encoding="UTF-8"?>' Tag;
	
Tag:
  {Tag}  ('<' name=TagName (('>' content+=Content* ('</' nameEnd=[Tag|TagName] '>')) | '/>'));

Content:
  {Content} (tag=Tag | charData=CHARDATA);

TagName:   
  ID (':' ID)*;
  
terminal ID:
  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

terminal WS:
  (' '|'\t'|'\r'|'\n')+;

terminal CHARDATA: 
   (!('<'|'&'|']]>'|'>'))*; 



and this sample:

<?xml version="1.0" encoding="UTF-8"?>
<t>
  <property>somecontent</property>
  <property/>
</t>


For the first property I get: extraneous input 'somecontent' expecting '</'. It seems that 'somecontent' is lexed as an ID and not as CHARDATA (when I insert a non-ID character, the error disappears).

The second property produces: mismatched input 'property/' expecting RULE_ID.

What can I do to fix these two errors? Any ideas?

Re: Extraneous / mismatched input [message #1080037 is a reply to message #1078805] Mon, 05 August 2013 11:50
Claudio Heeg
Messages: 75
Registered: April 2013
Member
From what I can see, the main problem is that the terminals CHARDATA and ID overlap.
Since "somecontent" is a valid ID and ID is the first matching terminal, it is lexed as an ID. Note the difference in behaviour when CHARDATA is defined first.
See also: http://zarnekow.blogspot.de/2012/11/xtext-corner-6-data-types-terminals-why.html

Content:
  {Content} (tag=Tag | charData=CharDataType);

CharDataType:
  	ID|CHARDATA
;
  
terminal WS:
  (' '|'\t'|'\r'|'\n')+;
  
terminal ID:
  ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

terminal CHARDATA: 
   (!('<'|'&'|']]>'|'>'))*; 

That works, but hell if I know whether that's best or even good practice.
Also, I don't know for sure: is "/" allowed in CharData? If it is not, simply disallow it in the terminal to fix the second error.
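
The overlap between ID and CHARDATA described above can be modelled with a small sketch. This is only an assumed, simplified model (longest match wins, and on a tie the rule declared first wins), written in Python rather than generated by Xtext. The regular expressions are stand-ins for the terminals in this thread, and `first_token` is a hypothetical helper, not an Xtext or ANTLR API.

```python
import re

# Simplified stand-ins for the thread's terminals (assumption: the ']]>'
# exclusion is left out and CHARDATA must be non-empty).
ID = ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*"))
CHARDATA = ("CHARDATA", re.compile(r"[^<&>]+"))


def first_token(text, rules):
    """Return (rule name, matched text) for the longest match at the start
    of text; on a tie, the rule listed first in `rules` wins."""
    best = None
    for name, pattern in rules:
        m = pattern.match(text)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


# Both rules match all of "somecontent", so declaration order decides.
id_first = first_token("somecontent", [ID, CHARDATA])
chardata_first = first_token("somecontent", [CHARDATA, ID])
# A non-ID character makes the CHARDATA match longer, so it wins regardless.
with_space = first_token("some content", [ID, CHARDATA])

print(id_first, chardata_first, with_space)
```

In this model the parser never gets a say: the token type is fixed before parsing starts, which is why a data type rule such as ID|CHARDATA has to accept both token types.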

[Updated on: Mon, 05 August 2013 11:59]


Re: Extraneous / mismatched input [message #1082264 is a reply to message #1080037] Thu, 08 August 2013 10:07
Uli B
Messages: 36
Registered: January 2012
Member
Thank you very much, Claudio. Indeed, the CharDataType rule solves the first issue. (BTW: I came across that blog while googling before posting here. It states the problem but does not provide a solution, does it?)

As for the second issue: simply disallowing "/" will not help. First, it is allowed in CHARDATA; second, the grammar is going to be extended slightly with attributes:

Tag:
  {Tag}  ('<' name=TagName attributes+=Attribute* ('/>' | ('>' content+=Content* ('</' nameEnd=[Tag|TagName] '>'))));

Attribute:
  name=ID '=' value=STRING;


I wonder why the lexer doesn't simply recognize the '<' ID '/>' sequence, since ID is defined before CHARDATA and does not allow "/". I would expect it to stop at the "/" and take the ID, and likewise to stop at anything else not in ID, such as the WS between the ID and any attributes that follow. I tried introducing terminals for '<', '/>', etc., but that does not help.

What can I do?

[Updated on: Thu, 08 August 2013 10:12]


Re: Extraneous / mismatched input [message #1084891 is a reply to message #1082264] Mon, 12 August 2013 07:26
Claudio Heeg
Messages: 75
Registered: April 2013
Member
The problem is that the lexer is greedy, iirc.
That means it consumes as many characters as possible when forming a token.
I'm afraid I don't know whether that behaviour can be changed directly. Hopefully someone more knowledgeable will come along in a while.

Something along the lines of disallowing "/" at the end of CHARDATA might help?
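
To illustrate the greedy behaviour, here is a minimal Python sketch of a tokenizer that takes the longest match at every position and prefers the earlier rule on a tie. It is an assumed model, not the lexer Xtext generates (ANTLR's rule prediction is more involved), and the rule names LT, GT, LT_SLASH and SLASH_GT are made up for the keywords. It does reproduce the token reported in the second error message.

```python
import re

# Assumed, simplified rules in declaration order: keywords first, then the
# terminals from the thread (CHARDATA without the ']]>' exclusion).
RULES = [
    ("LT_SLASH", re.compile(r"</")),
    ("SLASH_GT", re.compile(r"/>")),
    ("LT", re.compile(r"<")),
    ("GT", re.compile(r">")),
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("CHARDATA", re.compile(r"[^<&>]+")),
]


def tokenize(text):
    """Greedy tokenizer: longest match at each position, earliest rule on ties."""
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_text):
                best_name, best_text = name, m.group()
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens


open_tag = tokenize("<property>")    # ID and CHARDATA tie, ID is declared first
empty_tag = tokenize("<property/>")  # CHARDATA matches one character more

print(open_tag)
print(empty_tag)
```

In `<property/>` the "/" extends the CHARDATA match beyond what ID can cover, so the lexer hands the parser a single CHARDATA token 'property/' followed by '>', and the '/>' keyword is never seen.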

[Updated on: Mon, 12 August 2013 08:00]


Re: Extraneous / mismatched input [message #1085035 is a reply to message #1084891] Mon, 12 August 2013 11:12
Ian McDevitt
Messages: 70
Registered: December 2012
Location: Belfast
Member
Some things to try, looking at your original grammar:

1. Be more specific about what CDATA is, rather than what it isn't:
terminal CHARDATA: "<![CDATA[" -> "]]>";


2. Define it before ID, so that the lexer considers it first and only matches ID where the input is not CDATA.

3. Don't put ID in charData (that is, avoid ID|CHARDATA: CDATA doesn't really have IDs in it, so don't model it that way).

In your code you may need to strip the delimiters off charData to get the CDATA text.
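
Stripping the delimiters could look roughly like the sketch below. It is written in Python purely for illustration; in an Xtext project the same logic would more likely live in Java, for example in a value converter. `strip_cdata` is a hypothetical helper name.

```python
CDATA_START = "<![CDATA["
CDATA_END = "]]>"


def strip_cdata(token_text):
    """Return the text between the CDATA delimiters, or the input unchanged
    if it is not wrapped in them."""
    if token_text.startswith(CDATA_START) and token_text.endswith(CDATA_END):
        return token_text[len(CDATA_START):-len(CDATA_END)]
    return token_text


print(strip_cdata("<![CDATA[a < b]]>"))
```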

Re: Extraneous / mismatched input [message #1085228 is a reply to message #1085035] Mon, 12 August 2013 16:12
Uli B
Messages: 36
Registered: January 2012
Member
Hm, unfortunately CharData is not CDATA. (The original definition from the XML spec is

CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
)
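
Read literally, the production quoted above says: any run of characters that contains neither '<' nor '&', minus every such run that contains ']]>'. A minimal Python sketch of that check, with `is_chardata` as a hypothetical helper name:

```python
import re

# [^<&]* from the production; the ']]>' exclusion is checked separately.
NO_MARKUP = re.compile(r"[^<&]*")


def is_chardata(text):
    """True if text matches [^<&]* and does not contain the marker ']]>'."""
    return NO_MARKUP.fullmatch(text) is not None and "]]>" not in text


print(is_chardata("somecontent"), is_chardata("a<b"), is_chardata("x]]>y"))
```

Note that under this definition a lone '>' and a '/' are both allowed, and so is the empty string.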

To simplify things, I could define it as everything between a right angle bracket '>' and a left angle bracket '<', excluding the brackets. But as far as I can see, this is not possible with Xtext/ANTLR either, since an until token always includes its surrounding keywords, right?

I'm currently trying to subclass the Lexer class, following an idea described in http://www.eclipse.org/forums/index.php/t/200863/, so that it considers the context (basically: allow CharData only when the last token was a '>', otherwise match an ID). So far, though, this ends up with a lot of "mismatched character '>' expecting set null" errors ... sigh!
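
The context rule described above (CharData only directly after a '>', otherwise keywords, WS and ID) can be sketched independently of Xtext. The Python below is an illustrative model under that assumption only. It is not the Xtext or ANTLR lexer API and not the code from the linked thread, and all names in it are made up.

```python
import re

KEYWORDS = ["</", "/>", "<", ">"]  # two-character keywords are tried first
WS = re.compile(r"[ \t\r\n]+")
ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
CHARDATA = re.compile(r"[^<&>]+")  # simplified: no ']]>' exclusion


def tokenize(text):
    """Context-sensitive tokenizer: CHARDATA is tried only when the previous
    token was '>'; everywhere else only keywords, WS and ID are considered."""
    tokens, pos, last = [], 0, None
    while pos < len(text):
        chardata = CHARDATA.match(text, pos) if last == ">" else None
        if chardata:
            kind, value = "CHARDATA", chardata.group()
        else:
            keyword = next((k for k in KEYWORDS if text.startswith(k, pos)), None)
            ws = WS.match(text, pos)
            ident = ID.match(text, pos)
            if keyword:
                kind, value = keyword, keyword
            elif ws:
                kind, value = "WS", ws.group()
            elif ident:
                kind, value = "ID", ident.group()
            else:
                raise ValueError(f"unexpected character at position {pos}")
        tokens.append((kind, value))
        pos += len(value)
        last = kind
    return tokens


kinds = [kind for kind, _ in tokenize("<property>somecontent</property><property/>")]
print(kinds)
```

One consequence of this simple rule is that whitespace directly after a '>' is also returned as CHARDATA rather than WS, so a grammar built on it would have to tolerate whitespace-only content.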

But maybe I'm on a completely wrong path anyway. My original intention was not to write an Xtext grammar for XML. Rather, I want to be able to reference XML files from an Xtext-based DSL. This DSL is already working fine so far. Now I want to pick some names (for content assist) from an XML file, with something like 'include abc.xml' in the DSL files. Is there a way to achieve this, possibly by creating a model using EMF and an .xsd schema? (I'm not asking how to do the include, but how to create a model that can be imported. Somewhere I read that this is only possible when the included files are .xmi ... and that is how I ended up here ...)





Re: Extraneous / mismatched input [message #1085326 is a reply to message #1085228] Mon, 12 August 2013 19:27
Ian McDevitt
Messages: 70
Registered: December 2012
Location: Belfast
Member
Ah, I misunderstood. Then Claudio's answer is more helpful here. If I think of anything else I'll add it later.

Re: Extraneous / mismatched input [message #1090700 is a reply to message #1085326] Tue, 20 August 2013 14:19
Uli B
Messages: 36
Registered: January 2012
Member
No way out?