Eclipse Community Forums: TMF (Xtext) » What is wrong with my grammar


What is wrong with my grammar [message #1769726]

Wed, 02 August 2017 18:37

Eclipse User

Hi,

I don't like to ask overly broad questions in the hope that someone will do my work for me. But I am stuck and don't have many clues as to what is wrong with my grammar. I have several issues to solve, so I am starting with something basic and minimal.

Have a look at this grammar:

/* Example:
/pls/;
/ {
  p = "hello";
  q = &label;
  label: a {
    r = 23;
  }
}
*/
root:
	v='/pls/' ';'
	
	'/' '{'
		properties+=property*
		node+=Node*
	'}'
;

Node:
	(label=Label)? name=IDS
	'{'
	    properties+=property*
		nodes+=Node*
	'}'
;

property:
	name=IDS '=' (startLabel=Label)? (value=STRING | literal=Literal | ref=Reference) (endLabel=Label)? ';'
;

Label hidden():
	name=IDS ':'
;

Reference hidden():
	'&' label=[Label|IDS]
;


Literal returns ecore::ELong hidden():
	('0x'|'0X')?
	(
		'a'|'b'|'c'|'d'|'e'|'f'|
		'A'|'B'|'C'|'D'|'E'|'F'|
		'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
	)+
;

IDS returns ecore::EString hidden():
	(
		'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'|
		'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'|
		'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|','|'.'|'_'|'+'|'*'|'#'|'?'|'-'
	)+
;

// Terminals
terminal STRING	: 
	'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"'
;

terminal ML_COMMENT	:
	'/*' -> '*/'
;

terminal WS:
	(' '|'\t'|'\r'|'\n')+
;

A few points on the choices above:

A little while ago my grammar was working fine, when I had just one terminal that matched all identifiers and literals in the same rule. I used datatype rules to convert values and everything seemed to work fine.
Now I need to add expression handling, so I can't work with a simple terminal. As you can see, identifiers can unfortunately contain characters like hyphen (-), and so can expressions. So I need context awareness, which I understand terminals cannot have.
So I decided to make datatype rules for both integer literals and identifiers so that I can use "-" and "+" in expressions.
Problem: This grammar doesn't compile. If I remove (endLabel=Label)? from property, it compiles. I understand it doesn't know where an integer ends and a label starts, so it errors out. But why is that, when the integer literal rule is using hidden(), which should mean that any whitespace stops the rule from consuming more characters?
Problem: If (endLabel=Label)? is removed and it compiles, it starts consuming strings such as "2 2" as a single literal, or strings such as " - a -" as an identifier. Why is that? I don't want any spaces to be consumed like this.
Problem: Having specified individual characters as keywords turns each of them into its own token. So when I double-click a word in the text editor, it only selects one character. Also, EVERY character is highlighted as a keyword.

Basically I am converting an existing bison grammar into Xtext and it uses start conditions which allow context specific lexical rules. What is the best way to define such a grammar in Xtext?
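
For readers following along: since the Literal rule above returns ecore::ELong, the matched text still has to be turned into a long somewhere. A minimal sketch of that conversion in plain Java is shown here. The class and method names (LiteralSketch, parseLiteral) are hypothetical, and the rule "0x/0X prefix means base 16, otherwise base 10" is an assumption, since the thread does not say how unprefixed literals should be read. In a real Xtext project this logic would typically sit inside a value converter; that wiring is left out.

```java
public class LiteralSketch {
    // Hypothetical helper: converts the text matched by the Literal rule to a long.
    // Assumption: a "0x"/"0X" prefix selects base 16, anything else is read as base 10.
    static long parseLiteral(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            // Strip the prefix and parse the remaining hex digits.
            return Long.parseLong(text.substring(2), 16);
        }
        // No prefix: decimal. Throws NumberFormatException for text such as "ff".
        return Long.parseLong(text, 10);
    }

    public static void main(String[] args) {
        System.out.println(parseLiteral("0x1F")); // hex with prefix
        System.out.println(parseLiteral("23"));   // plain decimal, as in "r = 23;"
    }
}
```

If unprefixed literals are meant to be hexadecimal as well, only the second branch would need to change.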

Re: What is wrong with my grammar [message #1769736 is a reply to message #1769726]

Thu, 03 August 2017 00:58

Eclipse User

Well, these are too many questions at once.

- How does your datatype rule solve the expression problem? 1+1 will still be an IDS.
- What does the header of the grammar look like?
- With grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT) it generates with warnings, even if you have endLabel.
- There could still be some bugs around the hidden() support.
- You can tell ANTLR to create a debug grammar (e.g. to be opened with ANTLRWorks):
  // inside the language section of the workflow
  parserGenerator = {
      debugGrammar = true
  }
- Maybe you need an external / custom lexer, e.g. built with JFlex (there are some blog posts on the topic).

- Maybe you can work around the buggy hidden() support using explicit WS:

Property hidden():
	name=IDS WS? '=' WS? (startLabel=Label WS)? (value=STRING | literal=Literal | ref=Reference) (WS endLabel=Label)? WS? ';'
;

- The double-click problem has to do with the partitioning, as does the coloring behaviour. So maybe you need to customize syntax highlighting, or you can somehow trick DefaultAntlrTokenToAttributeIdMapper into behaving differently for "some" keywords.

Re: What is wrong with my grammar [message #1769830 is a reply to message #1769736]

Thu, 03 August 2017 15:51

Eclipse User

Hi Christian,

Thanks for the reply and suggestions!

>> How does your datatype rule solve the expression problem? 1+1 will still be an IDS.
I did not add expression syntax to the example grammar above, because I did not want to complicate the scenario. I just wanted to let you know that I am not using terminals because identifiers can contain, for example, a hyphen (-), and so can values that contain expressions. So terminals cannot be used and I have to rely on datatype rules (if I understand correctly). The real problem I am facing is that I can't seem to understand why a Literal or IDS rule is also consuming whitespace characters.

>> What does the header of the grammar look like?
Here is the header:

grammar org.xtext.example.sdt.SysDefTree hidden(ML_COMMENT, WS)
generate sysDefTree "http://www.xtext.org/example/sdt/SysDefTree"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore

>> with grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT) it generates with warnings. even if you have endlabel
For me, with endlabel I get errors, without it I can build. And the editor is generated.

>> could be there are still some bugs around the hidden stuff.
>> you can tell antlr to create a debug grammar. (e.g. to be opened with antlrworks)
I have generated an antlr debug grammar which I used to identify that Literal rule is consuming whitespace when it shouldn't. What is more confusing for me is that if I remove endlabel, and generate editor, it seems to work fine. That is it does not consume extra whitespace for the given Literal rule. However, when I use the debug grammar generated in antlrworks (v1.5.2) it seems to consume whitespace without any problems. So for a property like "s = 2 3 4;" antlrworks gives no errors and parses as a single literal "234" but the xtext generated editor gives error on this line. So to me it looks like there is some difference between Xtext and Antlr, that I am caught in between.

>> maybe you need an external / custom lexer e.g. built with jflex (there are some blogpost around the topic)
I can look at JFlex, and if it is somewhat compatible with flex, I may be able to use the original flex/bison source. I will try to find some blog posts once I understand the above problem.

>> maybe you can work around the buggy hidden support using explicit WS
I tried that but somewhat casually. I will have a look at it again, and report back.

>> The double-click problem has to do with the partitioning, as does the coloring behaviour. So maybe you need to customize syntax highlighting, or you can somehow trick DefaultAntlrTokenToAttributeIdMapper into behaving differently for "some" keywords.
Thanks, I will look at it once I have solved the grammar issues.
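
One way to picture the "s = 2 3 4;" observation above is sketched below in plain Java. It is not Xtext or ANTLR code, and all names (HiddenTokenSketch, lex, matchDigits) are hypothetical. The sketch assumes that every single-character keyword becomes its own token, and that the parser never sees tokens that are currently hidden. If WS stays hidden everywhere, as it plausibly does in a debug grammar that lacks Xtext's per-rule hidden() handling, a rule like Literal sees '2' '3' '4' back to back. If WS is visible inside the rule, the first WS token ends the match.

```java
import java.util.ArrayList;
import java.util.List;

public class HiddenTokenSketch {
    // Lexes the input into single-character tokens; a whitespace run becomes one "WS" token.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (Character.isWhitespace(input.charAt(i))) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                tokens.add("WS");
            } else {
                tokens.add(String.valueOf(input.charAt(i)));
                i++;
            }
        }
        return tokens;
    }

    // Mimics a parser rule of the form (digit)+ working on the token list.
    // wsHidden = true: WS tokens are skipped, as if they were on a hidden channel.
    // wsHidden = false: WS tokens are visible, so the first one ends the match.
    static String matchDigits(List<String> tokens, boolean wsHidden) {
        StringBuilder matched = new StringBuilder();
        for (String token : tokens) {
            if (token.equals("WS")) {
                if (wsHidden) {
                    continue;
                }
                break;
            }
            if (Character.isDigit(token.charAt(0))) {
                matched.append(token);
            } else {
                break;
            }
        }
        return matched.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = lex("2 3 4");
        System.out.println(matchDigits(tokens, true));  // prints 234
        System.out.println(matchDigits(tokens, false)); // prints 2
    }
}
```

Under these assumptions, the difference between the two tools would come down to whether hidden() is honoured while the rule is being parsed, not to the grammar rule itself.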

Re: What is wrong with my grammar [message #1769838 is a reply to message #1769830]

Thu, 03 August 2017 19:50

Eclipse User

I am trying to find out what options I have with an external lexer. I tried searching a few blogs, but there seem to be big differences between versions of Xtext. I am on 2.10, and examples for older versions don't work. There also seems to be a big difference between the internal "old" and "new" workflows. I am not sure what exactly that difference is, and if I am on the new workflow, can I still plug in an external lexer generated by JFlex (as you recommended)?

Re: What is wrong with my grammar [message #1769840 is a reply to message #1769838]

Thu, 03 August 2017 23:05

Eclipse User

Yes, you can, but I cannot give you a step-by-step guide.

https://github.com/TypeFox/xtext-jflex

https://typefox.io/taming-the-lexer

Re: What is wrong with my grammar [message #1769899 is a reply to message #1769840]

Fri, 04 August 2017 15:41

Eclipse User

Thanks a lot for your help. I will start looking into this, and if I have problems I will start a new thread specific to them. However, as I go down this road, did you get a chance to read the details I sent two posts above? Do you see a custom lexer as a better solution, or is there something wrong with my grammar?

Btw I tried specifying whitespace explicitly, and here is the solution I got working:

Property hidden():
	name=Identifier WS* '=' WS* (startLabel=Label WS*)? (value=STRING | literal=Literal | ref=Reference) (WS+ endLabel=Label)? WS* ';' WS*
;

However, I don't see this as a viable solution. The actual grammar from which I extracted this simpler grammar to illustrate the problem is much more complex, and it can also contain single-line and multi-line comments. I would have to litter my grammar with these explicit insertions, and the resulting grammar would be too complex to manage. Here is another example:

ByteValue:
	{ByteValue}
	(l1=Label)? '[' (byteSequences+=ByteSequence)* ']' (l2=Label)?
;
ByteSequence:
	bytes+=Bytes
;
Bytes returns custom::EIntegerArray hidden():
	(DIGIT | HEX_DIGIT)+
;

Here again ByteSequence* is ambiguous: the parser can't tell where one sequence ends and the next one starts, because it doesn't take the whitespace between two sequences into account.

Re: What is wrong with my grammar [message #1769952 is a reply to message #1769899]

Sun, 06 August 2017 23:43

Eclipse User

Well, then you should really have a look at custom lexers. Then you won't need all those workarounds.

Re: What is wrong with my grammar [message #1775993 is a reply to message #1769952]

Wed, 08 November 2017 18:20

Eclipse User

Coming back here to just give an update that I was able to plug in a custom lexer based on JFlex with the help of links you provided. It was not easy, but a fun experience, and certainly doable. Thanks for all the help!

[Updated on: Thu, 09 November 2017 13:04] by Moderator
