What is wrong with my grammar [message #1769726] |
Wed, 02 August 2017 22:37  |
Waqas Ilyas Messages: 80 Registered: July 2009 |
Member |
Hi,
I don't like to ask overly broad questions in the hope that someone will do my work for me. But I am stuck and don't have many clues as to what is wrong with my grammar. I have several issues to solve, so I am starting with something basic and minimal.
Have a look at this grammar:
/* Example:
/pls/;
/ {
p = "hello";
q = &label;
label: a {
r = 23;
}
}
*/
root:
v='/pls/' ';'
'/' '{'
properties+=property*
node+=Node*
'}'
;
Node:
(label=Label)? name=IDS
'{'
properties+=property*
nodes+=Node*
'}'
;
property:
name=IDS '=' (startLabel=Label)? (value=STRING | literal=Literal | ref=Reference) (endLabel=Label)? ';'
;
Label hidden():
name=IDS ':'
;
Reference hidden():
'&' label=[Label|IDS]
;
Literal returns ecore::ELong hidden():
('0x'|'0X')?
(
'a'|'b'|'c'|'d'|'e'|'f'|
'A'|'B'|'C'|'D'|'E'|'F'|
'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'
)+
;
IDS returns ecore::EString hidden():
(
'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'|
'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z'|
'0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|','|'.'|'_'|'+'|'*'|'#'|'?'|'-'
)+
;
// Terminals
terminal STRING :
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"'
;
terminal ML_COMMENT :
'/*' -> '*/'
;
terminal WS:
(' '|'\t'|'\r'|'\n')+
;
A few points on the choices above:
- A little while ago my grammar was working fine, when I had just one terminal that matched all identifiers and literals in the same rule. I used datatype rules to convert values and everything seemed to work fine.
- Now I need to add expression handling, so I can't work with a simple terminal. As you can see, identifiers can contain characters like the hyphen (-) :'( , and so can expressions. So I need context awareness, which I understand terminals cannot have.
- So I decided to make datatype rules for both integer literals and identifiers, so that I can use "-" and "+" in expressions.
- Problem: This grammar doesn't compile. If I remove (endLabel=Label)? from property, it compiles. I understand that it doesn't know where an integer ends and a label starts, so it errors out. But why is that, when the integer literal rule uses hidden(), which should mean that any whitespace stops the rule from consuming more characters?
- Problem: If (endLabel=Label)? is removed and the grammar compiles, it starts consuming strings such as "2 2" as a single literal, or strings such as " - a -" as an identifier. Why is that? I don't want any spaces to be consumed like this.
- Problem: Having specified individual characters as keywords makes them terminals. So when I double-click a word in the text editor, it selects only one character. Also, EVERY character is highlighted as a keyword.
Basically, I am converting an existing Bison grammar to Xtext. The original uses start conditions, which allow context-specific lexical rules. What is the best way to define such a grammar in Xtext?
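The third problem in the list above (double-click selecting a single character) can be reproduced outside Xtext with a toy tokenizer. The sketch below is only an illustration in Python, not Xtext's or ANTLR's lexer; the function name tokenize and the token kinds WORD, WS and OTHER are made up for this example. It compares a lexer that has one token per character, as happens when every character is a keyword, with a lexer that has one terminal for a whole word.

```python
import re

WS = r"(?P<WS>[ \t\r\n]+)"
OTHER = r"(?P<OTHER>.)"


def tokenize(text, word_pattern):
    """Split text into (kind, lexeme) pairs using one regex alternative per kind."""
    pattern = re.compile(f"{WS}|(?P<WORD>{word_pattern})|{OTHER}")
    return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]


# Same character set as the IDS rule, once per character and once per word.
per_char = tokenize("label: a", r"[A-Za-z0-9,._+*#?-]")
per_word = tokenize("label: a", r"[A-Za-z0-9,._+*#?-]+")

print(len(per_char))  # 8 tokens: l, a, b, e, l, ':', ' ', a
print(per_word)       # 4 tokens: 'label', ':', ' ', 'a'
```

An editor that selects or colours text token by token sees five separate tokens for "label" in the first case, which matches the behaviour described above.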
Re: What is wrong with my grammar [message #1769736 is a reply to message #1769726] |
Thu, 03 August 2017 04:58   |
Well, these are too many questions at once.
- How does your datatype rule solve the expression problem? 1+1 will be an IDS.
- What does the header of the grammar look like?
- With grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT) it generates with warnings, even if you have endLabel.
- It could be that there are still some bugs around the hidden stuff.
- You can tell ANTLR to create a debug grammar (e.g. to be opened with ANTLRWorks):
// inside the language section of the workflow
parserGenerator = {
debugGrammar = true
}
- Maybe you need an external / custom lexer, e.g. built with JFlex (there are some blog posts on the topic).
- Maybe you can work around the buggy hidden support using explicit WS:
Property hidden():
name=IDS WS? '=' WS? (startLabel=Label WS)? (value=STRING | literal=Literal | ref=Reference) (WS endLabel=Label)? WS? ';'
;
- The double-click problem has to do with the partitioning, as does the coloring behaviour. So maybe you need to customize the syntax highlighting, or you can somehow trick DefaultAntlrTokenToAttributeIdMapper into behaving differently for "some" keywords.
Need professional support for Xtext, Xpand, EMF?
Go to: https://www.itemis.com/en/it-services/methods-and-tools/xtext
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: What is wrong with my grammar [message #1769830 is a reply to message #1769736] |
Thu, 03 August 2017 19:51   |
Waqas Ilyas Messages: 80 Registered: July 2009 |
Member |
Hi Christian,
Thanks for the reply and suggestions!
>> How does your datatype rule solve the expression problem? 1+1 will be an IDS.
I did not add expression syntax to the example grammar above, because I did not want to complicate the scenario. I only wanted to point out that I am not using terminals because identifiers can contain, for example, a hyphen (-), and so can values that contain expressions. So terminals cannot be used and I have to rely on datatype rules (if I understand correctly). The real problem I am facing is that I can't understand why a Literal or IDS rule also consumes whitespace characters.
>> What does the header of the grammar look like?
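The context awareness mentioned above is what flex start conditions provide: the lexer switches rule sets depending on where it is in the input. A minimal Python sketch of the idea follows. The mode names NAME and EXPR, the token kinds, and the choice to switch modes on '=' and ';' are all assumptions made for this illustration; they are not taken from the original Bison/flex source, and this is not how Xtext's generated ANTLR lexer works.

```python
import re

# Hypothetical rule sets: in NAME mode '-' is part of an identifier,
# in EXPR mode it is an operator of its own.
RULES = {
    "NAME": [("ID", r"[A-Za-z0-9,._+*#?-]+"), ("PUNCT", r"[={};:&]")],
    "EXPR": [
        ("NUMBER", r"0[xX][0-9a-fA-F]+|[0-9]+"),
        ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
        ("OP", r"[-+*]"),
        ("PUNCT", r";"),
    ],
}
WHITESPACE = re.compile(r"[ \t\r\n]+")


def tokenize(text):
    """Tokenize text, switching rule sets on '=' (to EXPR) and ';' (to NAME)."""
    mode, pos, tokens = "NAME", 0, []
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        for kind, pattern in RULES[mode]:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                if m.group() == "=":
                    mode = "EXPR"
                elif m.group() == ";":
                    mode = "NAME"
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens


print(tokenize("a-b = 1-2;"))
```

With this input the left-hand side comes out as the single identifier a-b, while the right-hand side is split into 1, - and 2. A single context-free terminal cannot make that distinction, which is why a custom lexer was suggested.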
Here is the header:
grammar org.xtext.example.sdt.SysDefTree hidden(ML_COMMENT, WS)
generate sysDefTree "http://www.xtext.org/example/sdt/SysDefTree"
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
>> With grammar org.xtext.example.mydsl.MyDsl hidden(WS, ML_COMMENT) it generates with warnings, even if you have endLabel.
For me, with endLabel I get errors; without it I can build, and the editor is generated.
>> It could be that there are still some bugs around the hidden stuff.
>> You can tell ANTLR to create a debug grammar (e.g. to be opened with ANTLRWorks):
I have generated an ANTLR debug grammar, which I used to identify that the Literal rule consumes whitespace when it shouldn't. What confuses me more is that if I remove endLabel and generate the editor, it seems to work fine; that is, it does not consume extra whitespace for the given Literal rule. However, when I use the generated debug grammar in ANTLRWorks (v1.5.2), it consumes whitespace without reporting any problems. So for a property like "s = 2 3 4;" ANTLRWorks gives no errors and parses a single literal "234", but the Xtext-generated editor reports an error on this line. So it looks to me like there is some difference between Xtext and ANTLR that I am caught in between.
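The two behaviours described above can be modelled with a toy parser. This is a sketch under a stated assumption, not a confirmed explanation: it assumes that the debug grammar hides WS everywhere (the grammar-wide hidden(WS)), while the generated Xtext parser additionally enforces the rule-level hidden(). The names lex, parse_literal and honor_rule_hidden are invented for the illustration.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def lex(text):
    """One token per character; whitespace is tagged so a parser can hide it."""
    return [("WS" if ch in " \t\r\n" else "CHAR", ch) for ch in text]


def parse_literal(tokens, start, honor_rule_hidden):
    """Consume digit tokens from start; return (literal_text, next_index)."""
    consumed = []
    i = start
    while i < len(tokens):
        kind, ch = tokens[i]
        if kind == "WS":
            if honor_rule_hidden:
                break   # rule-level hidden(): whitespace is visible and ends the rule
            i += 1      # grammar-wide hidden(WS): whitespace is skipped silently
            continue
        if ch not in HEX_DIGITS:
            break       # ';' or any other non-digit ends the literal
        consumed.append(ch)
        i += 1
    return "".join(consumed), i


tokens = lex("2 3 4;")
print(parse_literal(tokens, 0, honor_rule_hidden=False))  # ('234', 5)
print(parse_literal(tokens, 0, honor_rule_hidden=True))   # ('2', 1)
```

In the first call the literal swallows "2 3 4" as "234", like the ANTLRWorks result; in the second it stops after "2", so the following "3" is unexpected input, like the error in the generated editor.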
>> Maybe you need an external / custom lexer, e.g. built with JFlex (there are some blog posts on the topic).
I can look at JFlex, and if it is reasonably compatible with flex, I may be able to use the original flex/Bison source. I will try to find some blog posts once I understand the problem above.
>> Maybe you can work around the buggy hidden support using explicit WS:
I tried that but somewhat casually. I will have a look at it again, and report back.
>> The double-click problem has to do with the partitioning, as does the coloring behaviour. So maybe you need to customize the syntax highlighting, or you can somehow trick DefaultAntlrTokenToAttributeIdMapper into behaving differently for "some" keywords.
Thanks, I will look at it once I have solved the grammar issues.