Re: Xtext and the Antlr lexer hell [message #491193 is a reply to message #491004]
Tue, 13 October 2009 15:35
Jens Kuenzer Messages: 29 Registered: October 2009
Junior Member
Thanks for the guidance. I got this working, but I wonder why it is so tricky:
grammar org.xtext.example.ApoTest
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate apotest "http://www.xtext.org/example/apotest"
Model hidden(SPACE, WS, COMMENT) : (name+=Name ';')*;
Name : isVar?=VAR? (value=IDENTIFIER | value=CHARACTER_LITERAL) (APOSTROPHE extent+=Name | DOT extent+=Name)*;
DOT hidden(SPACE, WS, COMMENT) : DOT_CHAR;
APOSTROPHE hidden(SPACE, WS, COMMENT) : APOSTROPHE_CHAR;
REHIDE hidden(SPACE, WS, COMMENT) : "^"?;
CHARACTER_LITERAL hidden() : APOSTROPHE_CHAR (GRAPHIC_CHARACTER | APOSTROPHE_CHAR APOSTROPHE_CHAR) APOSTROPHE_CHAR REHIDE;
IDENTIFIER hidden() : CHARACTER ( ( UNDERSCORE )? (CHARACTER | DIGIT) )* REHIDE;
GRAPHIC_CHARACTER : CHARACTER | DIGIT | DOT_CHAR | UNDERSCORE | SPACE | OTHER_CHAR;
terminal COMMENT : '--' !('\n'|'\r')* ('\r'? '\n')? ;
terminal WS : ('\t'|'\r'|'\n')+ ;
terminal SPACE : ' ';
terminal VAR : ('V'|'v')('A'|'a')('R'|'r');
terminal UNDERSCORE : '_';
terminal APOSTROPHE_CHAR : "'";
terminal DOT_CHAR : '.';
terminal CHARACTER : ('a'..'z'|'A'..'Z');
terminal DIGIT : ('0'..'9');
terminal OTHER_CHAR : '/' | ':' | ';' | '<' | '=' | '>' | '|'
| '\\' | '*' | '#' | '[' | ']' | '&' | '\'' | '(' | ')' | '+' | ',' | '-';
I still need to find a way to handle input like:
invar ;
Is there a way to define an empty REHIDE rule to circumvent the hidden bug?

Re: Xtext and the Antlr lexer hell [message #491331 is a reply to message #491193]
Wed, 14 October 2009 07:40
Sebastian Zarnekow Messages: 3118 Registered: July 2009
Senior Member
Hi Jens,
it is a bug: local hidden tokens should not require this kind of
virtual rule. However, I'm afraid there is no other workaround.
Did you file a bugzilla?
Regards,
Sebastian
--
Need professional support for Eclipse Modeling?
Go visit: http://xtext.itemis.com
Jens Kuenzer wrote:
> I still need to find a way how to handle input like: invar ; Is there a
> way to define a empty REHIDE rule to circumvent the hidden bug ?

Re: Xtext and the Antlr lexer hell [message #491395 is a reply to message #491331]
Wed, 14 October 2009 12:42
Jens Kuenzer Messages: 29 Registered: October 2009
Junior Member
Ok the hidden() bug is in bugzilla.
But back to my never-ending lexer problems:
I don't think data type rules can replace a good lexer,
because now I have problems parsing these identifiers:
varray ;
v2 ;
The first one is parsed as "var" followed by "ray" instead of a single identifier.
The second one just fails.
Here is my current version of the grammar:
grammar org.xtext.example.ApoTest
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate apotest "http://www.xtext.org/example/apotest"
Model hidden(SPACE, WS, COMMENT) : (name+=Name ';')*;
Name : isVar?=VAR? (value=IDENTIFIER | value=CHARACTER_LITERAL) (APOSTROPHE extent+=Name | DOT extent+=Name)*;
DOT hidden(SPACE, WS, COMMENT) : DOT_CHAR;
APOSTROPHE hidden(SPACE, WS, COMMENT) : APOSTROPHE_CHAR;
REHIDE hidden(SPACE, WS, COMMENT) : "^"?;
CHARACTER_LITERAL hidden() : APOSTROPHE_CHAR (GRAPHIC_CHARACTER | APOSTROPHE_CHAR APOSTROPHE_CHAR) APOSTROPHE_CHAR REHIDE;
IDENTIFIER hidden() : CHARACTER ( ( UNDERSCORE )? (CHARACTER | DIGIT) )* REHIDE;
GRAPHIC_CHARACTER : CHARACTER | DIGIT | DOT_CHAR | UNDERSCORE | SPACE | OTHER_CHAR;
VAR hidden(): V A R REHIDE;
CHARACTER : (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z);
terminal COMMENT : '--' !('\n'|'\r')* ('\r'? '\n')? ;
terminal WS : ('\t'|'\r'|'\n')+ ;
terminal SPACE : ' ';
terminal A : ('a'|'A');
terminal B : ('b'|'B');
terminal C : ('c'|'C');
terminal D : ('d'|'D');
terminal E : ('e'|'E');
terminal F : ('f'|'F');
terminal G : ('g'|'G');
terminal H : ('h'|'H');
terminal I : ('i'|'I');
terminal J : ('j'|'J');
terminal K : ('k'|'K');
terminal L : ('l'|'L');
terminal M : ('m'|'M');
terminal N : ('n'|'N');
terminal O : ('o'|'O');
terminal P : ('p'|'P');
terminal Q : ('q'|'Q');
terminal R : ('r'|'R');
terminal S : ('s'|'S');
terminal T : ('t'|'T');
terminal U : ('u'|'U');
terminal V : ('v'|'V');
terminal W : ('w'|'W');
terminal X : ('x'|'X');
terminal Y : ('y'|'Y');
terminal Z : ('z'|'Z');
terminal UNDERSCORE : '_';
terminal APOSTROPHE_CHAR : "'";
terminal DOT_CHAR : '.';
terminal DIGIT : ('0'..'9');
terminal OTHER_CHAR : '/' | ':' | ';' | '<' | '=' | '>' | '|'
| '\\' | '*' | '#' | '[' | ']' | '&' | '\'' | '(' | ')'
| '+' | ',' | '-';
Is there a way to use Xtext with a better external lexer?
Or are there some options like greediness in xtext?
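The "var" + "ray" split described in this post can be reproduced with a toy model. The Python sketch below is not how Xtext or ANTLR is implemented; both functions and the WORD pattern are hypothetical stand-ins. It only contrasts recognising the keyword letter by letter (as with the V A R data type rule) with lexing the whole word first (as a terminal rule would). It does not attempt to model why v2 fails.

```python
import re

def parse_name_per_letter(text):
    """Toy model: each letter is its own token, so the keyword check sees
    only the first three letters and knows nothing about word boundaries."""
    is_var = len(text) > 3 and text[:3].lower() == "var"
    value = text[3:] if is_var else text
    return is_var, value

# Assumed shape of an identifier: a letter, then letters or digits,
# each optionally preceded by a single underscore.
WORD = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")

def parse_name_whole_word(text):
    """Toy model: whole words are lexed first; only a complete word that
    equals 'var' (in any case) counts as the keyword."""
    words = WORD.findall(text)
    is_var = len(words) > 1 and words[0].lower() == "var"
    value = words[1] if is_var else words[0]
    return is_var, value

print(parse_name_per_letter("varray"))   # keyword prefix is split off
print(parse_name_whole_word("varray"))   # stays one identifier
print(parse_name_whole_word("var ray"))  # keyword plus identifier
```

In the per-letter model nothing requires the keyword to end at a word boundary, which is the effect observed with "varray". In the whole-word model the boundary is decided before the keyword comparison takes place.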

Re: Xtext and the Antlr lexer hell [message #491496 is a reply to message #491395]
Wed, 14 October 2009 19:31
Sebastian Zarnekow Messages: 3118 Registered: July 2009
Senior Member
Hi Jens,
there are plenty of possibilities to tweak Xtext and the way it
instantiates models.
The 'varSomething' example may be solved with a custom IAstFactory
implementation, for example.
Maybe you should outline the actual use case so we could try to match it
to the Xtext concepts.
Regards,
Sebastian
Jens Kuenzer wrote:
> Ok the hidden() bug is in bugzilla.
>
> But back to my never ending lexer problems:
> I don't think data type rules can replace a good lexer.
>
> Because now I have problems parsing these identifier:
>
> varray ;
> v2 ;
>
> The first one is a "var" "ray" instead of a single identifier.
> The second one just fails.
>
> Is there a way to use xtext with an better external lexer?
> Or are there some options like greediness in xtext?

Re: Xtext and the Antlr lexer hell [message #492684 is a reply to message #491637]
Wed, 21 October 2009 11:52
Jens Kuenzer Messages: 29 Registered: October 2009
Junior Member
Hi, after reverting some of the changes I now have a solution:
grammar org.xtext.example.ApoTest
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate apotest "http://www.xtext.org/example/apotest"
Model hidden(SPACE, WS, COMMENT) : (name+=Name ';')*;
Name : isVar?=VAR? (value=IDENTIFIER | value=CHARACTER_LITERAL) (APOSTROPHE extent+=Name | DOT extent+=Name)*;
DOT hidden(SPACE, WS, COMMENT) : DOT_CHAR;
APOSTROPHE hidden(SPACE, WS, COMMENT) : APOSTROPHE_CHAR;
REHIDE hidden(SPACE, WS, COMMENT) : "^"?;
CHARACTER_LITERAL hidden() : APOSTROPHE_CHAR (GRAPHIC_CHARACTER | APOSTROPHE_CHAR APOSTROPHE_CHAR) APOSTROPHE_CHAR REHIDE;
GRAPHIC_CHARACTER : CHARACTER | DIGIT | DOT_CHAR | UNDERSCORE | SPACE | OTHER_CHAR;
IDENTIFIER : CHARACTER | LONG_IDENTIFIER;
terminal COMMENT : '--' !('\n'|'\r')* ('\r'? '\n')? ;
terminal WS : ('\t'|'\r'|'\n')+ ;
terminal SPACE : ' ';
terminal VAR : ("v"|"V")("a"|"A")("r"|"R");
terminal UNDERSCORE : '_';
terminal APOSTROPHE_CHAR : "'";
terminal DOT_CHAR : '.';
terminal CHARACTER : ('a'..'z')|('A'..'Z');
terminal DIGIT : ('0'..'9');
terminal OTHER_CHAR : '/' | ':' | ';' | '<' | '=' | '>' | '|'
| '\\' | '*' | '#' | '[' | ']' | '&' | '\'' | '(' | ')'
| '+' | ',' | '-';
terminal LONG_IDENTIFIER : CHARACTER ( ( UNDERSCORE )? (CHARACTER | DIGIT) )*;
The problem was changing the IDENTIFIER rule into a data type rule.
Once I reverted most of it back to a terminal rule, it seems to work.
The trick here is the IDENTIFIER data type rule, which is needed because the CHARACTER terminal matches a single letter in preference to LONG_IDENTIFIER. A single character would also match LONG_IDENTIFIER, but that match is hidden by the CHARACTER rule.
Maybe the Xtext documentation on terminal rules could clarify which terminal matches first. Another good idea would be a warning when part of a terminal rule is hidden by other terminals.
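To make the shadowing described above concrete, here is a rough Python sketch. It assumes a simplified lexer model: the longest match wins, and on a tie the terminal declared first wins. The lexer that ANTLR generates chooses rules through lookahead prediction and may differ in corner cases, so treat this as an approximation. The patterns are transcribed by hand from a subset of the terminals in the grammar above, and next_token is a hypothetical helper.

```python
import re

# Subset of the terminal rules, in the order they are declared in the grammar.
TERMINALS = [
    ("VAR", r"[vV][aA][rR]"),
    ("UNDERSCORE", r"_"),
    ("CHARACTER", r"[a-zA-Z]"),
    ("DIGIT", r"[0-9]"),
    ("LONG_IDENTIFIER", r"[a-zA-Z](?:_?[a-zA-Z0-9])*"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TERMINALS]

def next_token(text, pos=0):
    """Return (terminal name, matched text) under the assumed model:
    longest match wins, ties go to the earlier declaration."""
    best = None
    for name, regex in COMPILED:
        match = regex.match(text, pos)
        # Strictly longer only, so an earlier rule keeps a tie.
        if match and (best is None or match.end() > best[1]):
            best = (name, match.end())
    if best is None:
        raise ValueError(f"no terminal matches at position {pos}")
    return best[0], text[pos:best[1]]

for sample in ["v", "var", "varray", "v2"]:
    print(sample, "->", next_token(sample))
```

Under this model a single letter always comes out as CHARACTER and never as LONG_IDENTIFIER, which is why the IDENTIFIER data type rule has to accept both alternatives.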