I am developing a grammar for an existing language and have ambiguity problems with integers, floats, hex numbers and range expressions.

Here is a minimal grammar to reconstruct my problem:

grammar de.davehofmann.test.TestDSL hidden (WS)

import "[url removed because of spam filter]/emf/2002/Ecore" as ecore
generate testDSL "[url removed because of spam filter]/test/TestDSL"

Model:
    rangeExpression += RangeExpression*;

terminal WS : (' '|'\t'|'\r'|'\n')+;

terminal fragment DIGIT: ('0'..'9');
terminal INT returns ecore::EInt: DIGIT+;

terminal fragment HEX_DIGIT: (DIGIT|'a'..'f'|'A'..'F');
terminal HEX returns ecore::EInt: '0x' HEX_DIGIT*;

terminal IDENTIFIER : ('a'..'z') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

FLOAT returns ecore::EFloat: INT DOT INT ('e' (MINUS|PLUS)? INT)? | (INT 'e' (MINUS|PLUS)? INT);

terminal PLUS: '+';
terminal MINUS: '-';
terminal DOT: '.';

DOTDOT: DOT DOT;

RangeExpression:
    '(' lowerBound=Expression DOTDOT upperBound=Expression ')';

Expression:
    IntegerLiteral | FloatLiteral | Variable;

Variable:
    name=IDENTIFIER;

IntegerLiteral:
    value=(INT | HEX);

FloatLiteral:
    value=FLOAT;

The following expressions are valid:

(1..3)
(2.2..5.1)
(2e-10..3e-8)
(0x1f..0x2d)
(e..x)

Problems:

1. Hex numbers are not recognized at all (line 4). What is wrong here?

2. 'e' must be a valid variable name, but it clashes with the exponent 'e' in the FLOAT rule (line 5).

If FLOAT becomes a terminal rule, then

3. The expression 1..3 (line 1) is not valid because the lexer gets confused and tries to read a float.

How can I solve this?

Thanks in advance for any hints!

Dave

You may want to search the forum for:

- lexed terminal rules that conflict with each other

- data type rules as a way to solve that problem

- solving this with semantic validation instead of doing everything in the grammar

- henrik

On 2013-20-03 4:31, David Hofmann wrote:

> Hello all,

> [...]

Thank you for your hints. I searched the forum and finally found out that the hex number problem was in neither the lexer nor the parser. The problem was a missing value converter for hex numbers.

The error message was simply: "for input String". Suggestion: There should be a more meaningful error message like "Could not convert value 0xff to Integer".
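For anyone hitting the same message, here is a minimal sketch of the conversion logic in plain Java. It is not actual Xtext code: the class name HexLiterals and the method toInt are made up for illustration, and in a real project this logic would presumably live in a value converter registered for the HEX rule. It also shows where the "for input String" text most likely comes from: Integer.parseInt does not understand the 0x prefix.

```java
// Hypothetical helper, not part of Xtext: converts INT or HEX token text to an int.
class HexLiterals {

    static int toInt(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            String digits = text.substring(2);
            if (digits.isEmpty()) {
                // The HEX terminal in the grammar above uses HEX_DIGIT*, so a bare "0x" can reach this point.
                throw new NumberFormatException("Could not convert value " + text + " to Integer");
            }
            // Parse the part after the prefix with radix 16.
            return Integer.parseInt(digits, 16);
        }
        // Plain decimal INT token.
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        System.out.println(toInt("0x1f"));
        System.out.println(toInt("42"));
        try {
            // Decimal parsing of a hex literal fails with a NumberFormatException.
            Integer.parseInt("0x1f");
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Integer.decode would also accept the 0x prefix directly; the explicit version is used here only so that the error message can be made more helpful.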

However, I was not able to find a solution for the second problem: 'e' is not a valid variable name because it is used in the FLOAT rule for the exponent notation. Can you point me to the right track in order to make 'e' a valid IDENTIFIER? Thanks in advance!


I have this in one grammar:

terminal HEX : '0' ('x'|'X')(('0'..'9')|('a'..'f')|('A'..'F'))+ ;

terminal INT : ('0'..'9')+;

REAL hidden(): INT '.' (EXT_INT | INT); // INT ? '.' (EXT_INT | INT);

terminal EXT_INT: INT ('e'|'E')('-'|'+') INT;
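To see which character sequences the REAL rule above accepts, here is a rough sketch using a Java regular expression. It only approximates the shape INT '.' (EXT_INT | INT) with no hidden tokens in between and says nothing about how the lexer actually splits the input; the class name RealShape and the method isReal are invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical check, not generated by Xtext: mirrors INT '.' (EXT_INT | INT),
// where EXT_INT is INT ('e'|'E') ('-'|'+') INT.
class RealShape {

    private static final Pattern REAL =
            Pattern.compile("[0-9]+\\.(?:[0-9]+[eE][-+][0-9]+|[0-9]+)");

    static boolean isReal(String text) {
        // matches() requires the whole string to fit the pattern.
        return REAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2.2", "2.3e-10", "2e-10", "2.3e10", "1..3"}) {
            System.out.println(s + " -> " + isReal(s));
        }
    }
}
```

Two things follow from the rule as written: the sign in the exponent is mandatory, and a value such as 2e-10 without a decimal point is a single EXT_INT token rather than a REAL, so the original example (2e-10..3e-8) would need EXT_INT to be accepted as a literal as well.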

Regards

- henrik

IDENTIFIER: ID | 'e'; // all "keywords" used in Parser / Data Type Rules must be listed here
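As far as I understand it, the rule above works because a keyword used in a parser or data type rule (here the 'e' in FLOAT) gets its own lexer token, and for the exact text e that token wins over the identifier terminal. The following plain Java sketch only imitates that behaviour; the class KeywordAsIdentifier and its methods are invented for illustration and are not what Xtext generates.

```java
// Hypothetical model of the keyword/identifier clash, not generated Xtext code.
class KeywordAsIdentifier {

    // Imitates the lexer: the keyword 'e' takes precedence for an exact match,
    // anything else that looks like an identifier becomes an ID token.
    static String classify(String word) {
        if (word.equals("e")) {
            return "KEYWORD_E";
        }
        if (word.matches("[a-z][a-zA-Z_0-9]*")) {
            return "ID";
        }
        return "OTHER";
    }

    // Imitates the data type rule IDENTIFIER: ID | 'e';
    static boolean isIdentifier(String word) {
        String kind = classify(word);
        return kind.equals("ID") || kind.equals("KEYWORD_E");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"e", "ex", "x", "1x"}) {
            System.out.println(s + " -> " + classify(s) + ", identifier: " + isIdentifier(s));
        }
    }
}
```

Without the extra alternative, e would be classified as a keyword and never reach a rule that expects only ID, which matches the behaviour described in problem 2.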

Also see this post for a step by step solution.

Thanks for your help!