Parser rule for floating points with scientific notation [message #1856600] |
Fri, 16 December 2022 12:41  |
Eclipse User |
I have been struggling to get my parser rule for scientific numbers exactly right.
It should support numbers such as
3.14
-3.14
+3.14
3.
.14
3.14e5
3.14E5
3.14e+5
3.14e-5
Extra conditions:
A) It should not allow spaces at any point (i.e., 3. 14 is not valid).
B) Strings such as E3 and e should be valid identifiers, i.e., they should not conflict with this rule in some way.
C) It should not conflict with a rule for integer ranges of the form '(' INT '..' INT ')'. Example: (5..42)
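The accepted forms and condition A can be written down as a small executable check. This is only a sketch in Python, not Xtext: the regular expression and the name is_scientific_float are illustrative assumptions, and it says nothing about conditions B and C beyond showing that E3 and e are not numbers.

```python
import re

# Optional sign, then either "digits '.' optional digits" or "'.' digits",
# then an optional exponent whose sign is optional.
SCIENTIFIC_FLOAT = re.compile(
    r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def is_scientific_float(text: str) -> bool:
    """Return True if the whole string is exactly one float literal."""
    return SCIENTIFIC_FLOAT.fullmatch(text) is not None


# The examples listed in the post.
VALID = ["3.14", "-3.14", "+3.14", "3.", ".14",
         "3.14e5", "3.14E5", "3.14e+5", "3.14e-5"]

# Inputs that must not be accepted as a float literal.
INVALID = ["3. 14", "E3", "e", "."]
```

Every entry in VALID is accepted and every entry in INVALID is rejected, so the list can serve as a regression test for whatever grammar rule ends up being used.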
Attempt 1:
ScientificFloat hidden():
    ('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Failing case: 3.14e5
I think 'e5' is lexed as a single ID token rather than as the keyword 'e' followed by an INT, hence it fails.
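The suspected cause can be reproduced with a toy longest-match tokenizer. This is a Python sketch under assumptions: the rule set below only approximates what Attempt 1 would give (the literals of the parser rule as keywords, plus ID and INT patterns modelled on the usual Xtext terminals), and the real generated lexer may decide differently in other cases.

```python
import re

# Hypothetical token rules approximating Attempt 1.
RULES = [
    ("KEYWORD", re.compile(r"[eE+.-]")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("INT", re.compile(r"[0-9]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        matches = [(name, m) for name, p in RULES if (m := p.match(text, pos))]
        if not matches:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        # max() returns the first maximal element, which gives the tie rule.
        name, m = max(matches, key=lambda nm: nm[1].end())
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

With these rules, tokenize("3.14e5") ends in the single token ("ID", "e5"), so a parser rule expecting the keyword 'e' followed by an INT cannot match. By contrast, tokenize("3.14e+5") yields the keyword 'e', then '+', then the INT 5, because the '+' stops the identifier after one character.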
Attempt 2:
ScientificFloat hidden():
    ('+' | '-')? ('.' INT | INT '.' | INT '.' INT) SCIENTIFIC?
;
terminal SCIENTIFIC:
    ('e' | 'E') ('+' | '-')? INT
;
==>Condition B fails: E3 is not a valid identifier anymore, because the lexer now tokenizes it as SCIENTIFIC instead of ID.
Attempt 3:
terminal SCIENTIFIC_FLOAT:
    ('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Condition C fails: the integer range (5..42) is not valid anymore.
I'm out of guesses... Is there any way I can get it to behave exactly as I want?
[Updated on: Fri, 16 December 2022 12:42] by Moderator
Re: Parser rule for floating points with scientific notation [message #1856620 is a reply to message #1856600] |
Sun, 18 December 2022 11:43   |
Eclipse User |
Hi Ed
'..' is actually not resolved correctly by the lexer. In my "attempt 3", the new terminal rule is conflicting with the (INT..INT) rule. Example: with the new terminal rule, a string (5..42) is tokenized as
'(', '5.', '.42', and ')'
instead of
'(', '5', '..', '42', and ')'
so the '..' keyword is never recognized.
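The tokenization described above can be reproduced with a toy longest-match tokenizer. This is a Python sketch, not the generated Xtext lexer: the FLOAT pattern is a regex transcription of the terminal from attempt 3, and the other rules and names are assumptions made for illustration.

```python
import re

# FLOAT mirrors the terminal from attempt 3; the rest is a minimal
# hypothetical rule set for the integer range syntax.
RULES = [
    ("FLOAT", re.compile(
        r"[+-]?(?:\.[0-9]+|[0-9]+\.[0-9]*)(?:[eE][+-]?[0-9]+)?")),
    ("KEYWORD", re.compile(r"\.\.|[()]")),
    ("INT", re.compile(r"[0-9]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        matches = [(name, m) for name, p in RULES if (m := p.match(text, pos))]
        if not matches:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        name, m = max(matches, key=lambda nm: nm[1].end())
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

For "(5..42)" this returns '(', the FLOAT '5.', the FLOAT '.42' and ')': at the digit 5 the float rule matches two characters while INT matches only one, so the first dot is consumed before the '..' keyword gets a chance.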
Thanks for the link, that looks interesting. I'll take a look at it on Monday.
Regards
Simon
[Updated on: Sun, 18 December 2022 18:05] by Moderator
Re: Parser rule for floating points with scientific notation [message #1856622 is a reply to message #1856620] |
Sun, 18 December 2022 17:53   |
Eclipse User |
@Christian Dietrich, thanks for the link. Their code is basically this:
terminal DOUBLE returns ecore::EBigDecimal:
    '.' DECIMAL_DIGIT_FRAGMENT+ EXPONENT_PART?
    | DECIMAL_INTEGER_LITERAL_FRAGMENT '.' DECIMAL_DIGIT_FRAGMENT* EXPONENT_PART?
;
terminal fragment EXPONENT_PART:
    ('e' | 'E') SIGNED_INT
;
terminal fragment SIGNED_INT:
    ('+' | '-') DECIMAL_DIGIT_FRAGMENT+
;
terminal fragment DECIMAL_INTEGER_LITERAL_FRAGMENT:
    '0'
    | '1'..'9' DECIMAL_DIGIT_FRAGMENT*
;
terminal fragment DECIMAL_DIGIT_FRAGMENT:
    '0'..'9'
;
Observations:
1. The sign of their exponent is mandatory, so they do not run into the problem I had in my "attempt 1". I would like to make the sign optional, just like in Java.
2. More importantly, I think they suffer from the same problem as my attempt 3: the rule conflicts with the '..' keyword of an integer range. Instead of lexing (5..42) into '(', '5', '..', '42' and ')', the lexer now produces '(', '5.', '.42' and ')', so the '..' keyword effectively disappears.
So... still the same problem. I wonder how Java does it, although Java has no syntax similar to my integer ranges, so it may simply not need to care.
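One possible direction is to give the float rule a one-character lookahead, so that it refuses to consume a '.' that is directly followed by another '.'. The following is a Python sketch of that idea only. The rule names are hypothetical, the sign is assumed to be handled by the parser rather than the token, and whether this can be expressed in Xtext without replacing the lexer is not established here.

```python
import re

RULES = [
    # (?!\.) is the lookahead: "5." is not a float when another '.' follows.
    ("FLOAT", re.compile(
        r"(?:\.[0-9]+|[0-9]+\.(?!\.)[0-9]*)(?:[eE][+-]?[0-9]+)?")),
    ("KEYWORD", re.compile(r"\.\.|[()]")),
    ("ID", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("INT", re.compile(r"[0-9]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        matches = [(name, m) for name, p in RULES if (m := p.match(text, pos))]
        if not matches:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        name, m = max(matches, key=lambda nm: nm[1].end())
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

In this sketch "(5..42)" is split into '(', the INT 5, '..', the INT 42 and ')', while "3.14e5" stays a single FLOAT token and both "E3" and "e" come out as ID, because the float rule can only start at a digit or a dot.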
[Updated on: Mon, 19 December 2022 18:21] by Moderator
Re: Parser rule for floating points with scientific notation [message #1856631 is a reply to message #1856628] |
Mon, 19 December 2022 15:14  |
Eclipse User |
Ed, Christian
Thank you for the links and info. They have been really helpful for understanding what's going on.
Since I'm not permitted to spend too much time on this, I will compromise and disallow 'e' and 'E' as identifiers for now. Hopefully I can get back to this when I have the opportunity to dive deeper into lexer replacements.
Regards
Simon
[Updated on: Mon, 19 December 2022 18:20] by Moderator