Parser rule for floating points with scientific notation [message #1856600]
Fri, 16 December 2022 17:41
Simon Cockx

I have been struggling to get my parser rule for floating-point numbers in scientific notation exactly right.
It should support numbers such as:
3.14
-3.14
+3.14
3.
.14
3.14e5
3.14E5
3.14e+5
3.14e-5
Extra conditions:
A) It should not allow spaces at any point (i.e., 3. 14 is not valid).
B) Strings such as E3 and e should be valid identifiers, i.e., they should not conflict with this rule in some way.
C) It should not conflict with a rule for integer ranges of the form '(' INT '..' INT ')'. Example: (5..42)
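To make the target concrete, the accepted number syntax can be written down as a regular expression. This is only a sketch of the language the rule should accept, not an Xtext rule: the names `SCIENTIFIC_FLOAT` and `is_scientific_float` are made up for illustration, and the pattern says nothing about how the lexer splits the surrounding input (which is where conditions B and C come in).

```python
import re

# Hypothetical pattern modelling the desired syntax (not an Xtext rule):
# optional sign, then '.14' or '3.' or '3.14', then an optional exponent
# whose own sign is optional.
SCIENTIFIC_FLOAT = re.compile(r"[+-]?(?:\.\d+|\d+\.\d*)(?:[eE][+-]?\d+)?")


def is_scientific_float(text: str) -> bool:
    """Return True if the whole string is a number in the desired syntax."""
    return SCIENTIFIC_FLOAT.fullmatch(text) is not None


valid = ["3.14", "-3.14", "+3.14", "3.", ".14",
         "3.14e5", "3.14E5", "3.14e+5", "3.14e-5"]
# Condition A (no spaces), condition B (identifiers), condition C (ranges).
invalid = ["3. 14", "E3", "e", "5..42"]

for s in valid + invalid:
    print(repr(s), is_scientific_float(s))
```

The hard part of the question is not this pattern but getting the same language out of a lexer that must also produce ID, INT and '..' tokens.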
Attempt 1:
ScientificFloat hidden():
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Failing case: 3.14e5
I think 'e5' is lexed as a single ID token rather than as the keyword 'e' followed by an INT, hence it fails.
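The suspicion about 'e5' can be illustrated with a small longest-match tokenizer. This is a simplified model, not the lexer that Xtext generates: the rule set below assumes the default ID and INT terminals plus the keywords used in attempt 1, and it assumes that the longest match wins and that ties go to the rule listed first.

```python
import re

# Assumed stand-ins for the tokens in play in attempt 1.
RULES = [
    ("KEYWORD", re.compile(r"\.\.|[-+.()]|[eE]")),
    ("INT", re.compile(r"[0-9]+")),
    ("ID", re.compile(r"\^?[A-Za-z_][A-Za-z_0-9]*")),
]


def tokenize(text, rules=RULES):
    """Split text into (rule, lexeme) pairs, longest match first."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches win; ties keep the earlier rule.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no viable alternative at {text[pos]!r}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


print(tokenize("3.14e5"))   # 'e5' comes out as one ID token
print(tokenize("3.14e+5"))  # 'e' stays a keyword because '+' ends the match
```

Under these assumptions the parser rule never sees the keyword 'e' followed by INT for 3.14e5, while 3.14e+5 is split the way the rule expects.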
Attempt 2:
ScientificFloat hidden():
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) SCIENTIFIC?
;
terminal SCIENTIFIC:
('e' | 'E') ('+' | '-')? INT
;
==>Condition B fails: E3 is no longer a valid identifier, presumably because the lexer now tokenizes it as SCIENTIFIC instead of ID.
Attempt 3:
terminal SCIENTIFIC_FLOAT:
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Condition C fails: the integer range (5..42) is not valid anymore.
I'm out of guesses... Is there any way I can get it to behave exactly as I want?
[Updated on: Fri, 16 December 2022 17:42]

Re: Parser rule for floating points with scientific notation [message #1856620 is a reply to message #1856600]
Sun, 18 December 2022 16:43
Simon Cockx

Hi Ed
'..' is actually not resolved correctly by the lexer. In my "attempt 3", the new terminal rule is conflicting with the (INT..INT) rule. Example: with the new terminal rule, a string (5..42) is tokenized as
'(', '5.', '.42', and ')'
instead of
'(', '5', '..', '42', and ')'
so the '..' keyword is never recognized.
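The tokenization described above can be reproduced with a small longest-match model. It is a sketch under assumptions, not the generated Xtext lexer: the rule names and regular expressions are stand-ins for the grammar's tokens, and the model assumes that the longest match wins with ties going to the earlier rule.

```python
import re

# Assumed stand-ins for the tokens without the new terminal.
BASE_RULES = [
    ("KEYWORD", re.compile(r"\.\.|[()]")),
    ("INT", re.compile(r"[0-9]+")),
]

# Stand-in for the SCIENTIFIC_FLOAT terminal of attempt 3.
FLOAT_RULE = (
    "SCIENTIFIC_FLOAT",
    re.compile(r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"),
)


def tokenize(text, rules):
    """Split text into (rule, lexeme) pairs, longest match first."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Strictly longer matches win; ties keep the earlier rule.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no viable alternative at {text[pos]!r}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


without_float = tokenize("(5..42)", BASE_RULES)
with_float = tokenize("(5..42)", [FLOAT_RULE] + BASE_RULES)
print([lexeme for _, lexeme in without_float])
print([lexeme for _, lexeme in with_float])
```

In this model '5.' beats '5' because it is one character longer, and once the first dot is consumed the remaining '.42' can only be a float, so '..' never gets a chance.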
Thanks for the link, that looks interesting. I'll take a look at it on Monday.
Regards
Simon
[Updated on: Sun, 18 December 2022 23:05]

Re: Parser rule for floating points with scientific notation [message #1856622 is a reply to message #1856620]
Sun, 18 December 2022 22:53
Simon Cockx

@Christian Dietrich, thanks for the link. Their code is basically this:
terminal DOUBLE returns ecore::EBigDecimal:
'.' DECIMAL_DIGIT_FRAGMENT+ EXPONENT_PART?
| DECIMAL_INTEGER_LITERAL_FRAGMENT '.' DECIMAL_DIGIT_FRAGMENT* EXPONENT_PART?
;
terminal fragment EXPONENT_PART:
('e' | 'E') SIGNED_INT
;
terminal fragment SIGNED_INT:
('+' | '-') DECIMAL_DIGIT_FRAGMENT+
;
terminal fragment DECIMAL_INTEGER_LITERAL_FRAGMENT:
'0'
| '1'..'9' DECIMAL_DIGIT_FRAGMENT*
;
terminal fragment DECIMAL_DIGIT_FRAGMENT:
'0'..'9'
;
Observations:
1. The sign of their exponent is mandatory, and therefore they do not have the problem that I had in my "attempt 1". I would like to make it optional, just like in Java.
2. Even more important: I think they suffer from the same problem as my attempt 3, namely that the '..' keyword in an integer range conflicts with the terminal. The rule makes the lexer tokenize the range differently: instead of lexing (5..42) into '(', '5', '..', '42' and ')', it now lexes it into '(', '5.', '.42' and ')', so the '..' keyword effectively disappears.
So... still the same problem. I wonder how Java does it, although Java has no syntax similar to my integer ranges, so it may simply not have to care about that.
[Updated on: Mon, 19 December 2022 23:21]

Re: Parser rule for floating points with scientific notation [message #1856623 is a reply to message #1856622]
Sun, 18 December 2022 23:18
Simon Cockx

Damn, I thought I had cracked it with this rule:
ScientificFloat hidden():
('+' | '-')? (
('.' INT | INT '.' | INT '.' INT)
| FLOAT_WITH_EXPONENT
)
;
terminal FLOAT_WITH_EXPONENT:
('.' INT | INT '.' | INT '.' INT) ('e' | 'E') ('+' | '-')? ('0'..'9')+
;
Notice that in the terminal rule the exponent is mandatory, so my theory was that the lexer would not tokenize '5..42' into '5.' and '.42'.
But this apparently still clashes with my integer ranges somehow: I'm getting the error "no viable alternative at character '..'".
I have no idea why.
[Updated on: Sun, 18 December 2022 23:22]