Eclipse Community Forums
Parser rule for floating points with scientific notation [message #1856600] Fri, 16 December 2022 17:41
Simon Cockx
Messages: 41
Registered: October 2021
Member
I have been struggling to get my parser rule for scientific numbers exactly right.

It should support numbers such as

3.14
-3.14
+3.14
3.
.14
3.14e5
3.14E5
3.14e+5
3.14e-5

Extra conditions:
A) It should not allow spaces at any point (i.e., 3. 14 is not valid).

B) Strings such as E3 and e should be valid identifiers, i.e., they should not conflict with this rule in some way.

C) It should not conflict with a rule for integer ranges of the form '(' INT '..' INT ')'. Example: (5..42)

Attempt 1:

ScientificFloat hidden():
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Failing case: 3.14e5
I think 'e5' is parsed as a single keyword, hence it fails.
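A plausible explanation, assuming the grammar inherits Xtext's default ID terminal and that 'e'/'E' become keywords through the rule above: the generated lexer prefers the longest match, so at the 'e' in 3.14e5 the ID rule (matching e5, two characters) beats the keyword 'e' (one character). The parser then receives INT '.' INT ID, which the rule cannot accept. The snippet below only compares regex match lengths; it is a simplification, not the generated ANTLR lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Simplified version of Xtext's default ID terminal (the optional '^' escape is omitted).
    static final Pattern ID = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
    // The keyword 'e' introduced by the ScientificFloat parser rule.
    static final Pattern KEYWORD_E = Pattern.compile("e");

    // Length of the match of p that starts exactly at 'from', or 0 if p does not match there.
    static int matchLength(Pattern p, String input, int from) {
        Matcher m = p.matcher(input);
        m.region(from, input.length());
        return m.lookingAt() ? m.end() - from : 0;
    }

    public static void main(String[] args) {
        String input = "3.14e5";
        int at = input.indexOf('e'); // position right after "3.14"
        System.out.println("keyword 'e': " + matchLength(KEYWORD_E, input, at)); // 1
        System.out.println("ID: " + matchLength(ID, input, at));                 // 2, so "e5" is one ID token
    }
}
```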

Attempt 2:

ScientificFloat hidden():
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) SCIENTIFIC?
;
terminal SCIENTIFIC:
('e' | 'E') ('+' | '-')? INT
;
==>Condition B fails: E3 is not a valid identifier because the lexer behaves differently.
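Here the clash is a tie rather than a longer match: E3 is matched by both SCIENTIFIC and ID with the same length, and an ANTLR lexer generally resolves equal-length matches in favour of the rule listed first. The sketch assumes SCIENTIFIC ends up before the inherited ID rule in the generated lexer, which is consistent with the behaviour described; the rule list and the `pick` helper are illustrative only.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TieBreakDemo {
    record Rule(String name, Pattern pattern) {}

    // Assumed order: the grammar's own SCIENTIFIC terminal before the inherited (simplified) ID.
    static final List<Rule> RULES = List.of(
        new Rule("SCIENTIFIC", Pattern.compile("[eE][+-]?[0-9]+")),
        new Rule("ID", Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*")));

    // Returns the name of the rule chosen for the token starting at 'from':
    // the longest match wins, and on equal length the earlier rule wins.
    static String pick(String input, int from) {
        String best = null;
        int bestLen = 0;
        for (Rule r : RULES) {
            Matcher m = r.pattern().matcher(input);
            m.region(from, input.length());
            // Strict '>' keeps the earlier rule when lengths are equal.
            if (m.lookingAt() && m.end() - from > bestLen) {
                best = r.name();
                bestLen = m.end() - from;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick("E3", 0));  // SCIENTIFIC: tie at length 2, first rule wins
        System.out.println(pick("E3x", 0)); // ID: "E3x" (3) is longer than "E3" (2)
        System.out.println(pick("e", 0));   // ID: SCIENTIFIC needs at least one digit
    }
}
```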

Attempt 3:

terminal SCIENTIFIC_FLOAT:
('+' | '-')? ('.' INT | INT '.' | INT '.' INT) (('e' | 'E') ('+' | '-')? INT)?
;
==>Condition C fails: the integer range (5..42) is not valid anymore.


I'm out of guesses... Is there any way I can get it to behave exactly as I want?

[Updated on: Fri, 16 December 2022 17:42]


Re: Parser rule for floating points with scientific notation [message #1856602 is a reply to message #1856600] Fri, 16 December 2022 19:18
Christian Dietrich
Messages: 14434
Registered: July 2009
Senior Member
Maybe you can check what others have done, e.g. https://github.com/eclipse/n4js/blob/master/plugins/org.eclipse.n4js/src/org/eclipse/n4js/TypeExpressions.xtext

Need professional support for Xtext, Xpand, EMF?
Go to: https://www.itemis.com/en/it-services/methods-and-tools/xtext
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: Parser rule for floating points with scientific notation [message #1856603 is a reply to message #1856602] Fri, 16 December 2022 20:20
Ed Willink
Messages: 7564
Registered: July 2009
Senior Member
Hi

What you report should be no problem, since '..' is a distinct token and so should be resolved by the lexer. If you really care about gratuitous spaces, you may need to play games with hidden tokens.

But more likely you have the same problem as OCL, where "." is also a binary navigation operator. This syntactic ambiguity is hard to handle in the grammar, but relatively easy to handle lexically. The OCL parser therefore inserts a RetokenizingTokenSource between the standard Xtext lexer and parser to resolve the ambiguity lexically and so keep the grammar simple.

See https://git.eclipse.org/r/plugins/gitiles/ocl/org.eclipse.ocl/+/refs/heads/master/plugins/org.eclipse.ocl.xtext.base/src/org/eclipse/ocl/xtext/base/services/RetokenizingTokenSource.java
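The general idea of retokenizing can be sketched independently of the OCL class linked above (its actual code is not reproduced here): the lexer only produces simple tokens such as INT, '.', '..' and ID, and a post-processing step merges adjacent tokens into a FLOAT when no whitespace separates them. Because '..' stays a single keyword token, (5..42) passes through unchanged. The `Tok` record, the token type names and the merge rules are hypothetical; a real implementation would sit between the Xtext lexer and parser as described above rather than work on a List.

```java
import java.util.ArrayList;
import java.util.List;

class RetokenizeDemo {
    // Hypothetical token shape: type, text, and start offset in the source.
    record Tok(String type, String text, int start) {
        int end() { return start + text.length(); }
    }

    // True if b starts exactly where a ends, i.e. no hidden whitespace in between.
    static boolean adjacent(Tok a, Tok b) { return a.end() == b.start(); }

    static boolean isType(List<Tok> ts, int i, String type) {
        return i < ts.size() && ts.get(i).type().equals(type);
    }

    // Merges INT '.' INT?, or '.' INT, plus an optional adjacent exponent into one FLOAT token.
    static List<Tok> retokenize(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < in.size()) {
            Tok t = in.get(i);
            int j = -1; // index of the last token absorbed into a literal; -1 means "not a float"
            if (t.type().equals("INT") && isType(in, i + 1, ".") && adjacent(t, in.get(i + 1))) {
                j = i + 1;                                                  // INT '.'
                if (isType(in, j + 1, "INT") && adjacent(in.get(j), in.get(j + 1))) {
                    j++;                                                    // INT '.' INT
                }
            } else if (t.type().equals(".") && isType(in, i + 1, "INT") && adjacent(t, in.get(i + 1))) {
                j = i + 1;                                                  // '.' INT
            }
            if (j < 0) {
                out.add(t);
                i++;
                continue;
            }
            j = absorbExponent(in, j);
            // The absorbed tokens are contiguous, so concatenating their texts reproduces the source slice.
            StringBuilder text = new StringBuilder();
            for (int k = i; k <= j; k++) {
                text.append(in.get(k).text());
            }
            out.add(new Tok("FLOAT", text.toString(), t.start()));
            i = j + 1;
        }
        return out;
    }

    // The exponent is assumed to arrive either as one ID like "e5"/"E5",
    // or as ID "e"/"E" followed by '+' or '-' and an INT, all adjacent.
    // Returns the index of the exponent's last token, or j if there is no exponent.
    static int absorbExponent(List<Tok> in, int j) {
        if (!isType(in, j + 1, "ID") || !adjacent(in.get(j), in.get(j + 1))) return j;
        String id = in.get(j + 1).text();
        if (id.matches("[eE][0-9]+")) return j + 1;
        if (id.matches("[eE]")
                && (isType(in, j + 2, "+") || isType(in, j + 2, "-"))
                && adjacent(in.get(j + 1), in.get(j + 2))
                && isType(in, j + 3, "INT")
                && adjacent(in.get(j + 2), in.get(j + 3))) return j + 3;
        return j;
    }

    public static void main(String[] args) {
        // Tokens an unmodified lexer might produce for "3.14e-5" (offsets assumed).
        List<Tok> in = List.of(new Tok("INT", "3", 0), new Tok(".", ".", 1), new Tok("INT", "14", 2),
                new Tok("ID", "e", 4), new Tok("-", "-", 5), new Tok("INT", "5", 6));
        System.out.println(retokenize(in)); // a single FLOAT token with text "3.14e-5"
    }
}
```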

Regards

Ed Willink
Re: Parser rule for floating points with scientific notation [message #1856620 is a reply to message #1856600] Sun, 18 December 2022 16:43
Simon Cockx
Messages: 41
Registered: October 2021
Member
Hi Ed

'..' is actually not resolved correctly by the lexer. In my "attempt 3", the new terminal rule is conflicting with the (INT..INT) rule. Example: with the new terminal rule, a string (5..42) is tokenized as
'(', '5.', '.42', and ')'
instead of
'(', '5', '..', '42', and ')'
so the '..' keyword is never recognized.
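The split can be reproduced with a small maximal-munch sketch that, at each position, takes the longest match among the rules. After '(' the float terminal matches 5. (two characters) while INT matches only 5, and at the next position .42 (three characters) beats the '..' keyword (two). The rule list is a simplified assumption, not the generated lexer; the float pattern is written as INT '.' INT? | '.' INT because Java's regex alternation is ordered rather than longest-match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Attempt3LexerDemo {
    static final List<Pattern> RULES = List.of(
        Pattern.compile("\\("),                                                   // '('
        Pattern.compile("\\)"),                                                   // ')'
        Pattern.compile("\\.\\."),                                                // '..'
        Pattern.compile("[+-]?([0-9]+\\.[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?"),    // SCIENTIFIC_FLOAT (attempt 3)
        Pattern.compile("[0-9]+"));                                               // INT

    // Splits the input using the longest match at each position; throws on unmatched input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            for (Pattern p : RULES) {
                Matcher m = p.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                }
            }
            if (bestLen == 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(5..42)")); // [(, 5., .42, )]
    }
}
```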

Thanks for the link, that looks interesting. I'll take a look at it on Monday.

Regards

Simon

[Updated on: Sun, 18 December 2022 23:05]


Re: Parser rule for floating points with scientific notation [message #1856622 is a reply to message #1856620] Sun, 18 December 2022 22:53
Simon Cockx
Messages: 41
Registered: October 2021
Member
@Christian Dietrich, thanks for the link. Their code is basically this:

terminal DOUBLE returns ecore::EBigDecimal:
	'.' DECIMAL_DIGIT_FRAGMENT+ EXPONENT_PART?
	| DECIMAL_INTEGER_LITERAL_FRAGMENT '.' DECIMAL_DIGIT_FRAGMENT* EXPONENT_PART?
;

terminal fragment EXPONENT_PART:
	  ('e' | 'E') SIGNED_INT
;

terminal fragment SIGNED_INT:
	('+' | '-') DECIMAL_DIGIT_FRAGMENT+
;

terminal fragment DECIMAL_INTEGER_LITERAL_FRAGMENT:
	'0'
	| '1'..'9' DECIMAL_DIGIT_FRAGMENT*
;
terminal fragment DECIMAL_DIGIT_FRAGMENT:
	'0'..'9'
;


Observations:
1. The sign of their exponent is mandatory, and therefore they do not have the problem that I had in my "attempt 1". I would like to make it optional, just like in Java.
2. Even more importantly, I think they suffer from the same problem as my attempt 3: the rule conflicts with the '..' keyword in an integer range. Instead of lexing (5..42) into '(', '5', '..', '42' and ')', the lexer now produces '(', '5.', '.42' and ')', so the '..' keyword effectively disappears.
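Both observations can be checked by translating the quoted DOUBLE terminal into a Java regex. The fragments are inlined by hand, so this is an approximation of the Xtext rule, not generated code.

```java
import java.util.regex.Pattern;

class N4jsDoubleDemo {
    // Hand translation of the DOUBLE terminal quoted above:
    //   '.' DIGIT+ EXPONENT? | DECIMAL_INTEGER '.' DIGIT* EXPONENT?
    // with EXPONENT = ('e'|'E') ('+'|'-') DIGIT+ and DECIMAL_INTEGER = '0' | '1'..'9' DIGIT*.
    static final String EXPONENT = "[eE][+-][0-9]+";
    static final Pattern DOUBLE = Pattern.compile(
        "\\.[0-9]+(" + EXPONENT + ")?|(0|[1-9][0-9]*)\\.[0-9]*(" + EXPONENT + ")?");

    // True if the whole string is a DOUBLE literal.
    static boolean isDouble(String s) { return DOUBLE.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isDouble("3.14e+5")); // true
        System.out.println(isDouble("3.14e5"));  // false: the exponent sign is mandatory
        System.out.println(isDouble("5."));      // true: so "5..42" can start with a DOUBLE token
        System.out.println(isDouble(".42"));     // true
    }
}
```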

So... still the same problem. I wonder how Java does it (although Java doesn't need syntax like my integer ranges, so it may simply not care about that).

[Updated on: Mon, 19 December 2022 23:21]


Re: Parser rule for floating points with scientific notation [message #1856623 is a reply to message #1856622] Sun, 18 December 2022 23:18
Simon Cockx
Messages: 41
Registered: October 2021
Member
Damn, I thought I cracked it with this rule:

ScientificFloat hidden():
	('+' | '-')? (
		('.' INT | INT '.' | INT '.' INT)
		| FLOAT_WITH_EXPONENT
	)
;

terminal FLOAT_WITH_EXPONENT:
    ('.' INT | INT '.' | INT '.' INT) ('e' | 'E') ('+' | '-')? ('0'..'9')+
;

Notice that in the terminal rule the exponent is mandatory, so my theory was that the lexer would not tokenize '5..42' into '5.' and '.42'.

But this apparently again clashes with my integer ranges somehow. I'm getting a "no viable alternative at character '..'"

I have no idea why.
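One way to narrow this down is to run the same rules through a true maximal-munch lexer, which is roughly what a greedy regex-based lexer such as jflex would do. With the exponent mandatory, 5. can never complete a float, so (5..42) lexes as '(', '5', '..', '42', ')', while 3.14e5 becomes a single token and E3 stays an ID. If that holds, the rules themselves are not ambiguous and the error more likely comes from how the generated ANTLR lexer predicts between INT and FLOAT_WITH_EXPONENT; that is a hypothesis, not something this sketch proves. The rule list is simplified (no whitespace handling, simplified ID).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExponentOnlyLexerDemo {
    static final List<Pattern> RULES = List.of(
        Pattern.compile("\\("),                                         // '('
        Pattern.compile("\\)"),                                         // ')'
        Pattern.compile("\\.\\."),                                      // '..'
        Pattern.compile("\\."),                                         // '.'
        Pattern.compile("([0-9]+\\.[0-9]*|\\.[0-9]+)[eE][+-]?[0-9]+"),  // FLOAT_WITH_EXPONENT
        Pattern.compile("[0-9]+"),                                      // INT
        Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*"));                     // ID (simplified)

    // True maximal munch: at each position take the longest match over all rules.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            for (Pattern p : RULES) {
                Matcher m = p.matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                }
            }
            if (bestLen == 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("(5..42)")); // [(, 5, .., 42, )]
        System.out.println(tokenize("3.14e5"));  // [3.14e5]
        System.out.println(tokenize("3.14"));    // [3, ., 14], left to the ScientificFloat parser rule
    }
}
```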

[Updated on: Sun, 18 December 2022 23:22]


Re: Parser rule for floating points with scientific notation [message #1856624 is a reply to message #1856623] Mon, 19 December 2022 06:21
Christian Dietrich
Messages: 14434
Registered: July 2009
Senior Member
I can't do this for you. Maybe it's time to play around with a lexer replacement like jflex.

Please also note that the order of terminals matters too: '..' is a keyword and thus a terminal.



[Updated on: Mon, 19 December 2022 06:33]


Re: Parser rule for floating points with scientific notation [message #1856628 is a reply to message #1856624] Mon, 19 December 2022 13:49
Ed Willink
Messages: 7564
Registered: July 2009
Senior Member
Hi

It could well be that there is an ANTLR backtracking bug. Certainly I was a bit baffled as to why I couldn't get it working for OCL.

Switching to jflex might help, since its greedy regex matching should eat the floating-point literal in exactly the same way as OCL's RetokenizingTokenSource does.

Switching to another underlying technology such as LPG has been on my to-investigate list for a long time; I expect to see a ten-fold improvement in parse speed and an ability for incremental update in the editor. I'm unclear how well an alternative will integrate. You will surely find that another technology's Token is different from Xtext/ANTLR's Token, so you will need to create an Xtext-compatible Token to wrap the jflex Token. That might well be a week's work, whereas the RetokenizingTokenSource is probably only a day's work.

Regards

Ed Willink
Re: Parser rule for floating points with scientific notation [message #1856631 is a reply to message #1856628] Mon, 19 December 2022 20:14
Simon Cockx
Messages: 41
Registered: October 2021
Member
Ed, Christian

Thank you for the links and info. It has been really helpful to understand what's going on.

Since I'm not permitted to spend too much time on this, I will make a compromise and disallow using 'e' and 'E' as identifiers for now. Hopefully I can get back to this when I have the opportunity to dive deeper into lexer replacements.

Regards

Simon

[Updated on: Mon, 19 December 2022 23:20]

