Eclipse Community Forums
Lexer problem [message #1722766] Mon, 08 February 2016 23:03
Daniel Cardin
Hello,

I have created a new Literal that takes a number and a unit of measurement.
It works well with a prefix token, but I would like to make it work without one, since the expected units are predefined.

It does work, but it prevents the DECIMAL terminal from being matched. There might be a way to accomplish this by setting options or similar, so I am open to any suggestions.

terminal MEASURE:
	INT ('.' INT)? ('in'|'mm'|'cm'|'dm'|'m'|'in2'|'mm2'|'cm2'|'dm2'|'m2'|'in3'|'mm3'|'cm3'|'dm3'|'m3'|'lb'|'mg'|'g'|'kg');


I tried other variations like

terminal MEASURE_TERMINAL:
	(INT '.')? INT ('in'|'mm'|'cm'|'dm'|'m'|'in2'|'mm2'|'cm2'|'dm2'|'m2'|'in3'|'mm3'|'cm3'|'dm3'|'m3'|'lb'|'mg'|'g'|'kg');


But nothing worked yet.

Examples of strings that parse ok:

val length = 9392.3923mm
val weight = 100kg
val l = 1932

but

val d = 2392.99 // THAT fails. DECIMAL is not matched


thanks!

Re: Lexer problem [message #1722777 is a reply to message #1722766] Tue, 09 February 2016 05:43
Christian Dietrich
Have a look at the concept of datatype rules.

Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: Lexer problem [message #1722797 is a reply to message #1722766] Tue, 09 February 2016 08:59
Jan Koehnlein
Terminal rules are processed by the lexer which splits the character stream into tokens (keywords and terminals). Lexing is completely independent of parsing and just uses its own very simple lookahead strategy.

Datatype rules are processed by the parser. The parser uses a clever lookahead strategy to match its rules. As opposed to plain parser rules, datatype rules return a value rather than an EObject. The types for these values come from Ecore, so you have to import Ecore when you need value types other than EString.

I'd also recommend moving the units into a separate enum rule:

import 'http://www.eclipse.org/emf/2002/Ecore' as ecore
...
Measure:
value=DoubleValue unit=Unit;

DoubleValue returns ecore::EDouble:
'-'? INT ('.' INT)?;

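// Xtext enum rules need a literal name for each keyword, written as NAME='keyword'
enum Unit:
IN='in'|MM='mm'|CM='cm'|DM='dm'|M='m'
|IN2='in2'|MM2='mm2'|CM2='cm2'|DM2='dm2'|M2='m2'
|IN3='in3'|MM3='mm3'|CM3='cm3'|DM3='dm3'|M3='m3'
|LB='lb'|MG='mg'|G='g'|KG='kg';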

For a more sophisticated double rule including exponential notation etc. have a look at the Xbase grammar. Note that the '.' and the '-' are usually quite overloaded in most languages, so it may not be possible to cover all possible notations without introducing ambiguities.
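
To make the point about the lexer's simple lookahead concrete, here is a toy Java model of a lexer that commits to a rule and never backtracks. It is only a sketch under simplifying assumptions, not the algorithm ANTLR or Xtext actually uses: the class name ToyLexer, the token labels and the reduced unit list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyLexer {
    // Reduced, hypothetical unit list; longer units first so "mm" wins over "m".
    static final String[] UNITS = {"mm", "kg", "m", "g"};

    // Index just past the run of digits starting at pos (pos itself if there is none).
    static int digits(String s, int pos) {
        while (pos < s.length() && Character.isDigit(s.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    // The unit that starts at pos, or null if none does.
    static String unitAt(String s, int pos) {
        for (String unit : UNITS) {
            if (s.startsWith(unit, pos)) {
                return unit;
            }
        }
        return null;
    }

    public static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int end = digits(s, i);
            if (end == i) {
                // Not a digit: this toy knows no other tokens.
                tokens.add("ERROR(" + s.charAt(i) + ")");
                i++;
                continue;
            }
            // digits '.' digit can only continue as MEASURE in this toy, so commit to it.
            boolean committed = false;
            if (end + 1 < s.length() && s.charAt(end) == '.'
                    && Character.isDigit(s.charAt(end + 1))) {
                end = digits(s, end + 1);
                committed = true;
            }
            String unit = unitAt(s, end);
            if (unit != null) {
                end += unit.length();
                tokens.add("MEASURE(" + s.substring(i, end) + ")");
            } else if (committed) {
                // No unit follows, and there is no backtracking to INT '.' INT.
                tokens.add("ERROR(" + s.substring(i, end) + ")");
            } else {
                tokens.add("INT(" + s.substring(i, end) + ")");
            }
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String input : new String[] {"100kg", "9392.3923mm", "1932", "2392.99"}) {
            System.out.println(input + " -> " + tokenize(input));
        }
    }
}
```

In this model 100kg and 9392.3923mm come out as MEASURE tokens and 1932 as INT, but 2392.99 is reported as an error: by the time the toy lexer finds that no unit follows, it has already committed to MEASURE. With a datatype rule the decision is left to the parser instead, which works on the INT, '.' and keyword tokens.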


Re: Lexer problem [message #1722833 is a reply to message #1722797] Tue, 09 February 2016 14:15
Daniel Cardin
The suggestion is interesting, but it cannot work in practice. Because my unit names are also very common identifiers, the parser gets confused in many places, such as

val m = 293 mm

Here m is treated as a Unit keyword, which is definitely not good.

That's why I was using terminals: to force the input to be tokenized before the parser sees it. I'm happy writing 100mm or 293.293kg.

terminal MT: DOUBLEVALUE UNIT;

terminal DOUBLEVALUE returns ecore::EDouble:
'-'? INT ('.' INT)?;

terminal UNIT: 
'in'|'mm'|'cm'|'dm'|'m'
|'in2'|'mm2'|'cm2'|'dm2'|'m2'
|'in3'|'mm3'|'cm3'|'dm3'|'m3'
|'lb'|'mg'|'g'|'kg';


This behaves the same as my previous rules.

Is there any way to force a longer lookahead for the tokenizer ?

Thanks!

[Updated on: Tue, 09 February 2016 14:16]


Re: Lexer problem [message #1722834 is a reply to message #1722833] Tue, 09 February 2016 14:21
Daniel Cardin
I thought this might work... but still no cigar

terminal MT returns ecore::EString:
'-'? INT ('.' INT)? UNIT;
Re: Lexer problem [message #1722837 is a reply to message #1722834] Tue, 09 February 2016 14:30
Christian Dietrich
As said before: don't use terminals, use datatype rules.

MT returns ecore::EString:
'-'? INT ('.' INT)? UNIT;


Re: Lexer problem [message #1722838 is a reply to message #1722834] Tue, 09 February 2016 14:31
Jan Koehnlein
The parser just sees a keyword and matches it in an enum rule if that's possible in the specific context. If 'm' is a valid identifier as well, you should extend your ID rule to also accept the respective keywords:

MyID:
ID | 'm' | 'mm' | ...

VariableDeclaration:
'var' name=MyID '=' value=Measure

This is the same as we did in Xbase, where e.g. 'import' is both a keyword and a valid variable name. Adapt semantic highlighting if you want identifiers to always be displayed the same, regardless of whether they are a keyword or not.

Terminal rules are too dumb to solve your problem. You'll definitely run into ambiguity issues such as
'-' operator vs '-' sign
'.' as feature call, namespace delimiter or decimal point
etc. which can only be decided by the parser.



---
Get professional support from the Xtext committers at www.typefox.io
Re: Lexer problem [message #1722839 is a reply to message #1722838] Tue, 09 February 2016 14:33
Christian Dietrich
(In Xbase, MyID is called ValidID by default.)

Re: Lexer problem [message #1722848 is a reply to message #1722839] Tue, 09 February 2016 15:21
Daniel Cardin
I have tried many variations, but it's not working out. So I'll stick to using a token in front of my literal to make it clear to the parser.
Thanks for your help :)
Re: Lexer problem [message #1722865 is a reply to message #1722848] Tue, 09 February 2016 19:32
Christian Dietrich
Using

XExpressionOrVarDeclaration returns xbase::XExpression:
	=>XDistanceLiteral | super;	

XDistanceLiteral:
	=>value=Measure
;	

Measure:
	=>(INT UNIT)
;

ValidID:
	ID | 'in'|'mm'|'cm'|'dm'|'m'
|'in2'|'mm2'|'cm2'|'dm2'|'m2'
|'in3'|'mm3'|'cm3'|'dm3'|'m3'
|'lb'|'mg'|'g'|'kg'
;



 UNIT: 
'in'|'mm'|'cm'|'dm'|'m'
|'in2'|'mm2'|'cm2'|'dm2'|'m2'
|'in3'|'mm3'|'cm3'|'dm3'|'m3'
|'lb'|'mg'|'g'|'kg';


seems to work for me.


Re: Lexer problem [message #1722866 is a reply to message #1722865] Tue, 09 February 2016 19:54
Jan Koehnlein
Nice to hear you managed it. I recommend writing some parser tests to make sure your syntactic predicates don't shadow some other rule. At first glance, the predicates in XDistanceLiteral and Measure look unnecessary, but I may be wrong here.

Re: Lexer problem [message #1722868 is a reply to message #1722866] Tue, 09 February 2016 20:01
Christian Dietrich
The problem is that

1 mm

could also be read as two separate expressions, as in

var mm = "Hello"
1
mm

But yes,


XExpressionOrVarDeclaration returns xbase::XExpression:
=>XDistanceLiteral | super;

XDistanceLiteral:
=>value=Measure
;

Measure:
=>(INT UNIT)
;

ValidID:
ID | 'in'|'mm'|'cm'|'dm'|'m'
|'in2'|'mm2'|'cm2'|'dm2'|'m2'
|'in3'|'mm3'|'cm3'|'dm3'|'m3'
|'lb'|'mg'|'g'|'kg'
;



UNIT:
'in'|'mm'|'cm'|'dm'|'m'
|'in2'|'mm2'|'cm2'|'dm2'|'m2'
|'in3'|'mm3'|'cm3'|'dm3'|'m3'
|'lb'|'mg'|'g'|'kg';

seems fine as well


Re: Lexer problem [message #1722869 is a reply to message #1722868] Tue, 09 February 2016 20:21
Daniel Cardin
Thanks guys, I'll run some tests with your new suggestions.
Re: Lexer problem [message #1722871 is a reply to message #1722869] Tue, 09 February 2016 20:27
Christian Dietrich
Another possibility would be to have no literal at all but an API (e.g. via an extension class).

To demonstrate, have a look at these lines of Xtend code:

import org.eclipse.xtend.lib.annotations.Data

class Demo {
	
	def static void main(String[] args) {
		var m = 1.mm
		var n = 2.mm
		var result = m + n
	}
	
	def static mm(int n) {
		return new MM(n)
	}
	
	def static operator_plus(MM a, MM b) {
		return new MM(a.value + b.value)
	}
	
}

@Data
class MM {
	int value
}
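
For readers less familiar with Xtend, the following is a rough Java sketch of what the demo above amounts to. It relies on two Xtend conventions: 1.mm is a call to mm(1) through extension-method syntax, and m + n is mapped to operator_plus(m, n). The class name MeasureApiDemo and the hand-written MM class (standing in for the @Data class) are assumptions made for illustration, not the code Xtend would generate.

```java
public class MeasureApiDemo {
    // Plain immutable value class standing in for the @Data class MM.
    public static final class MM {
        public final int value;

        public MM(int value) {
            this.value = value;
        }
    }

    // Target of the Xtend expression 1.mm when mm is used as an extension method.
    public static MM mm(int n) {
        return new MM(n);
    }

    // Target of the Xtend expression m + n: operators map to operator_* methods.
    public static MM operator_plus(MM a, MM b) {
        return new MM(a.value + b.value);
    }

    public static void main(String[] args) {
        MM m = mm(1);
        MM n = mm(2);
        MM result = operator_plus(m, n);
        System.out.println(result.value); // prints 3
    }
}
```

The grammar stays untouched with this approach; the price is that every unit needs its own method (and possibly its own type) in the library.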


Re: Lexer problem [message #1722873 is a reply to message #1722871] Tue, 09 February 2016 20:55
Daniel Cardin
I am already using the extensions for the operations :)

I'm working on implicit conversion of Measure to Double and I'm stuck with something silly.

How do I get a LightweightTypeReference in the context of an extension to SynonymTypesProvider?

something like this, but with the right JvmType lookup!

class ModelDslSynonymTypesProvider extends SynonymTypesProvider {
	@Inject extension LightweightTypeReferenceFactory 

	override protected boolean collectCustomSynonymTypes(LightweightTypeReference type, Acceptor acceptor) {
		if(type.invariantBoundSubstitute.isType(Measure)) {
			val lwDouble = toLightweightReference( .....  findTypeByName("java.lang.Double"))
			return announceSynonym(lwDouble, ConformanceFlags.DEMAND_CONVERSION, acceptor);
		}

		return true
	}
}


I'm blocked because the normal extensions I use require a Notifier, but I am in a stateless context right now... I think...

Thanks !
Re: Lexer problem [message #1722874 is a reply to message #1722873] Tue, 09 February 2016 21:04
Christian Dietrich
// this is java code
ITypeReferenceOwner owner = type.getOwner();
JvmType myType = owner.getServices().getTypeReferences().findDeclaredType(MyType.class, owner.getContextResourceSet());


Re: Lexer problem [message #1722875 is a reply to message #1722874] Tue, 09 February 2016 21:06
Daniel Cardin
Oh wow. Pretty fancy. But once you got the recipe...

Thanks again, Christian!