Floats and qualified ids... [message #760181] Thu, 01 December 2011 14:51
Vlad Dumitrescu
Hi again,

I have a language where I have float numbers (also in engineering format) and also qualified identifiers that use the dot as separator. On top of that, the dot is also used as an ending token for other constructs (so that you can have "2.3." as "float dot" or "2." as "integer dot").

I tried a lot of variants, both of my own device and found online, but I can't make it work with all these constructs.

I would suppose that these constructs are needed quite often (at least the first two above) and that one wouldn't need to reinvent the wheel every time. Most of the answers to similar questions are "This is difficult", which doesn't really help... Where can I get some inspiration?

best regards,
Vlad
Re: Floats and qualified ids... [message #760208 is a reply to message #760181] Thu, 01 December 2011 15:44
Henrik Lindberg
In b3, I used the following (some b3 details omitted). The one thing I
had trouble supporting (b3 has a '.' operator as well as a '..'
operator) was a float written without a leading integer (e.g. .5).
The grammar below requires an integral part for float.

Note that FLOAT and QID are datatype rules for which you want
appropriate converters.

----
// (Does not use default terminals)
// Qualified name: segments separated by '.'; hidden() means no
// whitespace or comments are allowed inside the name
QID hidden() :
    ID_or_KW
    (INT|HEX|ID_or_KW)*
    ('.' ID_or_KW (INT|HEX|ID_or_KW)*)* ;

ID_or_KW : ID | KW ;

// some keywords that are always valid identifiers
KW : "kw1" | ... | "kwN" ;

// Float: the integral part is required; the exponent form (EXT_INT)
// needs an explicit sign
FLOAT hidden(): INT '.' (EXT_INT | INT);

terminal ID :
    ('^')?
    (('a'..'z')|('A'..'Z')|'_') (('a'..'z')|('A'..'Z')|('0'..'9')|'_')*
    ;

terminal HEX : '0' ('x'|'X')(('0'..'9')|('a'..'f')|('A'..'F'))+ ;
terminal INT : ('0'..'9')+;
terminal EXT_INT: INT ('e'|'E')('-'|'+') INT;
----


(The full grammar can be found at:
https://github.com/eclipse/b3/blob/master/org.eclipse.b3.beelang/src/org/eclipse/b3/BeeLang.xtext)
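
To make the remark about converters a bit more concrete, here is a minimal
sketch of the conversion logic that could sit behind value converters for
the FLOAT and QID datatype rules above. It is plain Java with no Xtext
dependency; the class FloatQidConverters and its method names are made up
for illustration and are not taken from b3. Dropping a leading '^' from a
segment is an assumption, modelled on how an escaped ID is usually handled.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Xtext or b3): conversion logic one
// might call from value converters registered for FLOAT and QID.
final class FloatQidConverters {

    // FLOAT is INT '.' (EXT_INT | INT), e.g. "2.3" or "2.3e-5".
    // The exponent sign is mandatory, as in the EXT_INT terminal.
    static double toDouble(String text) {
        if (!text.matches("[0-9]+\\.[0-9]+([eE][-+][0-9]+)?")) {
            throw new IllegalArgumentException("Not a FLOAT: " + text);
        }
        return Double.parseDouble(text);
    }

    // QID segments are separated by '.'. Assumption: a leading '^' only
    // escapes the identifier and is not part of the segment's value.
    static List<String> toSegments(String text) {
        String[] parts = text.split("\\.", -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("Empty segment in: " + text);
            }
            if (parts[i].startsWith("^")) {
                parts[i] = parts[i].substring(1);
            }
        }
        return Arrays.asList(parts);
    }

    public static void main(String[] args) {
        System.out.println(toDouble("2.3e-5"));
        System.out.println(toSegments("a.^b.c"));
    }
}
```

In a real project this logic would be called from whatever value converter
service the language registers; that wiring is left out here.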

If this is the only lexical difficulty you are facing, you may be happy
with something like the above. OTOH I found that using an external lexer
is of great value as it reduces grammar complexity. I used that approach
in (cloudsmith/geppetto @github), but that language does not have floats
so it does not immediately show how you could solve those. But it is a
good example for more advanced lexing (simplified string and template
string processing, only recognizing certain tokens if appearing after
certain other tokens, etc.). Using an external lexer requires a bit of
work (and I would not go there until my language was reasonably stable -
changing terminals is not fun).
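
As a rough illustration of what a hand-written lexer buys you for the
"2.3." and "2." cases, the sketch below uses one character of lookahead:
a '.' after digits only becomes part of a float when another digit
follows it, otherwise it is emitted as a separate DOT token. This is a
plain Java toy and not the Geppetto lexer or an ANTLR grammar; the class
DotLexerSketch and the token names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy lexer: decides with lookahead whether a '.' after
// digits belongs to a float or is a separate DOT token.
final class DotLexerSketch {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (isDigit(c)) {
                i = skipDigits(input, i);
                // Float only if the dot is directly followed by a digit.
                boolean isFloat = i + 1 < n
                        && input.charAt(i) == '.'
                        && isDigit(input.charAt(i + 1));
                if (isFloat) {
                    i = skipDigits(input, i + 1);
                    // Optional exponent: e/E, mandatory sign, digits.
                    if (i + 2 < n
                            && (input.charAt(i) == 'e' || input.charAt(i) == 'E')
                            && (input.charAt(i + 1) == '-' || input.charAt(i + 1) == '+')
                            && isDigit(input.charAt(i + 2))) {
                        i = skipDigits(input, i + 2);
                    }
                    tokens.add("FLOAT(" + input.substring(start, i) + ")");
                } else {
                    tokens.add("INT(" + input.substring(start, i) + ")");
                }
            } else if (c == '.') {
                tokens.add("DOT");
                i++;
            } else if (isIdStart(c)) {
                i++;
                while (i < n && (isIdStart(input.charAt(i)) || isDigit(input.charAt(i)))) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else {
                throw new IllegalArgumentException("Unexpected character at " + i + ": " + c);
            }
        }
        return tokens;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    private static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    // Returns the index of the first non-digit at or after 'from'.
    private static int skipDigits(String s, int from) {
        int i = from;
        while (i < s.length() && isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("2.3."));  // [FLOAT(2.3), DOT]
        System.out.println(tokenize("2."));    // [INT(2), DOT]
        System.out.println(tokenize("a.b."));  // [ID(a), DOT, ID(b), DOT]
    }
}
```

Qualified names are left as ID and DOT tokens here, on the assumption that
the parser assembles them, as the QID datatype rule above does.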

Hope that is of some help.
- henrik

Re: Floats and qualified ids... [message #760230 is a reply to message #760208] Thu, 01 December 2011 16:24
Vlad Dumitrescu
Thank you very much Henrik, it helps!

Is there any documentation on how to plug in an external lexer? My language is an existing one and thus is stable, so it may be just as well to reuse the existing lexer.

regards,
Vlad
Re: Floats and qualified ids... [message #760335 is a reply to message #760230] Thu, 01 December 2011 22:43
Henrik Lindberg
Documentation? Not really, IIRC I got help in the forum. Basically, an
external lexer written with ANTLR is supported out of the box - so
except for writing the lexer it is quite straightforward.

You can look at cloudsmith/geppetto @ github - the project
/org.cloudsmith.geppetto.pp.dsl contains the grammar and the lexer.
Basically, you configure this in your mwe file, write an antlr '.g' file
containing the lexer, and then perhaps (as I did) dress it up with a bit
of supporting code (required to keep track of last seen token IIRC). In
case you wonder, you must keep all the keywords and terminals as is in
your grammar - the terminal rules will not be used, but I think it is a
good thing if they at least roughly represent what they are in the
external lexer (you can look at how I did this in the pp.xtext grammar).

I am sure you can figure out how it works in geppetto by a) looking at
the mwe file, and b) looking at the module where the lexer is overridden
with a PPLexer and start drilling down from there.

The lexer in antlr grammar is in
org.cloudsmith.geppetto.pp.dsl.lexer.PPLexer.g
(https://github.com/cloudsmith/geppetto/blob/master/org.cloudsmith.geppetto.pp.dsl/src/org/cloudsmith/geppetto/pp/dsl/lexer/PPLexer.g)


Naturally, you need to understand how to write things using antlr - but
the code in geppetto is not particularly complicated, so you can
probably figure it out. The trickiest part was to sync the tokens (if
you add tokens/keywords they will get new numeric values).
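
One way to catch token drift early is a small check that compares the
generated token numbers with the ones the hand-written lexer expects. The
sketch below assumes the usual NAME=number lines of an ANTLR '.tokens'
file; the class TokenSyncCheck, its methods, and the token names in the
example are hypothetical and not something Geppetto ships.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: compares token numbers from a generated '.tokens'
// file with the numbers a hand-written lexer was built against.
final class TokenSyncCheck {

    // Parses lines of the form NAME=number. Splitting at the last '='
    // keeps literal tokens such as '='=12 intact.
    static Map<String, Integer> parseTokens(String content) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String rawLine : content.split("\\R")) {
            String line = rawLine.trim();
            int eq = line.lastIndexOf('=');
            if (line.isEmpty() || eq <= 0) {
                continue;
            }
            String name = line.substring(0, eq);
            int number = Integer.parseInt(line.substring(eq + 1).trim());
            result.put(name, number);
        }
        return result;
    }

    // Lists every token the hand-written lexer relies on whose number is
    // missing from, or different in, the generated set.
    static List<String> findDrift(Map<String, Integer> generated,
                                  Map<String, Integer> handWritten) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> e : handWritten.entrySet()) {
            Integer actual = generated.get(e.getKey());
            if (actual == null) {
                problems.add(e.getKey() + ": missing from generated tokens");
            } else if (!actual.equals(e.getValue())) {
                problems.add(e.getKey() + ": expected " + e.getValue()
                        + " but generated " + actual);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Example data only; real names and numbers come from your build.
        Map<String, Integer> generated = parseTokens("RULE_ID=4\nRULE_INT=5\n'.'=12\n");
        Map<String, Integer> expected = new LinkedHashMap<>();
        expected.put("RULE_ID", 4);
        expected.put("RULE_INT", 6);
        System.out.println(findDrift(generated, expected));
    }
}
```

Running something like this as part of the build, after the grammar has
been regenerated, would flag a renumbering before it shows up as odd
parse errors.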

Happy to answer questions about how the Geppetto implementation works.

Hope that is of help.
Regards
- henrik
Re: Floats and qualified ids... [message #760362 is a reply to message #760335] Fri, 02 December 2011 07:35
Vlad Dumitrescu
Thank you very much again for the detailed answer. I will dig into it.

best regards,
Vlad