Re: Floats and qualified ids... [message #760208 is a reply to message #760181] |
Thu, 01 December 2011 15:44 |
Henrik Lindberg Messages: 2509 Registered: July 2009 |
Senior Member |
In b3, I used the following (some b3 details omitted). The one thing I
had trouble supporting (b3 has a '.' operator as well as a '..'
operator) was a float written without a leading integer part (e.g.
.5). The grammar below requires an integral part for a float.
Note that FLOAT and QID are datatype rules, for which you will want
appropriate value converters.
----
// (Does not use default terminals)
// Qualified name
QID hidden() :
ID_or_KW
(INT|HEX|ID_or_KW)*
('.' ID_or_KW (INT|HEX|ID_or_KW)*)* ;
ID_or_KW : ID | KW ;
// some keywords that are always valid identifiers
KW : "kw1" | ... | "kwN" ;
FLOAT hidden(): INT '.' (EXT_INT | INT);
terminal ID :
('^')?
(('a'..'z')|('A'..'Z')|'_') (('a'..'z')|('A'..'Z')|('0'..'9')|'_')*
;
terminal HEX : '0' ('x'|'X')(('0'..'9')|('a'..'f')|('A'..'F'))+ ;
terminal INT : ('0'..'9')+;
terminal EXT_INT: INT ('e'|'E')('-'|'+') INT;
(The full grammar can be found at:
https://github.com/eclipse/b3/blob/master/org.eclipse.b3.beelang/src/org/eclipse/b3/BeeLang.xtext)
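As a rough illustration of the conversions such converters for FLOAT and QID would perform, here is a minimal Java sketch. It is deliberately independent of Xtext: the class and method names (DatatypeRuleConverters, toFloat, toSegments) are hypothetical and not taken from b3, and in a real project the logic would be wired in through Xtext's value converter service. Stripping a leading '^' from each segment is an assumption about how escaped identifiers should be treated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of b3 or Xtext. */
final class DatatypeRuleConverters {
    private DatatypeRuleConverters() {}

    /** Converts text matched by FLOAT (INT '.' (EXT_INT | INT)) to a double. */
    static double toFloat(String text) {
        // Mirrors the grammar: integral part required, exponent sign mandatory.
        if (text == null || !text.matches("[0-9]+\\.[0-9]+([eE][-+][0-9]+)?")) {
            throw new IllegalArgumentException("Not a FLOAT: " + text);
        }
        return Double.parseDouble(text);
    }

    /** Splits text matched by QID into segments, dropping a leading '^' escape. */
    static List<String> toSegments(String text) {
        List<String> segments = new ArrayList<>();
        // Limit -1 keeps empty trailing pieces so "a." is rejected below.
        for (String raw : text.split("\\.", -1)) {
            String segment = raw.startsWith("^") ? raw.substring(1) : raw;
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in QID: " + text);
            }
            segments.add(segment);
        }
        return segments;
    }
}
```

With this sketch, toFloat rejects ".5" (no integral part), matching the restriction in the grammar above.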
If this is the only lexical difficulty you are facing, you may be happy
with something like the above. On the other hand, I found that using an
external lexer is of great value, as it reduces grammar complexity. I used
that approach in Geppetto (cloudsmith/geppetto on GitHub), but that language
does not have floats, so it does not immediately show how you could solve
those. It is still a good example of more advanced lexing (simplified string
and template string processing, recognizing certain tokens only when they
appear after certain other tokens, etc.). Using an external lexer requires a
bit of work, and I would not go there until my language was reasonably
stable: changing terminals is not fun.
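To make the float versus dot problem concrete, here is a minimal hand-written tokenizer sketch in Java. It is not the Geppetto lexer (which is written in ANTLR), and the names DotAwareLexer, Kind and Token are hypothetical. The idea it shows is one character of lookahead: a '.' after digits is part of a number only if another digit follows, so "2.3." lexes as float then dot, and "2." as integer then dot. Identifiers are simplified (no '^' escape, no keywords).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a lexer that disambiguates '.' by lookahead. */
final class DotAwareLexer {
    enum Kind { INT, FLOAT, ID, DOT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    static List<Token> tokenize(String in) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (isDigit(c)) {
                i = skipDigits(in, i);
                // The '.' belongs to the number only if a digit follows it.
                if (i + 1 < in.length() && in.charAt(i) == '.' && isDigit(in.charAt(i + 1))) {
                    i = skipDigits(in, i + 1);
                    // Optional exponent: e|E, mandatory sign, at least one digit.
                    if (i + 2 < in.length()
                            && (in.charAt(i) == 'e' || in.charAt(i) == 'E')
                            && (in.charAt(i + 1) == '+' || in.charAt(i + 1) == '-')
                            && isDigit(in.charAt(i + 2))) {
                        i = skipDigits(in, i + 2);
                    }
                    out.add(new Token(Kind.FLOAT, in.substring(start, i)));
                } else {
                    out.add(new Token(Kind.INT, in.substring(start, i)));
                }
            } else if (isIdStart(c)) {
                i++;
                while (i < in.length() && (isIdStart(in.charAt(i)) || isDigit(in.charAt(i)))) i++;
                out.add(new Token(Kind.ID, in.substring(start, i)));
            } else if (c == '.') {
                i++;
                out.add(new Token(Kind.DOT, "."));
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return out;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    private static int skipDigits(String in, int i) {
        while (i < in.length() && isDigit(in.charAt(i))) i++;
        return i;
    }
}
```

With tokens like these, a qualified id becomes an ordinary parser rule over ID and DOT tokens, and the terminating dot needs no special treatment in the grammar.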
Hope that is of some help.
- henrik
On 2011-01-12 15:51, Vlad Dumitrescu wrote:
> Hi again,
>
> I have a language where I have float numbers (also in engineering
> format) and also qualified identifiers that use the dot as separator. On
> top of that, the dot is also used as an ending token for other
> constructs (so that you can have "2.3." as "float dot" or "2." as
> "integer dot").
>
> I tried a lot of variants, both of my own device and found online, but I
> can't make it work with all these constructs.
> I would suppose that these issues are quite often needed (at least the
> first two above) and that one wouldn't need to reinvent the wheel every
> time. Most of the answers to similar questions are "This is difficult",
> which doesn't really help... Where can I get some inspiration?
>
> best regards,
> Vlad
>
Re: Floats and qualified ids... [message #760335 is a reply to message #760230] |
Thu, 01 December 2011 22:43 |
Henrik Lindberg Messages: 2509 Registered: July 2009 |
On 2011-01-12 17:24, Vlad Dumitrescu wrote:
> Thank you very much Henrik, it helps!
> Is there any documentation on how to plug in an external lexer? My
> language is an existing one and thus is stable, so it may be just as
> well to reuse the existing lexer.
Documentation? Not really; IIRC I got help in the forum. Basically, an
external lexer written with ANTLR is supported out of the box, so apart
from writing the lexer it is quite straightforward.
You can look at cloudsmith/geppetto @ github - the project
/org.cloudsmith.geppetto.pp.dsl contains the grammar and the lexer.
Basically, you configure this in your mwe file, write an ANTLR '.g' file
containing the lexer, and then perhaps (as I did) dress it up with a bit
of supporting code (required to keep track of the last seen token, IIRC).
In case you wonder, you must keep all the keywords and terminals as they
are in your grammar. The terminal rules will not be used, but I think it is
a good thing if they at least roughly represent what they are in the
external lexer (you can look at how I did this in the pp.xtext grammar).
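The "keep track of the last seen token" idea can be sketched as follows. This is a generic Java illustration under assumed rules, not the Geppetto code: the class ContextSensitiveLexer and the token kinds WORD, REGEX and OP are hypothetical. Here a '/' starts a regex literal unless the previous token could end an operand, in which case it is an ordinary operator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: token recognition that depends on the previous token. */
final class ContextSensitiveLexer {
    private ContextSensitiveLexer() {}

    /** Returns tokens as "KIND:text" strings. */
    static List<String> tokenize(String in) {
        List<String> out = new ArrayList<>();
        String lastKind = null; // kind of the last significant token seen
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            String kind;
            if (Character.isLetterOrDigit(c)) {
                while (i < in.length() && Character.isLetterOrDigit(in.charAt(i))) i++;
                kind = "WORD";
            } else if (c == '/' && !endsOperand(lastKind)) {
                // No operand before the slash, so treat it as the start of a regex.
                int close = in.indexOf('/', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated regex at " + i);
                }
                i = close + 1;
                kind = "REGEX";
            } else {
                i++;
                kind = "OP";
            }
            out.add(kind + ":" + in.substring(start, i));
            lastKind = kind;
        }
        return out;
    }

    // Simplification: only words and regex literals count as operands here;
    // a real lexer would also consider closing brackets and similar tokens.
    private static boolean endsOperand(String kind) {
        return "WORD".equals(kind) || "REGEX".equals(kind);
    }
}
```

The same input character thus yields different tokens depending on context, which is hard to express with plain terminal rules in the grammar.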
I am sure you can figure out how it works in geppetto by a) looking at
the mwe file, and b) looking at the module where the lexer is overridden
with a PPLexer and start drilling down from there.
The ANTLR lexer grammar is in
org.cloudsmith.geppetto.pp.dsl.lexer.PPLexer.g
(https://github.com/cloudsmith/geppetto/blob/master/org.cloudsmith.geppetto.pp.dsl/src/org/cloudsmith/geppetto/pp/dsl/lexer/PPLexer.g)
Naturally, you need to understand how to write things using ANTLR, but
the code in Geppetto is not particularly complicated, so you can
probably figure it out. The trickiest part was keeping the tokens in sync
(if you add tokens/keywords they will get new numeric values).
Happy to answer questions about how the Geppetto implementation works.
Hope that is of help.
Regards
- henrik