Eclipse Community Forums
Troubles With Parsing Numbers [message #1753191] Thu, 02 February 2017 15:35
Brandon Lewis
Messages: 71
Registered: May 2012
Member
I've taken a shot at making a DSL for a language specified in IEEE 1687. The language is called ICL and I was interested to see that the specification provides an ANTLR4 grammar.

The grammar is rather large, and I've been able to hack it enough to make it through the Xtext flows. To do this, I've had to enable backtracking (though I do not yet understand why). There's a big problem in that almost no AST EMF models get inferred, but I'm taking baby steps.

Even after hacking the grammar and getting through the flows, my first immediate problem that I can't figure out how to solve is parsing numbers. Hardware nerds typically use binary, decimal, and hexadecimal number formats (octal is right out). There are a number of hardware languages that use these standard formats, so figuring it out potentially has some lasting utility for me.

So numbers appear like: 1'b0, or 2'd32 or 32'h8000_1ABF or 198
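For reference, the literal shapes listed above can be pulled apart with a small regular-expression sketch in Java. This is only an illustration, not part of the ICL grammar or of Xtext: the class name IclNumber and its fields are made up here, and a '$' SCALAR_ID size (which the grammar allows) is deliberately not handled.

```java
import java.math.BigInteger;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits literals such as 32'h8000_1ABF into size, radix, digits. */
final class IclNumber {
    // Optional size, tick, base letter, optional blanks/tabs, then digits and underscores.
    private static final Pattern BASED = Pattern.compile(
        "(?:([0-9][0-9_]*)[ \\t]*)?'([bBdDhH])[ \\t]*([0-9A-Fa-fXx][0-9A-Fa-fXx_]*)");
    // A bare decimal number such as 198.
    private static final Pattern PLAIN = Pattern.compile("[0-9][0-9_]*");

    final Integer size;   // null when the literal carries no size prefix
    final int radix;      // 2, 10, or 16
    final String digits;  // lower-cased, underscores removed; may contain 'x'

    private IclNumber(Integer size, int radix, String digits) {
        this.size = size;
        this.radix = radix;
        this.digits = digits;
    }

    static IclNumber parse(String text) {
        if (PLAIN.matcher(text).matches()) {
            return new IclNumber(null, 10, strip(text));
        }
        Matcher m = BASED.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a number literal: " + text);
        }
        char base = Character.toLowerCase(m.group(2).charAt(0));
        int radix = base == 'b' ? 2 : base == 'd' ? 10 : 16;
        String digits = strip(m.group(3));
        // Reject digits that do not belong to the declared base.
        String allowed = radix == 2 ? "[01x]+" : radix == 10 ? "[0-9]+" : "[0-9a-fx]+";
        if (!digits.matches(allowed)) {
            throw new IllegalArgumentException("bad digits for base " + radix + ": " + text);
        }
        Integer size = m.group(1) == null ? null : Integer.valueOf(strip(m.group(1)));
        return new IclNumber(size, radix, digits);
    }

    /** Numeric value, or empty when the literal contains unknown (x) digits. */
    Optional<BigInteger> value() {
        if (digits.indexOf('x') >= 0) {
            return Optional.empty();
        }
        return Optional.of(new BigInteger(digits, radix));
    }

    private static String strip(String s) {
        return s.replace("_", "").toLowerCase();
    }
}
```

Calling IclNumber.parse("32'h8000_1ABF") gives size 32, radix 16, and digits 80001abf, while IclNumber.parse("198") gives an unsized decimal.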

The numbering section of the grammar looks like this:

pos_int : '0' | '1' | POS_INT ;
POS_INT : DEC_DIGIT('_'|DEC_DIGIT)* ;
size : pos_int | '$' SCALAR_ID ;
UNKNOWN_DIGIT : 'X' | 'x';
DEC_DIGIT : '0'..'9' ;
BIN_DIGIT : '0'..'1' | UNKNOWN_DIGIT ;
HEX_DIGIT : '0'..'9' | 'A'..'F' | 'a'..'f' | UNKNOWN_DIGIT ;
DEC_BASE : '\'' ('d' | 'D') (' ' | '\t')*;
BIN_BASE : '\'' ('b' | 'B') (' ' | '\t')*;
HEX_BASE : '\'' ('h' | 'H') (' ' | '\t')*;
UNSIZED_DEC_NUM : DEC_BASE POS_INT ;
UNSIZED_BIN_NUM : BIN_BASE BIN_DIGIT('_'|BIN_DIGIT)* ;
UNSIZED_HEX_NUM : HEX_BASE HEX_DIGIT('_'|HEX_DIGIT)* ;
sized_dec_num : size UNSIZED_DEC_NUM ;
sized_bin_num : size UNSIZED_BIN_NUM ;
sized_hex_num : size UNSIZED_HEX_NUM ;
vector_id : SCALAR_ID '[' (index | range) ']' ;
index : integer_expr ;
range : index ':' index ;

I have converted this to something that parses in the Xtext editor (which involves converting many of them to terminals):

POS_INT : DEC_DIGIT('_'|DEC_DIGIT)* ;
pos_int_lower : POS_INT ;
size : pos_int_lower | '$' SCALAR_ID ;
UNSIZED_DEC_NUM : DEC_BASE POS_INT ;
UNSIZED_BIN_NUM : BIN_BASE BIN_DIGIT('_'|BIN_DIGIT)* ;
UNSIZED_HEX_NUM : HEX_BASE HEX_DIGIT('_'|HEX_DIGIT)* ;
sized_dec_num : size UNSIZED_DEC_NUM ;
sized_bin_num : size UNSIZED_BIN_NUM ;
sized_hex_num : size UNSIZED_HEX_NUM ;
vector_id : SCALAR_ID '[' (range | index)']' ;
//index : integer_expr ;
// making it simple for now until I understand expressions and how the refactoring works
index : pos_int_lower;
range : index ':' index ;
terminal X_DIGIT : 'X' | 'x';
terminal BIN_DIGIT : '0'..'1' | X_DIGIT;
terminal HEX_DIGIT : DEC_DIGIT | 'A'..'F' | 'a'..'f' | X_DIGIT ;
terminal DEC_BASE : '\'' ('d' | 'D') (' ' | '\t')*;
terminal BIN_BASE : '\'' ('b' | 'B') (' ' | '\t')*;
terminal HEX_BASE : '\'' ('h' | 'H') (' ' | '\t')*;
terminal SCALAR_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;

I'm sure you already know where this is going and I'm hoping that an Xtext expert would see this problem as easy. I don't yet.

But a line like this (which involves other parts of the grammar not listed):

Alias sInstAddr = SBUS_cntl[3:23];

The right-hand side is a vector_id.

I'm just having a horrible time parsing the numbers. 0 and 1 appear to be almost keyworded (due to other parts of the grammar). Anything with more than one digit in it can't be parsed.

So SBUS_cntl[3:2] parses, but SBUS_cntl[13:2] doesn't (two digits).

Even SBUS_cntl[1:0] has the 1 and 0 highlighted as keywords, but the parser can't parse them.
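One possible reading of these symptoms, assuming the digit rules end up as single-character terminals and '0' and '1' are keywords elsewhere in the grammar: the lexer picks tokens without knowing what the parser expects, so 13 arrives as a keyword '1' followed by a digit token rather than as one number. The Java sketch below models that with a toy "longest match, earlier rule wins ties" tokenizer. It is a simplification, not the generated ANTLR lexer, and the rule names KW_0, KW_1, and COLON are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy context-free tokenizer: longest match wins, earlier rule wins ties. */
final class LexerSketch {

    static List<String> tokenize(Map<String, String> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                Matcher m = Pattern.compile(rule.getValue()).matcher(input);
                m.region(pos, input.length());
                // Strictly longer matches replace the current best, so ties keep the earlier rule.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    /** Keywords plus a single-character digit terminal (the assumed problem setup). */
    static Map<String, String> fragmentedRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("KW_0", "0");
        rules.put("KW_1", "1");
        rules.put("DEC_DIGIT", "[0-9]");
        rules.put("COLON", ":");
        return rules;
    }

    /** Keywords plus one terminal that covers a whole number. */
    static Map<String, String> singleTerminalRules() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("KW_0", "0");
        rules.put("KW_1", "1");
        rules.put("POS_INT", "[0-9][0-9_]*");
        rules.put("COLON", ":");
        return rules;
    }
}
```

With fragmentedRules(), the input 13:2 comes out as KW_1(1), DEC_DIGIT(3), COLON(:), DEC_DIGIT(2), so a rule expecting a digit token first would fail. With singleTerminalRules() it comes out as POS_INT(13), COLON(:), POS_INT(2), while 1:0 still lexes as two keywords, which may be why the original rule reads pos_int : '0' | '1' | POS_INT.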

I've started reading about syntactic predicates and I've opened the grammar in ANTLRWorks, but I'm getting nowhere pretty fast.

Any help would be appreciated. This is just something we take for granted in everyday use in hardware languages so I find it interesting that it's this hard to do.
Re: Troubles With Parsing Numbers [message #1753193 is a reply to message #1753191] Thu, 02 February 2017 15:37
Christian Dietrich
Messages: 10574
Registered: July 2009
Senior Member
Did you consider using an external lexer, e.g. one based on JFlex?
Re: Troubles With Parsing Numbers [message #1753194 is a reply to message #1753193] Thu, 02 February 2017 15:40
Brandon Lewis
Thanks for the quick reply! Unfortunately no, I've never used an external lexer before, so I don't know where to start.
Re: Troubles With Parsing Numbers [message #1753195 is a reply to message #1753194] Thu, 02 February 2017 15:51
Ed Willink
Messages: 5213
Registered: July 2009
Senior Member
Hi

An external lexer is a more powerful solution, but involves some research. For OCL, I had a relatively minor problem with decimal ".", navigation ".", and range ".." that I was able to solve by adjusting the Xtext 'source' tokens.

See GIT\org.eclipse.ocl\plugins\org.eclipse.ocl.xtext.base\src\org\eclipse\ocl\xtext\base\services\RetokenizingTokenSource.java

Regards

Ed Willink
Re: Troubles With Parsing Numbers [message #1753470 is a reply to message #1753195] Mon, 06 February 2017 20:52
Brandon Lewis
Thanks for the example Ed. Dumb question: How do you get the lexer to use your TokenSource class?

RetokenizingTokenSource implements TokenSource

I searched the directory with your xtext grammar file expecting to find some lines in the MWE2 file giving a clue, but I didn't see one.
Re: Troubles With Parsing Numbers [message #1753475 is a reply to message #1753470] Mon, 06 February 2017 21:21
Ed Willink
Hi

In for instance: GIT\org.eclipse.ocl\plugins\org.eclipse.ocl.xtext.oclinecore\src\org\eclipse\ocl\xtext\oclinecore\OCLinEcoreRuntimeModule.java

You will find:

@Override
public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
    return RetokenizingOCLinEcoreParser.class;
}

public static class RetokenizingOCLinEcoreParser extends OCLinEcoreParser
{
    @Override
    protected XtextTokenStream createTokenStream(TokenSource tokenSource) {
        return super.createTokenStream(new RetokenizingTokenSource(tokenSource, getTokenDefProvider().getTokenDefMap()));
    }
}

Regards

Ed Willink
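
The idea behind a retokenizing token source can be sketched without any Xtext or ANTLR dependency: wrap the upstream source and rewrite tokens before the parser sees them. In the Java sketch below, Tok and Source are stand-ins for the real ANTLR token types, and the merging rule (glue adjacent digit-like tokens into one POS_INT) is an assumption chosen to match the number problem in this thread, not what RetokenizingTokenSource does for OCL.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class RetokenizeSketch {

    /** Stand-in for an ANTLR token: type name, text, and start offset. */
    static final class Tok {
        final String type;
        final String text;
        final int start;

        Tok(String type, String text, int start) {
            this.type = type;
            this.text = text;
            this.start = start;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    /** Stand-in for a token source; returns null at end of input. */
    interface Source {
        Tok next();
    }

    /** Wraps another source and merges runs of adjacent digit-like tokens. */
    static final class MergingSource implements Source {
        private final Source upstream;
        private Tok pending; // one token of lookahead

        MergingSource(Source upstream) {
            this.upstream = upstream;
            this.pending = upstream.next();
        }

        @Override
        public Tok next() {
            Tok current = pending;
            if (current == null) {
                return null;
            }
            pending = upstream.next();
            if (!isDigitLike(current)) {
                return current;
            }
            StringBuilder text = new StringBuilder(current.text);
            // Only merge tokens that touch; whitespace between digits ends the run.
            while (pending != null && isDigitLike(pending)
                    && pending.start == current.start + text.length()) {
                text.append(pending.text);
                pending = upstream.next();
            }
            return new Tok("POS_INT", text.toString(), current.start);
        }

        // Hypothetical token type names; a real grammar would use its own token ids.
        private static boolean isDigitLike(Tok t) {
            return t.type.equals("DEC_DIGIT") || t.type.equals("KW_0") || t.type.equals("KW_1");
        }
    }

    static Source fromList(List<Tok> tokens) {
        Iterator<Tok> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    static List<String> drain(Source source) {
        List<String> out = new ArrayList<>();
        for (Tok t = source.next(); t != null; t = source.next()) {
            out.add(t.toString());
        }
        return out;
    }
}
```

Note that this sketch also turns a lone '0' or '1' keyword into POS_INT, which would be wrong wherever the grammar really wants the keyword; a real retokenizer would need to be more selective.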
Re: Troubles With Parsing Numbers [message #1753594 is a reply to message #1753475] Wed, 08 February 2017 00:41
Brandon Lewis
Thanks Ed! I think I'm at least calling my re-tokenizer now, although I haven't yet done anything interesting with it. Lots to learn.