Troubles With Parsing Numbers [message #1753191] |
Thu, 02 February 2017 15:35 |
Brandon Lewis Messages: 268 Registered: May 2012 |
Senior Member |
I've taken a shot at making a DSL for a language specified in IEEE 1687. The language is called ICL and I was interested to see that the specification provides an ANTLR4 grammar.
The grammar is rather large, and I've been able to hack it enough to get it through the Xtext workflow. To do this, I've had to enable backtracking (though I do not yet understand why). There's also a big problem in that almost no AST EMF models get inferred, but I'm taking baby steps.
Even after hacking the grammar and getting through the workflow, the first problem I can't figure out how to solve is parsing numbers. Hardware nerds typically use binary, decimal, and hexadecimal number formats (octal is right out). There are a number of hardware languages that use these standard formats, so figuring it out potentially has some lasting utility for me.
So numbers appear like: 1'b0, or 2'd32 or 32'h8000_1ABF or 198
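To pin down what I mean by these formats, here is a rough Python sketch that classifies the four examples above. It is only an illustration of the literal shapes, not the ICL grammar itself: the names NUMBER_RE, DIGITS and classify are made up for this post, and the '$' SCALAR_ID form of size is left out on purpose.

```python
import re

# Hypothetical helper, not part of ICL or Xtext: recognises one number literal.
NUMBER_RE = re.compile(
    r"""
    ^(?:
        (?P<size>[0-9][0-9_]*)?                    # optional size prefix
        '(?P<base>[bBdDhH])[ \t]*                  # base marker, blanks may follow
        (?P<digits>[0-9A-Fa-fXx][0-9A-Fa-fXx_]*)   # digits, '_' allowed after the first
      |
        (?P<plain>[0-9][0-9_]*)                    # plain decimal such as 198
    )$
    """,
    re.VERBOSE,
)

# Characters allowed per base; X/x is the "unknown" digit for bin and hex only.
DIGITS = {
    "b": "01xX",
    "d": "0123456789",
    "h": "0123456789abcdefABCDEFxX",
}
NAMES = {"b": "bin", "d": "dec", "h": "hex"}


def classify(text):
    """Return (kind, size, digits), or None if text is not a number literal."""
    m = NUMBER_RE.match(text)
    if m is None:
        return None
    if m.group("plain") is not None:
        return ("dec", None, m.group("plain"))
    base = m.group("base").lower()
    digits = m.group("digits")
    # Reject digits that do not belong to the declared base, e.g. 1'b2.
    if any(ch not in DIGITS[base] + "_" for ch in digits):
        return None
    return (NAMES[base], m.group("size"), digits)


if __name__ == "__main__":
    for literal in ["1'b0", "2'd32", "32'h8000_1ABF", "198", "1'b2"]:
        print(literal, "->", classify(literal))
```

The point of the sketch is that the digits are only meaningful once the base is known, which is what makes context-free lexing of these literals awkward.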
The numbering section of the grammar looks like this:
pos_int : '0' | '1' | POS_INT ;
POS_INT : DEC_DIGIT('_'|DEC_DIGIT)* ;
size : pos_int | '$' SCALAR_ID ;
UNKNOWN_DIGIT : 'X' | 'x';
DEC_DIGIT : '0'..'9' ;
BIN_DIGIT : '0'..'1' | UNKNOWN_DIGIT ;
HEX_DIGIT : '0'..'9' | 'A'..'F' | 'a'..'f' | UNKNOWN_DIGIT ;
DEC_BASE : '\'' ('d' | 'D') (' ' | '\t')*;
BIN_BASE : '\'' ('b' | 'B') (' ' | '\t')*;
HEX_BASE : '\'' ('h' | 'H') (' ' | '\t')*;
UNSIZED_DEC_NUM : DEC_BASE POS_INT ;
UNSIZED_BIN_NUM : BIN_BASE BIN_DIGIT('_'|BIN_DIGIT)* ;
UNSIZED_HEX_NUM : HEX_BASE HEX_DIGIT('_'|HEX_DIGIT)* ;
sized_dec_num : size UNSIZED_DEC_NUM ;
sized_bin_num : size UNSIZED_BIN_NUM ;
sized_hex_num : size UNSIZED_HEX_NUM ;
vector_id : SCALAR_ID '[' (index | range) ']' ;
index : integer_expr ;
range : index ':' index ;
I have converted this to something that parses in the Xtext editor (which involves converting many of them to terminals):
POS_INT : DEC_DIGIT('_'|DEC_DIGIT)* ;
pos_int_lower : POS_INT ;
size : pos_int_lower | '$' SCALAR_ID ;
UNSIZED_DEC_NUM : DEC_BASE POS_INT ;
UNSIZED_BIN_NUM : BIN_BASE BIN_DIGIT('_'|BIN_DIGIT)* ;
UNSIZED_HEX_NUM : HEX_BASE HEX_DIGIT('_'|HEX_DIGIT)* ;
sized_dec_num : size UNSIZED_DEC_NUM ;
sized_bin_num : size UNSIZED_BIN_NUM ;
sized_hex_num : size UNSIZED_HEX_NUM ;
vector_id : SCALAR_ID '[' (range | index)']' ;
//index : integer_expr ;
// making it simple for now until I understand expressions and how the refactoring works
index : pos_int_lower;
range : index ':' index ;
terminal X_DIGIT : 'X' | 'x';
terminal BIN_DIGIT : '0'..'1' | X_DIGIT;
terminal HEX_DIGIT : DEC_DIGIT | 'A'..'F' | 'a'..'f' | X_DIGIT ;
terminal DEC_BASE : '\'' ('d' | 'D') (' ' | '\t')*;
terminal BIN_BASE : '\'' ('b' | 'B') (' ' | '\t')*;
terminal HEX_BASE : '\'' ('h' | 'H') (' ' | '\t')*;
terminal SCALAR_ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
I'm sure you already know where this is going and I'm hoping that an Xtext expert would see this problem as easy. I don't yet.
Take a line like this (which involves other parts of the grammar not listed):
Alias sInstAddr = SBUS_cntl[3:23];
The right-hand side is a vector_id.
I'm just having a horrible time parsing the numbers. 0 and 1 appear to be treated almost as keywords (due to other parts of the grammar). Anything with more than one digit in it can't be parsed.
So SBUS_cntl[3:2] parses, but SBUS_cntl[13:2] doesn't (two digits).
Even SBUS_cntl[1:0] has the 1 and 0 highlighted as keywords, but the parser can't parse them.
I've started reading about syntactic predicates and I've opened the grammar in ANTLRWorks, but I'm getting nowhere pretty fast.
Any help would be appreciated. This is just something we take for granted in everyday use in hardware languages so I find it interesting that it's this hard to do.