Re: how to deal with a non context free lexing issue [message #644187 is a reply to message #644103]
Thu, 09 December 2010 19:26
Henrik Lindberg Messages: 2509 Registered: July 2009
Senior Member
Thanks for the info. I don't know if my rule for turning it on/off is as
simple as "following an ID", but it is certainly worth looking into.
I have not looked deeply into how to manually override something like this
in the lexer (as I took a different approach). Is your code available
somewhere?
Regards
- henrik
On 12/9/10 3:10 PM, Jonathan wrote:
> Hello,
>
> I had the same problem as you. To solve it, I just turned off regexp
> recognition wherever a regexp cannot occur (after an ID, for instance). I
> did that manually in my lexer (I subclass the InternalLexer in
> src-gen). This is the solution proposed in the ES3.g grammar on the ANTLR
> website.
> As a result, my regular expressions are scanned by the lexer, with this
> rule:
>
> terminal REGEXP:
> '/'
> (
> !('/'|'\\'|'\n'|'\r'|'*') | ('\\' !('\n'|'\r'))
> )
> (
> !('/'|'\\'|'\n'|'\r') | ('\\' !('\n'|'\r'))
> )*
> '/'
> ('a'..'z' | 'A'..'Z' | '0'..'9')*
> ;
>
> Cheers,
>
> Jonathan
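To make the quoted REGEXP terminal rule concrete, here is a minimal hand-written Java matcher that accepts the same inputs. It is only an illustration (the real lexer is generated by Xtext/ANTLR), and the class and method names are hypothetical. It also shows why context is needed: on its own, the rule happily matches `/ b /` inside `a / b / c`.

```java
// Illustrative, hand-written equivalent of the REGEXP terminal rule quoted above.
// The real lexer is generated by Xtext/ANTLR; this only shows which inputs the rule accepts.
final class RegexpRuleMatcher {

    /** Returns the length of the REGEXP token starting at {@code start}, or -1 if none matches. */
    static int match(String s, int start) {
        if (start >= s.length() || s.charAt(start) != '/') return -1;
        int i = start + 1;
        // First body char: '*' is excluded so that "/*" is left to the comment rule.
        int n = bodyChar(s, i, true);
        if (n < 0) return -1;
        i += n;
        // Remaining body chars: same alternatives, but '*' is allowed.
        while ((n = bodyChar(s, i, false)) > 0) i += n;
        // Closing slash.
        if (i >= s.length() || s.charAt(i) != '/') return -1;
        i++;
        // Flags: ('a'..'z' | 'A'..'Z' | '0'..'9')*
        while (i < s.length() && isFlagChar(s.charAt(i))) i++;
        return i - start;
    }

    // Matches one body element: an ordinary char, or '\' followed by a non-newline char.
    // Returns the number of chars consumed, or -1 if no alternative applies.
    private static int bodyChar(String s, int i, boolean first) {
        if (i >= s.length()) return -1;
        char c = s.charAt(i);
        if (c == '\\') {
            return (i + 1 < s.length() && !isNewline(s.charAt(i + 1))) ? 2 : -1;
        }
        if (c == '/' || isNewline(c) || (first && c == '*')) return -1;
        return 1;
    }

    private static boolean isNewline(char c) {
        return c == '\n' || c == '\r';
    }

    private static boolean isFlagChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

For example, `match("/ab+c/gi", 0)` returns 8 and `match("/*x*/", 0)` returns -1, while `match("a / b / c", 2)` returns 5, which is exactly the ambiguity the lexer has to resolve from context.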
Re: how to deal with a non context free lexing issue [message #647108 is a reply to message #644187]
Tue, 04 January 2011 16:12
Jonathan Messages: 6 Registered: December 2010
Junior Member
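The code is long because I had to copy the whole mTokens() method from the generated lexer: the rule-specific methods are marked 'final', so they cannot be overridden. In summary, I created a subclass of the internal lexer (myPackage.parser.antlr.internal.InternalMyDslLexer.java) and wrote:

import org.antlr.runtime.CharStream;
import org.antlr.runtime.Token;

import myPackage.parser.antlr.internal.InternalMyDslLexer;

public class MyDslLexer extends InternalMyDslLexer {

    public MyDslLexer() {
        super();
    }

    public MyDslLexer(CharStream input) {
        super(input);
    }

    // Last non-hidden token emitted; decides whether '/' may start a regex.
    private Token last;

    // Returns false when the previous significant token ends an operand,
    // in which case '/' must be the divide operator.
    private boolean areRegexEnabled() {
        if (last == null)
            return true;
        switch (last.getType()) {
        // identifier
        case RULE_ID:
        // literals
        case T45: // 'this'
        case T89: // 'true'
        case T90: // 'false'
        case T91: // 'null'
        case RULE_NUMBER:
        case RULE_HEX_NUMBER:
        case RULE_STRING:
        // member access ending
        case T21: // ']'
        // function call or nested expression ending
        case T15: // ')'
            return false;
        // otherwise OK
        default:
            return true;
        }
    }

    @Override
    public Token nextToken() {
        Token result = super.nextToken();
        // Whitespace and comments must not change the context.
        if (!isHiddenToken(result))
            last = result;
        return result;
    }

    public boolean isHiddenToken(Token t) {
        int type = t.getType();
        return type == RULE_WS || type == RULE_ML_COMMENT
                || type == RULE_SL_COMMENT;
    }

    // The overridden mTokens() (copied from the generated lexer) also goes here.
}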
The catch is that you have to know which number identifies which token (the mapping is in myPackage.parser.antlr.internal.InternalMyDsl.tokens), so this hack must be updated after every grammar modification.
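Because the token numbers have to be looked up again after each regeneration, a small reader for the .tokens file can help. This is a sketch that assumes the ANTLR 3 layout of one `NAME=type` or `'literal'=type` entry per line (check your own generated file); the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads an ANTLR .tokens file into a name/literal -> token type map.
// Assumes well-formed lines such as "RULE_ID=4", "T45=45" or "'this'=45".
final class TokensFile {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            // Split at the LAST '=' so that literals such as '='=14 parse correctly.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) continue; // skip lines without a key
            types.put(line.substring(0, eq), Integer.parseInt(line.substring(eq + 1).trim()));
        }
        return types;
    }

    static Map<String, Integer> read(Path tokensFile) throws IOException {
        return parse(Files.readAllLines(tokensFile));
    }
}
```

With such a map, `types.get("'this'")` gives the number behind the `T45 // 'this'` case, so the comments in the switch can be rechecked after each grammar change instead of being looked up by hand.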
Then override mTokens(). Wherever the ambiguous rule is called, use the methods above to decide which rule to call. I wrote:
if (areRegexEnabled())
mRULE_REGEX(); // rule for regular expressions
else
mT69(); // Rule for divide operator '/'
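The two pieces (tracking the last non-hidden token in nextToken() and dispatching '/' in mTokens()) can be tried out without ANTLR. The following is a simplified, self-contained sketch, not the generated lexer: the token kinds are hypothetical, keywords such as 'this' or 'true' are lexed as plain identifiers, comments are not handled, and the regex pattern is a java.util.regex transcription of the REGEXP terminal rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, ANTLR-free sketch: remember the last non-hidden token and, when a '/'
// is seen, lex a regex only if areRegexEnabled() allows it (hypothetical token kinds).
final class SlashDisambiguator {

    enum Kind { ID, NUMBER, STRING, PUNCT, DIVIDE, REGEX, WS }

    static final class Tok {
        final Kind kind;
        final String text;
        Tok(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Same language as the REGEXP terminal rule, written as a java.util.regex pattern.
    private static final Pattern REGEXP = Pattern.compile(
            "/(?:[^/\\\\\\r\\n*]|\\\\[^\\r\\n])(?:[^/\\\\\\r\\n]|\\\\[^\\r\\n])*/[a-zA-Z0-9]*");

    private Tok last; // last non-hidden token, like the 'last' field of the lexer subclass

    // Mirrors areRegexEnabled(): after an operand-ending token, '/' means division.
    private boolean areRegexEnabled() {
        if (last == null) return true;
        switch (last.kind) {
            case ID:
            case NUMBER:
            case STRING:
                return false;
            case PUNCT:
                return !(last.text.equals(")") || last.text.equals("]"));
            default:
                return true;
        }
    }

    List<Tok> tokenize(String s) {
        last = null;
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            Tok t;
            if (Character.isWhitespace(c)) {
                while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
                t = new Tok(Kind.WS, s.substring(start, i));
            } else if (Character.isLetter(c) || c == '_') {
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) i++;
                t = new Tok(Kind.ID, s.substring(start, i));
            } else if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                t = new Tok(Kind.NUMBER, s.substring(start, i));
            } else if (c == '"') {
                i++;
                while (i < s.length() && s.charAt(i) != '"') i++;
                i = Math.min(i + 1, s.length()); // include the closing quote if present
                t = new Tok(Kind.STRING, s.substring(start, i));
            } else if (c == '/') {
                // The ambiguous case, dispatched like the overridden mTokens().
                if (areRegexEnabled()) {
                    Matcher m = REGEXP.matcher(s).region(i, s.length());
                    if (!m.lookingAt()) throw new IllegalStateException("no regex at " + i);
                    i = m.end();
                    t = new Tok(Kind.REGEX, s.substring(start, i));
                } else {
                    i++;
                    t = new Tok(Kind.DIVIDE, "/");
                }
            } else {
                i++;
                t = new Tok(Kind.PUNCT, String.valueOf(c));
            }
            out.add(t);
            if (t.kind != Kind.WS) last = t; // hidden tokens do not change the context
        }
        return out;
    }
}
```

For `a / b / c` this yields two DIVIDE tokens, while `x = /ab+c/g` ends with the REGEX token `/ab+c/g`. Like the forum code, a '/' in a regex-enabled position that does not form a valid regex is reported as an error rather than silently treated as division.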
I know this is not the proper way to do it, but it works.
When I have more time, I will write an Xpand postprocessor to modify the generated lexer automatically, as suggested in another post. Then I would only have to change mRULE_REGEX() instead of mTokens().
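Since the obstacle is that the generated rule methods are 'final', such a postprocessor mainly has to drop that modifier from the rule you want to override. Below is a rough sketch of that step as plain Java string rewriting, not the Xpand postprocessor mentioned above; it assumes the ANTLR 3 signature shape `public final void mRULE_REGEX() throws ...`, and the class name and the way it would be hooked into the build are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-generation step: removes 'final' from one generated rule method
// so that a subclass of the lexer can override it.
final class LexerPostProcessor {

    static String makeOverridable(String source, String methodName) {
        // "\\s*\\(" right after the name keeps mRULE_REGEX from also matching mRULE_REGEXP.
        Pattern p = Pattern.compile(
                "public\\s+final\\s+void\\s+" + Pattern.quote(methodName) + "\\s*\\(");
        return p.matcher(source).replaceAll(Matcher.quoteReplacement("public void " + methodName + "("));
    }

    // Rewrites the generated lexer file in place; must be rerun after every regeneration.
    static void rewrite(Path lexerSource, String methodName) throws IOException {
        String src = new String(Files.readAllBytes(lexerSource), StandardCharsets.UTF_8);
        Files.write(lexerSource, makeOverridable(src, methodName).getBytes(StandardCharsets.UTF_8));
    }
}
```

After `makeOverridable(source, "mRULE_REGEX")`, the subclass could override mRULE_REGEX() and call mT69() when areRegexEnabled() is false, leaving the generated mTokens() untouched.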
Cheers,
[Updated on: Tue, 04 January 2011 16:14]