Eclipse Community Forums
automatic tweaking lexer for semantic predicates [message #1073856] Thu, 25 July 2013 12:24
Jens Kuenzer
Hi,

I ran into the problem that a VHDL lexer needs semantic predicates to tell a tick from a character literal.
(For details see: http://www.eda.org/isac/IRs-VHDL-93/IR1045.txt)

But I didn't want to switch to a custom lexer. So I added some code to the generator fragments that tweaks the generated ANTLR grammar with code taken from comments in the Xtext grammar:
TweakLexer.tweakLexer(absoluteLexerFileName,grammar,helper);


I added this call to both fragments:

org.eclipse.xtext.generator.parser.antlr.ex.rt.AntlrGeneratorFragment
and
org.eclipse.xtext.generator.parser.antlr.ex.ca.ContentAssistParserGeneratorFragment

(I wish I could add that by overriding rather than with a full copy of these classes.) The TweakLexer file is attached.
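
As a rough illustration of the idea (this is not the attached TweakLexer, whose code is not shown here), the following sketch pulls a predicate out of a `/*{ ... }?=>*/` comment in the Xtext grammar text and places it right after the colon of the matching rule in the generated ANTLR grammar. The class `PredicateInjector`, its method names, and the assumption that the generated rule starts a line as `RULE_<name> :` are hypothetical choices made for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Xtext or ANTLR.
final class PredicateInjector {

    // Finds "terminal <name> : /*{ <predicate> }?=>*/" in the Xtext grammar
    // text and returns the predicate body without surrounding whitespace.
    static Optional<String> extractPredicate(String xtextGrammar, String terminalName) {
        Pattern p = Pattern.compile(
                "terminal\\s+" + Pattern.quote(terminalName)
                        + "\\s*:\\s*/\\*\\{(.*?)\\}\\?=>\\*/",
                Pattern.DOTALL);
        Matcher m = p.matcher(xtextGrammar);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // Inserts "{<predicate>}?=>" after the colon of the named lexer rule.
    // Returns the grammar unchanged if the rule is not found.
    static String injectPredicate(String antlrGrammar, String ruleName, String predicate) {
        Pattern head = Pattern.compile("(?m)^\\s*" + Pattern.quote(ruleName) + "\\s*:");
        Matcher m = head.matcher(antlrGrammar);
        if (!m.find()) {
            return antlrGrammar;
        }
        int afterColon = m.end();
        return antlrGrammar.substring(0, afterColon)
                + " {" + predicate + "}?=>"
                + antlrGrammar.substring(afterColon);
    }
}
```

A real implementation would also have to handle the `@lexer::members` block and map keyword references to the generated token names, which this sketch leaves out.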

So my Xtext grammar contains:
/*@lexer::members {

    // Type of the most recent token that is not whitespace or a comment.
    private int lastSignificantTokenType;

    public void emit(Token token) {
        if(  token.getChannel() == Token.DEFAULT_CHANNEL
          && token.getType() != RULE_SL_COMMENT
          && token.getType() != RULE_ML_COMMENT
          && token.getType() != RULE_WS
          ) {
            lastSignificantTokenType = token.getType();
        }
        super.emit(token);
    }

}*/

terminal CHARACTER_LITERAL : /*{
         input.LA(3)=='\''
      && lastSignificantTokenType!=KEYWORD_")"
      && lastSignificantTokenType!=KEYWORD_"]"
      && lastSignificantTokenType!=KEYWORD_"all"
      && lastSignificantTokenType!=RULE_BASIC_IDENTIFIER
      && lastSignificantTokenType!=RULE_EXTENDED_IDENTIFIER
      && lastSignificantTokenType!=RULE_INTERNAL_IDENTIFIER
      && lastSignificantTokenType!=RULE_STRING
      && lastSignificantTokenType!=RULE_CHARACTER_LITERAL
      && lastSignificantTokenType!=RULE_BIT_STRING_LITERAL
       }?=>*/ "'" (GRAPHIC_CHARACTER | '"' ) "'";

terminal TICK : /*{
         lastSignificantTokenType==KEYWORD_")"
      || lastSignificantTokenType==KEYWORD_"]"
      || lastSignificantTokenType==KEYWORD_"all"
      || lastSignificantTokenType==RULE_BASIC_IDENTIFIER
      || lastSignificantTokenType==RULE_EXTENDED_IDENTIFIER
      || lastSignificantTokenType==RULE_INTERNAL_IDENTIFIER
      || lastSignificantTokenType==RULE_STRING
      || lastSignificantTokenType==RULE_CHARACTER_LITERAL
      || lastSignificantTokenType==RULE_BIT_STRING_LITERAL
      }?=>*/ "'";
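
The decision that the two predicates above encode can be written out as plain Java, independent of ANTLR. This is only a sketch for illustration: the enum `TokenKind` and the class `TickDisambiguator` are hypothetical stand-ins for the generated integer token constants, and `pos + 2` plays the role of `input.LA(3)` when `pos` is the index of the apostrophe.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical token kinds; the generated lexer uses int constants instead.
enum TokenKind {
    RIGHT_PAREN, RIGHT_BRACKET, ALL, BASIC_IDENTIFIER, EXTENDED_IDENTIFIER,
    INTERNAL_IDENTIFIER, STRING, CHARACTER_LITERAL, BIT_STRING_LITERAL, OTHER
}

final class TickDisambiguator {

    // Token kinds after which an apostrophe is lexed as TICK.
    private static final Set<TokenKind> TICK_PREDECESSORS = EnumSet.of(
            TokenKind.RIGHT_PAREN, TokenKind.RIGHT_BRACKET, TokenKind.ALL,
            TokenKind.BASIC_IDENTIFIER, TokenKind.EXTENDED_IDENTIFIER,
            TokenKind.INTERNAL_IDENTIFIER, TokenKind.STRING,
            TokenKind.CHARACTER_LITERAL, TokenKind.BIT_STRING_LITERAL);

    // Mirrors the TICK predicate: depends only on the previous significant token.
    static boolean isTick(TokenKind lastSignificant) {
        return TICK_PREDECESSORS.contains(lastSignificant);
    }

    // Mirrors the CHARACTER_LITERAL predicate: a closing apostrophe two
    // characters ahead, and a previous token that does not call for a tick.
    static boolean startsCharacterLiteral(CharSequence input, int pos,
                                          TokenKind lastSignificant) {
        boolean closingQuoteAhead =
                pos + 2 < input.length() && input.charAt(pos + 2) == '\'';
        return closingQuoteAhead && !isTick(lastSignificant);
    }
}
```

For the input `std_logic'('1')`, the apostrophe directly after the identifier is a tick even though another apostrophe follows two characters later, while the apostrophe after the opening parenthesis starts the character literal `'1'`.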


This requires that class splitting is not used for the lexer, but it works pretty well.
I know that parsing comments is a pretty ugly approach. Maybe someone can suggest a better way of doing this.

I hope this helps someone with a similar problem.