Hi,
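I ran into the problem that a VHDL lexer needs some semantic predicates: whether an apostrophe starts a character literal or is an attribute tick depends on the preceding token.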
(for details see: http://www.eda.org/isac/IRs-VHDL-93/IR1045.txt)
But I don't want to switch to a custom lexer. So I added some code that tweaks the generated ANTLR grammar, using code taken from comments in the Xtext grammar:
TweakLexer.tweakLexer(absoluteLexerFileName,grammar,helper);
I added this call to both fragments:
org.eclipse.xtext.generator.parser.antlr.ex.rt.AntlrGeneratorFragment
and
org.eclipse.xtext.generator.parser.antlr.ex.ca.ContentAssistParserGeneratorFragment
(I wish I could add this by subclassing and overriding instead of copying these classes in full.) The TweakLexer file is attached.
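The attached TweakLexer is not reproduced here. To give a rough idea of what such a step could look like, here is a hypothetical Java sketch; CommentSnippetExtractor and its methods are my own names, not part of Xtext and not taken from the attached file. It collects the /*{...}?=>*/ predicates that directly follow "terminal NAME :" in the Xtext grammar text and inserts each one after the matching "RULE_NAME :" rule in the generated ANTLR lexer grammar text. It assumes the generated lexer rules are named RULE_<terminal name>; it only extracts the @lexer::members block without placing it, and it does not translate names such as KEYWORD_")" into the generated keyword constants.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not the attached TweakLexer: pulls ANTLR snippets out of
 * comments in the Xtext grammar text and patches them into the generated
 * ANTLR lexer grammar text.
 */
class CommentSnippetExtractor {

    // "/*@lexer::members { ... }*/" -> "@lexer::members { ... }"
    private static final Pattern MEMBERS =
            Pattern.compile("/\\*(@lexer::members\\s*\\{.*?\\})\\*/", Pattern.DOTALL);

    // "terminal NAME : /*{ ... }?=>*/" -> NAME and "{ ... }?=>"
    private static final Pattern GATED_PREDICATE =
            Pattern.compile("terminal\\s+(\\w+)\\s*:\\s*/\\*(\\{.*?\\}\\?=>)\\*/", Pattern.DOTALL);

    /** Returns the members block without the comment markers, or null if there is none. */
    static String extractMembers(String xtextGrammar) {
        Matcher m = MEMBERS.matcher(xtextGrammar);
        return m.find() ? m.group(1) : null;
    }

    /** Maps each terminal name to its gated predicate, in grammar order. */
    static Map<String, String> extractPredicates(String xtextGrammar) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = GATED_PREDICATE.matcher(xtextGrammar);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    /**
     * Puts each predicate directly after "RULE_NAME :" in the lexer grammar.
     * Assumes the generated lexer rules are called RULE_<terminal name>.
     */
    static String injectPredicates(String lexerGrammar, Map<String, String> predicates) {
        String result = lexerGrammar;
        for (Map.Entry<String, String> e : predicates.entrySet()) {
            result = result.replaceFirst(
                    "(\\bRULE_" + Pattern.quote(e.getKey()) + "\\s*:)\\s*",
                    "$1 " + Matcher.quoteReplacement(e.getValue()) + " ");
        }
        return result;
    }

    public static void main(String[] args) {
        String xtext = "terminal TICK : /*{ lastSignificantTokenType==RULE_BASIC_IDENTIFIER }?=>*/ \"'\";";
        Map<String, String> predicates = extractPredicates(xtext);
        System.out.println(predicates);
        System.out.println(injectPredicates("RULE_TICK : '\\'';", predicates));
    }
}
```

Plain regular expressions are enough here because the snippets are delimited by fixed markers; a real implementation would more likely work on the parsed grammar (the grammar and helper arguments passed to tweakLexer) than on raw text.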
My Xtext grammar then contains:
/*@lexer::members {
    private int lastSignificantTokenType;

    public void emit(Token token) {
        if( token.getChannel() == Token.DEFAULT_CHANNEL
                && token.getType() != RULE_SL_COMMENT
                && token.getType() != RULE_ML_COMMENT
                && token.getType() != RULE_WS
        ) {
            lastSignificantTokenType = token.getType();
        }
        super.emit(token);
    }
}*/
terminal CHARACTER_LITERAL : /*{
        input.LA(3)=='\''
        && lastSignificantTokenType!=KEYWORD_")"
        && lastSignificantTokenType!=KEYWORD_"]"
        && lastSignificantTokenType!=KEYWORD_"all"
        && lastSignificantTokenType!=RULE_BASIC_IDENTIFIER
        && lastSignificantTokenType!=RULE_EXTENDED_IDENTIFIER
        && lastSignificantTokenType!=RULE_INTERNAL_IDENTIFIER
        && lastSignificantTokenType!=RULE_STRING
        && lastSignificantTokenType!=RULE_CHARACTER_LITERAL
        && lastSignificantTokenType!=RULE_BIT_STRING_LITERAL
    }?=>*/ "'" (GRAPHIC_CHARACTER | '"' ) "'";
terminal TICK : /*{
        lastSignificantTokenType==KEYWORD_")"
        || lastSignificantTokenType==KEYWORD_"]"
        || lastSignificantTokenType==KEYWORD_"all"
        || lastSignificantTokenType==RULE_BASIC_IDENTIFIER
        || lastSignificantTokenType==RULE_EXTENDED_IDENTIFIER
        || lastSignificantTokenType==RULE_INTERNAL_IDENTIFIER
        || lastSignificantTokenType==RULE_STRING
        || lastSignificantTokenType==RULE_CHARACTER_LITERAL
        || lastSignificantTokenType==RULE_BIT_STRING_LITERAL
    }?=>*/ "'";
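To make the effect of the two predicates easier to see outside of ANTLR, here is a simplified stand-alone Java sketch. TickDemo is a hypothetical name, and it is not a real VHDL lexer: it only knows identifiers, the keyword all, parentheses, brackets and character literals, with no string, bit-string or comment handling. Like the emit override, it remembers the last significant token and lets whitespace pass without updating it; then it applies the same decision: after ), ], all, an identifier or a character literal the apostrophe is a TICK, otherwise it starts a CHARACTER_LITERAL if the closing apostrophe is two characters ahead (the input.LA(3) check).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified, hypothetical illustration of the two gated predicates:
 * whether an apostrophe is a TICK or starts a CHARACTER_LITERAL depends
 * only on the last significant token. Not a real VHDL lexer.
 */
class TickDemo {

    enum Kind { IDENTIFIER, KEYWORD_ALL, LPAREN, RPAREN, LBRACKET, RBRACKET, CHARACTER_LITERAL, TICK, OTHER }

    /** Returns the tokens as "KIND:text" strings. */
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Kind last = Kind.OTHER; // plays the role of lastSignificantTokenType
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { // whitespace never updates 'last'
                i++;
                continue;
            }
            Kind kind;
            int end;
            if (c == '\'') {
                // Same token set as the TICK predicate, minus strings and bit strings.
                boolean tick = last == Kind.IDENTIFIER || last == Kind.KEYWORD_ALL
                        || last == Kind.RPAREN || last == Kind.RBRACKET
                        || last == Kind.CHARACTER_LITERAL;
                if (tick) {
                    kind = Kind.TICK;
                    end = i + 1;
                } else if (i + 2 < src.length() && src.charAt(i + 2) == '\'') {
                    // Corresponds to input.LA(3)=='\'' in the CHARACTER_LITERAL predicate.
                    kind = Kind.CHARACTER_LITERAL;
                    end = i + 3;
                } else {
                    throw new IllegalArgumentException("unexpected apostrophe at " + i);
                }
            } else if (Character.isLetter(c)) {
                end = i;
                while (end < src.length()
                        && (Character.isLetterOrDigit(src.charAt(end)) || src.charAt(end) == '_')) {
                    end++;
                }
                // VHDL is case-insensitive, so "ALL" is the keyword too.
                kind = src.substring(i, end).equalsIgnoreCase("all") ? Kind.KEYWORD_ALL : Kind.IDENTIFIER;
            } else {
                kind = punctuation(c);
                end = i + 1;
            }
            out.add(kind + ":" + src.substring(i, end));
            last = kind;
            i = end;
        }
        return out;
    }

    private static Kind punctuation(char c) {
        switch (c) {
            case '(': return Kind.LPAREN;
            case ')': return Kind.RPAREN;
            case '[': return Kind.LBRACKET;
            case ']': return Kind.RBRACKET;
            default:  return Kind.OTHER;
        }
    }

    public static void main(String[] args) {
        // prints [IDENTIFIER:character, TICK:', LPAREN:(, CHARACTER_LITERAL:'a', RPAREN:)]
        System.out.println(tokenize("character'('a')"));
        // prints [IDENTIFIER:s, TICK:', IDENTIFIER:length]
        System.out.println(tokenize("s'length"));
    }
}
```

The qualified expression character'('a') is the interesting case: without the predicates, a lexer would typically take '(' as a character literal right after the identifier, while here the apostrophe after character becomes a TICK and 'a' is still recognized as a literal.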
This requires class splitting to be disabled for the lexer, but otherwise it works pretty well.
I know that parsing comments is a pretty ugly approach. Maybe someone can suggest a better way of doing this.
I hope this helps someone with a similar problem.