Matching braces in comments [message #729613] |
Mon, 26 September 2011 16:02 |
amey.par Messages: 17 Registered: July 2011 |
Junior Member |
I apologize for the ambiguous title; I couldn't think of a better one to describe my problem.
I'm using Xtext to develop support for a language similar to C++/Java. The language supports declaring "structs", for which I've defined the grammar as
StructDeclaration:
    'struct' '{' (declarations+=Declaration)* '}';
The language also supports comments structured as follows:
cpptext
{
    function Something()
    {
        if (true)
        {
        }
    }
}
Here, everything between the opening brace following "cpptext" up to the matching closing brace is to be ignored. However, this is complicated by the fact that there might be multiple nested opening and (matching) closing braces between the parent braces. So, a terminal rule like
CPP_TEXT: 'cpptext' -> '{' -> '}';
consumes everything from the first opening brace following "cpptext" till the first closing brace it encounters, and then throws parsing errors. I believe I need a nested rule like:
CPP_TEXT:
('cpptext'|'structcpptext') -> CPP_BLOCK;
terminal CPP_BLOCK:
'{' -> (CPP_BLOCK)* -> '}';
but the problem with this is that CPP_BLOCK then matches other (non-comment) braces in my model.
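The first failure mode described above (stopping at the first closing brace) can be reproduced outside Xtext. The sketch below uses a lazy Java regular expression as a rough analogue of the `'cpptext' -> '{' -> '}'` until-token rule; the class name `NonGreedyDemo` and the method `match` are hypothetical names chosen for illustration, and the regex is only an approximation of what the generated lexer does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonGreedyDemo {
    // Assumption: a lazy ".*?" behaves roughly like Xtext's "->" (until) operator,
    // i.e. it consumes as little as possible before the next literal.
    private static final Pattern UNTIL_FIRST_CLOSE =
            Pattern.compile("cpptext.*?\\{.*?\\}", Pattern.DOTALL);

    /** Returns the text matched by the lazy pattern, or null if nothing matches. */
    public static String match(String source) {
        Matcher m = UNTIL_FIRST_CLOSE.matcher(source);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String source = "cpptext { function F() { if (true) { } } } struct { }";
        // The match ends at the first '}' after the opening brace, so the two
        // outer closing braces of the cpptext block are left for the parser.
        System.out.println(match(source));
    }
}
```

Because the pattern has no notion of nesting depth, the remaining closing braces are handed to the rest of the grammar, which matches the parsing errors described in the post.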
What should I do here?
Thanks,
Amey
[Updated on: Mon, 26 September 2011 16:24]
Re: Matching braces in comments [message #729710 is a reply to message #729613] |
Mon, 26 September 2011 19:53 |
Henrik Lindberg Messages: 2509 Registered: July 2009 |
Senior Member |
That will be a bit tricky if the content of the "structured comment" can
be invalid code. Otherwise, just treat it as a statement.
If it can contain anything, e.g.
cpptext {
I am a comment with " and ' and [ ... and lots of other funny stuff
}
Then you need to solve this with an external lexer that counts opening
and closing braces.
Regards
- henrik
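The brace-counting idea suggested in this reply can be sketched independently of Xtext and ANTLR. The following is a minimal illustration, not the external lexer itself: the class name `BraceCounter` and the method `findMatchingBrace` are hypothetical, and the sketch assumes that every `{` and `}` inside the block counts toward nesting, including braces that appear inside quotes.

```java
public class BraceCounter {
    /**
     * Returns the index of the '}' that closes the '{' at openIndex,
     * or -1 if the text ends before the braces balance.
     */
    public static int findMatchingBrace(CharSequence text, int openIndex) {
        if (openIndex < 0 || openIndex >= text.length() || text.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("openIndex must point at '{'");
        }
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;              // entering a nested block
            } else if (c == '}') {
                depth--;              // leaving a block
                if (depth == 0) {
                    return i;         // this brace closes the outermost block
                }
            }
            // Every other character (quotes, brackets, ...) is skipped unchanged.
        }
        return -1;                    // unbalanced input
    }

    public static void main(String[] args) {
        String source = "cpptext { a \" ' [ { b } c } struct { }";
        int open = source.indexOf('{');
        int close = findMatchingBrace(source, open);
        System.out.println(source.substring(open, close + 1));
    }
}
```

A custom lexer would apply the same counter while consuming characters from its input stream rather than scanning a string by index.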
Re: Matching braces in comments [message #747904 is a reply to message #730833] |
Mon, 24 October 2011 10:56 |
amey.par Messages: 17 Registered: July 2011 |
Junior Member |
Daniel wrote on Thu, 29 September 2011 06:17:
> The easiest way would be to simply change the start and end braces of your cpp text like xtend does it with ''' ''' for RichStrings.
True, but sadly I'm not developing my own language, just making an Eclipse plugin for an existing language: http://udn.epicgames.com/Three/UnrealScriptReference.html
Anyhow, I've succeeded in extending the lexer just to match my comments, and true to my word, here I am posting how I did it in case someone else trying to solve a similar issue stumbles across this thread.
In my runtime module:
public Class<? extends org.eclipse.xtext.parser.antlr.Lexer> bindLexer() {
    return com.wirywolf.parser.lexer.UnrealscriptLexer.class;
}

public void configureRuntimeLexer(com.google.inject.Binder binder) {
    binder.bind(org.eclipse.xtext.parser.antlr.Lexer.class)
        .annotatedWith(com.google.inject.name.Names
            .named(org.eclipse.xtext.parser.antlr.LexerBindings.RUNTIME))
        .to(com.wirywolf.parser.lexer.UnrealscriptLexer.class);
}
and the new class I added, UnrealscriptLexer.java:
import org.antlr.runtime.CharStream;
import org.antlr.runtime.DFA;
import org.antlr.runtime.NoViableAltException;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;

// InternalUnrealscriptLexer is the lexer generated by Xtext; import it from its generated package.
public class UnrealscriptLexer extends InternalUnrealscriptLexer
{
    // dfa3, dfa19 are auto-generated protected members in the lexer
    protected DFA my_dfa3, my_dfa19;

    public UnrealscriptLexer()
    {
        super();
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }

    public UnrealscriptLexer(CharStream input)
    {
        super(input);
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }

    public UnrealscriptLexer(CharStream input, RecognizerSharedState state)
    {
        super(input,state);
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }

    public void mTokens() throws RecognitionException
    {
        // The generated lexer has my custom comment rule mapped to 158
        if (my_dfa19.predict(input) == 158)
            Custom_mRULE_CPP_TEXT();
        else
            super.mTokens();
    }
    public final void Custom_mRULE_CPP_TEXT() throws RecognitionException
    {
        try
        {
            int _type = RULE_CPP_TEXT;
            int _channel = DEFAULT_TOKEN_CHANNEL;
            {
                // Match the introducing keyword: "cpptext" or "structcpptext"
                int alt2=2;
                int LA2_0 = input.LA(1);
                if ( (LA2_0=='c') )
                {
                    alt2=1;
                }
                else if ( (LA2_0=='s') )
                {
                    alt2=2;
                }
                else
                {
                    NoViableAltException nvae =
                        new NoViableAltException("", 2, 0, input);
                    throw nvae;
                }
                switch (alt2)
                {
                    case 1 :
                        match("cpptext");
                        break;
                    case 2 :
                        match("structcpptext");
                        break;
                }
                // Skip everything between the keyword and the opening brace
                loop3:
                do
                {
                    int alt3=2;
                    alt3 = my_dfa3.predict(input);
                    switch (alt3)
                    {
                        case 1 :
                            // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:73: .
                            {
                                matchAny();
                            }
                            break;
                        default :
                            break loop3;
                    }
                } while (true);
                match('{');
                // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:81: ( options {greedy=false; } : . )*
                // Replacement for the generated non-greedy loop: count nested braces
                // so that only the '}' matching the opening '{' ends the token
                int open_braces = 0;
                loop4:
                do {
                    int alt4=2;
                    int LA4_0 = input.LA(1);
                    if (LA4_0=='}')
                    {
                        if (open_braces > 0)
                        {
                            // Closes a nested block: consume it and keep going
                            open_braces -= 1;
                            alt4 = 1;
                        }
                        else
                        {
                            // Closes the cpptext block itself: leave the loop
                            alt4 = 2;
                        }
                    }
                    else if (LA4_0 == '{')
                    {
                        open_braces += 1;
                        alt4 = 1;
                    }
                    else if ( ((LA4_0>='\u0000' && LA4_0<='|')||(LA4_0>='~' && LA4_0<='\uFFFF')) ) {
                        alt4=1;
                    }
                    switch (alt4) {
                        case 1 :
                            // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:109: .
                            {
                                matchAny();
                            }
                            break;
                        default :
                            break loop4;
                    }
                } while (true);
                match('}');
            }
            state.type = _type;
            state.channel = _channel;
        }
        finally {
        }
    }
}