Eclipse Community Forums: TMF (Xtext) » Matching braces in comments

Matching braces in comments [message #729613]

Mon, 26 September 2011 16:02

amey.par

Messages: 17
Registered: July 2011

Junior Member

I apologize for the ambiguous title, just couldn't think of a better one to describe my problem.

I'm developing support using Xtext for a language similar to C++/Java, and as such it supports declaring "structs", the grammar for which I've defined as

StructDeclaration:
    'struct' '{' (declarations+=Declaration)* '}';

The language also supports comments structured as follows:

cpptext
{
    function Something()
    {
         if (true)
         {
         }
    }
}

Here, everything between the opening brace following "cpptext" up to the matching closing brace is to be ignored. However, this is complicated by the fact that there might be multiple nested opening and (matching) closing braces between the parent braces. So, a terminal rule like

terminal CPP_TEXT: 'cpptext' -> '{' -> '}';

consumes everything from the first opening brace following "cpptext" till the first closing brace it encounters, and then throws parsing errors. I believe I need a nested rule like:

CPP_TEXT:
	('cpptext'|'structcpptext') -> CPP_BLOCK;

terminal CPP_BLOCK:
	'{' -> (CPP_BLOCK)* -> '}';

but the problem with this is that CPP_BLOCK then matches other (non-comment) braces in my model.

What should I do here?

Thanks,
Amey

[Updated on: Mon, 26 September 2011 16:24]


Re: Matching braces in comments [message #729710 is a reply to message #729613]

Mon, 26 September 2011 19:53

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

That will be a bit tricky if the content of the "structured comment" can
be invalid code. Otherwise, just treat it as a statement.

If it can contain anything, e.g.
cpptext {
I am a comment with " and ' and [ ... and lots of other funny stuff
}

Then you need to solve this with an external lexer that counts opening
and closing braces.

Regards
- henrik
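
The brace counting described above can be sketched in plain Java, independent of Xtext and ANTLR. This is only an illustration of the idea, not a lexer: the class BraceCounter and the method findMatchingBrace are hypothetical names, and the sketch assumes every '{' and '}' inside the block counts, even inside quotes.

```java
// Hypothetical helper, not part of Xtext or ANTLR: finds the end of a
// brace-delimited block by counting nested braces.
final class BraceCounter {

    /**
     * Returns the index of the '}' that closes the '{' at openIndex,
     * or -1 if the block is never closed.
     */
    static int findMatchingBrace(CharSequence text, int openIndex) {
        if (openIndex < 0 || openIndex >= text.length()
                || text.charAt(openIndex) != '{') {
            throw new IllegalArgumentException("no '{' at index " + openIndex);
        }
        int depth = 0; // number of braces currently open
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return i; // this brace closes the outermost block
                }
            }
            // every other character is skipped without inspection
        }
        return -1; // ran out of input before the block was closed
    }

    public static void main(String[] args) {
        String source = "cpptext { I am \" and ' and [ { nested } stuff } struct { }";
        int open = source.indexOf('{');
        int close = findMatchingBrace(source, open);
        // Prints the whole comment block, including the nested braces
        System.out.println(source.substring(open, close + 1));
    }
}
```

Because nothing except the two brace characters is interpreted, the block may contain unbalanced quotes or brackets. The braces themselves still have to balance.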



Re: Matching braces in comments [message #729753 is a reply to message #729710]

Mon, 26 September 2011 22:28

amey.par

Messages: 17
Registered: July 2011

Junior Member

Thanks for the reply. As per your advice above and also here (http://www.eclipse.org/forums/index.php/mv/msg/231016/710334/#msg_710334), I'll try to implement an external lexer.
I was about to ask how I should override the generated mRULE_CPP_TEXT() method in InternalMyDslLexer.java (since it's declared as final, and I don't want to copy-paste generated code into an external lexer), but then I found this: http://www.eclipse.org/forums/index.php/mv/msg/208085/666786/#msg_666786. I'm terribly underprepared to write my own lexer, so I guess I'll just have to remember to change the rule number whenever I add a new rule to my grammar.

Thanks for your help, I'll post back here when I'm done :)

Edit - Wow, looks like you've patiently answered the same question once every few months. I applaud your patience, and thanks for replying again :)

[Updated on: Mon, 26 September 2011 22:36]


Re: Matching braces in comments [message #729929 is a reply to message #729753]

Tue, 27 September 2011 09:56

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

On 9/27/11 12:28 AM, amey.par wrote:
> Thanks for the reply. As per your advice above and also
> http://www.eclipse.org/forums/index.php/mv/msg/231016/710334/#msg_710334, I'll
> try and implement an external lexer.
> I was about to ask how I should override the generated mRULE_CPP_TEXT()
> method in InternalMyDslLexer.java (since it's declared as final, and I
> don't want to copy-paste generated code into an external lexer) but then
> I found

If you try to override the generated lexer, you have to override methods
earlier in the call sequence; once you get to the mRULE.... calls, it is too late.

> http://www.eclipse.org/forums/index.php/mv/msg/208085/666786/#msg_666786. I'm
> terribly underprepared to write my own lexer, so I guess I'll just have
> to remember to change the rule-number whenever I add any rule to my
> grammar.
>
AFAIK, you only have to make adjustments when you modify terminals and
keywords. Other changes to rules do not affect the lexer.

> Thanks for your help, I'll post back here when I'm done :)

- henrik


Re: Matching braces in comments [message #730833 is a reply to message #729929]

Thu, 29 September 2011 10:17

Daniel Missing name

Messages: 101
Registered: July 2011

Senior Member

The easiest way would be to simply change the start and end braces of your cpp text like xtend does it with ''' ''' for RichStrings. You might like one of these ideas:

terminal CPP_TEXT: '{{' -> '}}'; // double braces
terminal CPP_TEXT: '@{' -> '}'; // similar to the c# @"" strings
terminal CPP_TEXT: '${' -> '}'; // as in many template languages
terminal CPP_TEXT: '#{' -> '}'; // JSF style

As a developer, I'd prefer a special brace to writing a keyword like 'cpptext' every time.


Re: Matching braces in comments [message #730896 is a reply to message #730833]

Thu, 29 September 2011 13:20

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

Well, that does not actually work in his case, as the terminator must be
specific too in order to capture nested '}'. And it gets more
complicated if it should be possible to nest one 'cpptext' inside another.

I agree that it is a lot simpler if there are distinct start and end
tokens and no nesting.

- henrik
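
To make the objection concrete, here is a small Java sketch that mimics how an until token ('->') behaves: after the start delimiter, everything is consumed up to the first occurrence of the end delimiter. The class UntilTokenDemo and the method matchUntil are hypothetical and only imitate that behaviour on plain strings; they are not Xtext API.

```java
// Hypothetical demo, not Xtext API: imitates "start -> end" on a string.
final class UntilTokenDemo {

    /**
     * Returns the text from the first occurrence of start up to and
     * including the FIRST following occurrence of end, or null if
     * either delimiter is missing.
     */
    static String matchUntil(String text, String start, String end) {
        int from = text.indexOf(start);
        if (from < 0) {
            return null;
        }
        int to = text.indexOf(end, from + start.length());
        if (to < 0) {
            return null;
        }
        return text.substring(from, to + end.length());
    }

    public static void main(String[] args) {
        // '${' -> '}' stops at the first nested closing brace
        System.out.println(
            matchUntil("${ if (a) { x(); } y(); } rest", "${", "}"));

        // '{{' -> '}}' stops as soon as two nested blocks close back to back
        System.out.println(
            matchUntil("{{ if (a) { if (b) { x(); }} }} rest", "{{", "}}"));
    }
}
```

In both calls the returned text ends inside the block, so the remaining closing braces would be handed to the parser as ordinary tokens.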



Re: Matching braces in comments [message #747904 is a reply to message #730833]

Mon, 24 October 2011 10:56

amey.par

Messages: 17
Registered: July 2011

Junior Member

Daniel wrote on Thu, 29 September 2011 06:17

The easiest way would be to simply change the start and end braces of your cpp text like xtend does it with ''' ''' for RichStrings.

True, but sadly I'm not developing my own language, just making an Eclipse plugin for an existing language: http://udn.epicgames.com/Three/UnrealScriptReference.html

Anyhow, I've succeeded in extending the lexer just enough to match my comments, and true to my word, here I am posting how I did it in case someone else trying to solve a similar issue stumbles across this thread.

In my runtime module:

    public Class<? extends org.eclipse.xtext.parser.antlr.Lexer> bindLexer() {
        return com.wirywolf.parser.lexer.UnrealscriptLexer.class;
    }

    public void configureRuntimeLexer(com.google.inject.Binder binder) {
        binder.bind(org.eclipse.xtext.parser.antlr.Lexer.class)
            .annotatedWith(com.google.inject.name.Names
                .named(org.eclipse.xtext.parser.antlr.LexerBindings.RUNTIME))
            .to(com.wirywolf.parser.lexer.UnrealscriptLexer.class);
    }

and the new class I added, UnrealscriptLexer.java:

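import org.antlr.runtime.CharStream;
import org.antlr.runtime.DFA;
import org.antlr.runtime.NoViableAltException;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;

// Package inferred from the generated-source path quoted in the comments below
import com.wirywolf.parser.antlr.lexer.InternalUnrealscriptLexer;

public class UnrealscriptLexer extends InternalUnrealscriptLexer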
{
    // dfa3, dfa19 are auto-generated protected members in the lexer
    protected DFA my_dfa3, my_dfa19;
	
    public UnrealscriptLexer() 
    {
    	super();
    	my_dfa3 = this.dfa3;
    	my_dfa19 = this.dfa19;
    }
    
    public UnrealscriptLexer(CharStream input) 
    {
        super(input);
        my_dfa3 = this.dfa3;
    	my_dfa19 = this.dfa19;
    }
    
    public UnrealscriptLexer(CharStream input, RecognizerSharedState state)
    {
        super(input,state);
        my_dfa3 = this.dfa3;
    	my_dfa19 = this.dfa19;
    }
    
	public void mTokens() throws RecognitionException 
	{
		// The generated lexer has my custom comment rule mapped to 158;
		// this number has to be updated whenever terminals or keywords change
		if (my_dfa19.predict(input) == 158)
			Custom_mRULE_CPP_TEXT();
		else
			super.mTokens();
	}
        
	public final void Custom_mRULE_CPP_TEXT() throws RecognitionException 
	{
        try 
        {
            int _type = RULE_CPP_TEXT;
            int _channel = DEFAULT_TOKEN_CHANNEL;
            {
            int alt2=2;
            int LA2_0 = input.LA(1);

            if ( (LA2_0=='c') ) 
            {
                alt2=1;
            }
            else if ( (LA2_0=='s') ) 
            {
                alt2=2;
            }
            else 
            {
                NoViableAltException nvae =
                    new NoViableAltException("", 2, 0, input);

                throw nvae;
            }
            switch (alt2) 
            {
                case 1 :
                	match("cpptext");
                    break;
                case 2 :
	                match("structcpptext");
                    break;
            }

            loop3:
            do 
            {
                int alt3=2;
                alt3 = my_dfa3.predict(input);
                switch (alt3) 
                {
            	case 1 :
            	    // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:73: .
            	    {
            	    matchAny(); 

            	    }
            	    break;

            	default :
            	    break loop3;
                }
            } while (true);

            match('{'); 
            // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:81: ( options {greedy=false; } : . )*
            // Number of nested '{' seen inside the block that are not yet closed
            int open_braces = 0;
            loop4:
            do {
                int alt4=2;
                int LA4_0 = input.LA(1);

                if (LA4_0=='}')
                {
                	if (open_braces > 0)
                	{
                		open_braces -= 1;
                		alt4 = 1;
                	}
                	else
                	{
                		alt4 = 2;
                	}
                }
                else if (LA4_0 == '{')
                {
                	open_braces += 1;
                	alt4 = 1;
                }
                else if ( ((LA4_0>='\u0000' && LA4_0<='|')||(LA4_0>='~' && LA4_0<='\uFFFF')) ) {
                    alt4=1;
                }


                switch (alt4) {
            	case 1 :
            	    // ../com.wirywolf.unrealstudio/src-gen/com/wirywolf/parser/antlr/lexer/InternalUnrealscriptLexer.g:335:109: .
            	    {
            	    matchAny(); 

            	    }
            	    break;

            	default :
            	    break loop4;
                }
            } while (true);

            match('}'); 

            }

            state.type = _type;
            state.channel = _channel;
        }
        finally {
        }
    }
}
