Eclipse Community Forums
Modeling » TMF (Xtext)
Matching braces in comments [message #729613] Mon, 26 September 2011 16:02
amey.par
Messages: 17
Registered: July 2011
Junior Member
I apologize for the ambiguous title; I just couldn't think of a better one to describe my problem.

I'm developing Xtext-based support for a language similar to C++/Java. Like those languages, it supports declaring "structs", for which I've defined the grammar as:
StructDeclaration:
'struct' '{'(declarations+=Declaration)* '}';

The language also supports comments structured as follows:
cpptext
{
    function Something()
    {
         if (true)
         {
         }
    }
}

Here, everything between the opening brace following "cpptext" and its matching closing brace is to be ignored. However, this is complicated by the fact that there may be multiple nested opening and (matching) closing braces between the outer braces. So a terminal rule like
terminal CPP_TEXT: 'cpptext' -> '{' -> '}';

consumes everything from the first opening brace following "cpptext" until the first closing brace it encounters, and the parser then reports errors on the remaining text. I believe I need a nested rule like:
terminal CPP_TEXT:
	('cpptext'|'structcpptext') -> CPP_BLOCK;

terminal CPP_BLOCK:
	'{' -> (CPP_BLOCK)* -> '}';

but the problem with this is that CPP_BLOCK then matches other (non-comment) braces in my model.

What should I do here?

Thanks,
Amey

[Updated on: Mon, 26 September 2011 16:24]


Re: Matching braces in comments [message #729710 is a reply to message #729613] Mon, 26 September 2011 19:53
Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
That will be a bit tricky if the content of the "structured comment" can
be invalid code. Otherwise, just treat it as a statement.

If it can contain anything, e.g.:
cpptext {
I am a comment with " and ' and [ ... and lots of other funny stuff
}

Then you need to solve this with an external lexer that counts opening
and closing braces.
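A minimal sketch of the brace counting such a lexer has to do, in plain Java with no ANTLR dependency. The class and method names (`CppTextScanner`, `scan`) are hypothetical. It assumes the block starts with `cpptext` or `structcpptext` (as in the grammar in the question) and treats every character other than braces as opaque, so quotes and other "funny stuff" inside the block do not matter.

```java
/**
 * Hypothetical helper (not part of Xtext or ANTLR): finds the end of a
 * "cpptext { ... }" block by counting braces.
 */
public final class CppTextScanner {

    private CppTextScanner() {}

    /**
     * Returns the index just past the '}' that matches the first '{' after a
     * "structcpptext" or "cpptext" keyword at {@code start}, or -1 if there is
     * no such keyword at {@code start} or the braces never balance.
     * (No identifier-boundary check is done on the keyword.)
     */
    public static int scan(String src, int start) {
        int pos;
        if (src.startsWith("structcpptext", start)) {
            pos = start + "structcpptext".length();
        } else if (src.startsWith("cpptext", start)) {
            pos = start + "cpptext".length();
        } else {
            return -1;
        }
        // Skip whatever separates the keyword from the opening brace.
        pos = src.indexOf('{', pos);
        if (pos < 0) {
            return -1;
        }
        int depth = 0;
        for (; pos < src.length(); pos++) {
            char c = src.charAt(pos);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return pos + 1; // this '}' closes the outermost '{'
            }
            // Any other character (quotes, brackets, ...) is simply skipped.
        }
        return -1; // end of input reached inside the block
    }

    public static void main(String[] args) {
        String text = "cpptext { if (x) { y(); } } struct { int a; }";
        System.out.println(text.substring(0, scan(text, 0)));
    }
}
```

In a real external lexer the same counter would drive how many characters are consumed for the comment token; everything after the returned index is left for the normal rules (here, the `struct { ... }` declaration).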

Regards
- henrik


Re: Matching braces in comments [message #729753 is a reply to message #729710] Mon, 26 September 2011 22:28
amey.par
Messages: 17
Registered: July 2011
Junior Member
Thanks for the reply. Following your advice above and also in http://www.eclipse.org/forums/index.php/mv/msg/231016/710334/#msg_710334, I'll try to implement an external lexer.
I was about to ask how I should override the generated mRULE_CPP_TEXT() method in InternalMyDslLexer.java (since it's declared as final, and I don't want to copy-paste generated code into an external lexer), but then I found http://www.eclipse.org/forums/index.php/mv/msg/208085/666786/#msg_666786. I'm terribly underprepared to write my own lexer, so I guess I'll just have to remember to change the rule number whenever I add a new rule to my grammar.

Thanks for your help; I'll post back here when I'm done :)

Edit: Wow, it looks like you've patiently answered the same question every few months. I applaud your patience, and thanks for replying again :)

[Updated on: Mon, 26 September 2011 22:36]


Re: Matching braces in comments [message #729929 is a reply to message #729753] Tue, 27 September 2011 09:56
Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
On 9/27/11 12:28 AM, amey.par wrote:
> Thanks for the reply. As per your advice above and also
> http://www.eclipse.org/forums/index.php/mv/msg/231016/710334/#msg_710334, I'll
> try and implement an external lexer.
> I was about to ask how I should override the generated mRULE_CPP_TEXT()
> method in InternalMyDslLexer.java (since it's declared as final, and I
> don't want to copy-paste generated code into an external lexer) but then
> I found
If you try to override the generated lexer, you have to override methods
earlier in the sequence; once you get to the mRULE.... calls, it is too late.
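The point about overriding earlier in the sequence can be modelled with a toy lexer in plain Java. `GeneratedLexer`, `CppTextLexer` and their methods are hypothetical stand-ins, not ANTLR classes; they only mimic the shape of an ANTLR 3 lexer, where each rule becomes a final mRULE-style method and mTokens() is the overridable method that picks which rule runs for the next token. Because the final rule methods cannot be overridden, the subclass intercepts the dispatcher and falls back to the inherited behaviour for everything else.

```java
import java.util.ArrayList;
import java.util.List;

public class DispatchOverrideDemo {

    /** Toy "generated" lexer: rule methods are final, mTokens() is not. */
    public static class GeneratedLexer {
        protected final String input;
        protected int pos = 0;
        protected final List<String> tokens = new ArrayList<>();

        public GeneratedLexer(String input) {
            this.input = input;
        }

        /** Picks the rule for the next token; the only overridable hook. */
        public void mTokens() {
            char c = input.charAt(pos);
            if (c == '{' || c == '}') {
                mBRACE();
            } else {
                mCHAR();
            }
        }

        public final void mBRACE() { tokens.add(String.valueOf(input.charAt(pos++))); }

        public final void mCHAR() { tokens.add(String.valueOf(input.charAt(pos++))); }

        public List<String> tokenize() {
            while (pos < input.length()) {
                mTokens();
            }
            return tokens;
        }
    }

    /** Intercepts "cpptext" blocks before any "generated" rule runs. */
    public static class CppTextLexer extends GeneratedLexer {
        public CppTextLexer(String input) {
            super(input);
        }

        @Override
        public void mTokens() {
            if (!input.startsWith("cpptext", pos)) {
                super.mTokens(); // all other tokens: inherited behaviour
                return;
            }
            int start = pos;
            int depth = 0;
            pos += "cpptext".length();
            // Consume up to and including the '}' that balances the first '{'.
            // Assumes a well-formed block; no error is reported otherwise.
            while (pos < input.length()) {
                char c = input.charAt(pos++);
                if (c == '{') {
                    depth++;
                } else if (c == '}' && --depth == 0) {
                    break;
                }
            }
            tokens.add(input.substring(start, pos));
        }
    }

    public static void main(String[] args) {
        System.out.println(new CppTextLexer("cpptext{a{b}}{}").tokenize());
    }
}
```

The whole block becomes one token, while the trailing `{}` is still lexed by the inherited rules.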

> http://www.eclipse.org/forums/index.php/mv/msg/208085/666786/#msg_666786. I'm
> terribly underprepared to write my own lexer, so I guess I'll just have
> to remember to change the rule-number whenever I add any rule to my
> grammar.
>
AFAIK, you only have to make adjustments when you modify terminals and
keywords. Other changes to rules do not affect the lexer.

> Thanks for your help, I'll post back here when I'm done :)

- henrik
Re: Matching braces in comments [message #730833 is a reply to message #729929] Thu, 29 September 2011 10:17
Daniel Missing name
Messages: 101
Registered: July 2011
Senior Member
The easiest way would be to simply change the start and end braces of your cpp text, the way Xtend does with ''' ''' for RichStrings. Perhaps you'll like one of these ideas:

terminal CPP_TEXT: '{{' -> '}}'; // double braces
terminal CPP_TEXT: '@{' -> '}'; // similar to the C# @"" strings
terminal CPP_TEXT: '${' -> '}'; // as in many template languages
terminal CPP_TEXT: '#{' -> '}'; // JSF style

As a developer, I'd prefer a special brace over writing a keyword like 'cpptext' every time.
Re: Matching braces in comments [message #730896 is a reply to message #730833] Thu, 29 September 2011 13:20
Henrik Lindberg
Messages: 2500
Registered: July 2009
Senior Member
Well, that does not actually work: in his case the terminator must be
specific too, otherwise a nested '}' ends the token. And it gets more
complicated if it should be possible to nest one 'cpptext' inside another.

Agree that it is a lot simpler if there are distinct start and end
tokens and no nesting.
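A small Java model of Xtext's until operator (`->`) illustrates both points. The until operator is non-greedy: it stops at the first occurrence of the terminator and never looks at nesting. The `until` method below is a hypothetical simplification of that behaviour, not Xtext code.

```java
public class UntilOperatorDemo {

    /**
     * Hypothetical model of a terminal "OPEN -> CLOSE": if OPEN occurs at
     * {@code start}, returns the text through the FIRST following CLOSE;
     * returns null if OPEN is absent or CLOSE never appears. Nesting is ignored.
     */
    public static String until(String src, int start, String open, String close) {
        if (!src.startsWith(open, start)) {
            return null;
        }
        int end = src.indexOf(close, start + open.length());
        return end < 0 ? null : src.substring(start, end + close.length());
    }

    public static void main(String[] args) {
        // '@{' -> '}' stops at the inner '}', leaving " }" for the parser.
        System.out.println(until("@{ if (a) { b(); } }", 0, "@{", "}"));
        // '{{' -> '}}' copes with nested single braces, as long as no '}}' occurs inside...
        System.out.println(until("{{ if (a) { b(); } }}", 0, "{{", "}}"));
        // ...but a nested '{{ ... }}' pair still ends the token too early.
        System.out.println(until("{{ x {{ y }} z }}", 0, "{{", "}}"));
    }
}
```

Distinct start and end tokens only help when the end token can never appear inside the block; once nesting is allowed, some form of counting is needed again.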

- henrik


Re: Matching braces in comments [message #747904 is a reply to message #730833] Mon, 24 October 2011 10:56
amey.par
Messages: 17
Registered: July 2011
Junior Member
Daniel wrote on Thu, 29 September 2011 06:17:
> The easiest way would be to simply change the start and end braces of your cpp text like xtend does it with ''' ''' for RichStrings.


True, but sadly I'm not developing my own language, just making an Eclipse plugin for an existing language: http://udn.epicgames.com/Three/UnrealScriptReference.html

Anyhow, I've succeeded in extending the lexer to match my comments, and, true to my word, here is how I did it, in case someone else trying to solve a similar issue stumbles across this thread.

In my runtime module:

	// These replace the lexer bindings of the same name in the generated
	// abstract runtime module, so that the runtime lexer is UnrealscriptLexer.
	public Class<? extends org.eclipse.xtext.parser.antlr.Lexer> bindLexer() {
		return com.wirywolf.parser.lexer.UnrealscriptLexer.class;
	}

	public void configureRuntimeLexer(com.google.inject.Binder binder) {
		binder.bind(org.eclipse.xtext.parser.antlr.Lexer.class)
			.annotatedWith(com.google.inject.name.Names.named(
				org.eclipse.xtext.parser.antlr.LexerBindings.RUNTIME))
			.to(com.wirywolf.parser.lexer.UnrealscriptLexer.class);
	}


and the new class I added, UnrealscriptLexer.java:

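package com.wirywolf.parser.lexer;

import org.antlr.runtime.CharStream;
import org.antlr.runtime.DFA;
import org.antlr.runtime.NoViableAltException;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;

// Generated lexer; package inferred from the src-gen path of InternalUnrealscriptLexer.g
import com.wirywolf.parser.antlr.lexer.InternalUnrealscriptLexer;

public class UnrealscriptLexer extends InternalUnrealscriptLexer
{
    // dfa3 and dfa19 are auto-generated protected DFA members of the lexer:
    // dfa19 picks the token rule in mTokens(), dfa3 drives the "-> '{'" loop
    // of the CPP_TEXT rule. Their numbers may change when terminals or
    // keywords in the grammar change.
    protected DFA my_dfa3, my_dfa19;

    public UnrealscriptLexer()
    {
        super();
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }

    public UnrealscriptLexer(CharStream input)
    {
        super(input);
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }

    public UnrealscriptLexer(CharStream input, RecognizerSharedState state)
    {
        super(input, state);
        my_dfa3 = this.dfa3;
        my_dfa19 = this.dfa19;
    }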
    
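    @Override
    public void mTokens() throws RecognitionException
    {
        // DFA.predict() only looks ahead (it rewinds the input), so it is safe
        // to call it here and let super.mTokens() predict again afterwards.
        // 158 is the alternative the generated lexer assigned to RULE_CPP_TEXT;
        // it must be updated whenever terminals or keywords are added.
        if (my_dfa19.predict(input) == 158)
            Custom_mRULE_CPP_TEXT();
        else
            super.mTokens();
    }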
        
    // Hand-written replacement for the generated (final) mRULE_CPP_TEXT():
    // ('cpptext' | 'structcpptext'), anything up to the first '{', then a
    // brace-balanced body up to the matching '}'.
    public final void Custom_mRULE_CPP_TEXT() throws RecognitionException
    {
        int _type = RULE_CPP_TEXT;
        int _channel = DEFAULT_TOKEN_CHANNEL;

        // Keyword: 'cpptext' or 'structcpptext', chosen by its first character.
        int la = input.LA(1);
        if (la == 'c')
        {
            match("cpptext");
        }
        else if (la == 's')
        {
            match("structcpptext");
        }
        else
        {
            throw new NoViableAltException("", 2, 0, input);
        }

        // Non-greedy "-> '{'": consume characters until the opening brace,
        // letting the generated DFA decide when to stop.
        while (my_dfa3.predict(input) == 1)
        {
            matchAny();
        }
        match('{');

        // Body: count nested braces so that only the '}' matching the
        // opening brace above ends the token.
        int open_braces = 0;
        while (true)
        {
            int c = input.LA(1);
            if (c == CharStream.EOF)
            {
                break; // unterminated block: match('}') below reports the error
            }
            if (c == '}')
            {
                if (open_braces == 0)
                {
                    break; // closing brace of the whole block
                }
                open_braces -= 1;
            }
            else if (c == '{')
            {
                open_braces += 1;
            }
            matchAny();
        }
        match('}');

        state.type = _type;
        state.channel = _channel;
    }
}
