Eclipse Community Forums: TMF (Xtext) » Custom Syntax Error Recovery

Custom Syntax Error Recovery [message #1062025]

Wed, 05 June 2013 13:15

David Pizarro De La Iglesia

Messages: 5
Registered: June 2013

Junior Member

I am trying to create an Xtext grammar that has better parser error recovery (for sensible syntax highlighting).

What I would like to do is something like this (sorry, I can't add an external link yet):

www.antlr.org/wiki/display/ANTLR3/Custom+Syntax+Error+Recovery

i.e. error recovery within the loop of the following Xtext grammar rule:

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    element+=ConfigurtaionElement? (EOL element+=ConfigurtaionElement?)* 
;

so it becomes something like:

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    element+=ConfigurtaionElement? syncParser (EOL element+=ConfigurtaionElement? syncParser)* 
;

Is there a pre-existing mechanism to achieve this?
Or will I have to enhance Xtext to handle a new keyword that maps only to a fragment of the generated ANTLR grammar?

Thanks in advance.

Re: Custom Syntax Error Recovery [message #1062118 is a reply to message #1062025]

Thu, 06 June 2013 06:06

Alexander Nittka

Messages: 1193
Registered: July 2009

Senior Member

Hi,

your rule looks very strange to me. Have you tried

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    (element+=ConfigurtaionElement (EOL element+=ConfigurtaionElement)*)?
;

which would be the standard pattern.

Alex

Need training, onsite consulting or any other kind of help for Xtext?
Go visit http://xtext.itemis.com or send a mail to xtext@itemis.de

Re: Custom Syntax Error Recovery [message #1062146 is a reply to message #1062118]

Thu, 06 June 2013 08:49

David Pizarro De La Iglesia

Thanks for the reply. Unfortunately, the pre-existing language that I am trying to write an Eclipse editor (with syntax highlighting) for is poorly designed.

A configuration element must be on a line by itself,
but not every line has to have a configuration element.

Unfortunately the grammar that you suggested does not have this property, and it still does not solve the problem I am having with the parser error recovery.

The issue with the error recovery arises when I receive a token that is not valid at that point, i.e. not any of the following:

the starting token of a Configuration element
end of line token (EOL)
a space token (SYM_SPACE)
a tab token (SYM_TAB)

The error recovery then breaks out of the Configuration rule, i.e. it does not swallow the invalid tokens (flagging an error) and then continue to parse more configuration elements.

Here is an illustration of the scenario that I am trying to get working.
Encountering the invalid token should not exit the Configuration rule (ANTLR throws an exception on the invalid token); it should flag the issue and continue processing all the following configuration elements.

ValidCfgElement
ValidCfgElement

ValidCfgElement
InvalidToken
ValidCfgElementNotParsed
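
The behaviour asked for above can be modelled outside Xtext and ANTLR. The following is only a minimal sketch under stated assumptions: tokens are plain strings, "\n" stands in for EOL, and the class name SyncRecoveryDemo and the startsElement predicate are hypothetical. It shows a loop for `element? (EOL element?)*` that reports an unexpected token and stays in the loop instead of giving up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class SyncRecoveryDemo {

    /** Outcome of a parse: recognised elements plus the tokens reported as errors. */
    public static final class Result {
        public final List<String> elements = new ArrayList<>();
        public final List<String> errors = new ArrayList<>();
    }

    static final String EOL = "\n";

    /**
     * Accepts at most one element per line. Any other token is recorded as an
     * error and skipped, so later lines are still parsed.
     */
    public static Result parse(List<String> tokens, Predicate<String> startsElement) {
        Result result = new Result();
        boolean lineHasElement = false;
        for (String token : tokens) {
            if (EOL.equals(token)) {
                lineHasElement = false;          // a new line may hold a new element
            } else if (!lineHasElement && startsElement.test(token)) {
                result.elements.add(token);      // the single element of this line
                lineHasElement = true;
            } else {
                result.errors.add(token);        // report, skip, and keep looping
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "ValidCfgElement", EOL, "ValidCfgElement", EOL, EOL,
                "ValidCfgElement", EOL, "InvalidToken", EOL,
                "ValidCfgElementNotParsed");
        // Hypothetical validity test: anything starting with "Valid" is an element.
        Result r = parse(input, t -> t.startsWith("Valid"));
        System.out.println(r.elements.size() + " elements, errors: " + r.errors);
    }
}
```

With this loop the element after InvalidToken is still recognised, which is the outcome the scenario is after.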

Re: Custom Syntax Error Recovery [message #1062158 is a reply to message #1062146]

Thu, 06 June 2013 10:04

Alexander Nittka

Hi,

how about relaxing the definition of a config element and then validating the elements? This allows you to give user-friendly error messages (and quick fixes). You need not (and often enough cannot) put every language feature into the grammar. This is what scoping, validation, etc. are for. Depending on the complexity of the language, that might even be the case for the line breaks between elements.

Nobody forces you to encode that at the grammar level.

Alex

Re: Custom Syntax Error Recovery [message #1062193 is a reply to message #1062158]

Thu, 06 June 2013 13:25

David Pizarro De La Iglesia

Hi, thanks for your suggestions.

The current error I get is "missing EOF at 'Symbol of the invalid token'"

The configuration element is one of several items.

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    element+=ConfigurtaionElement? (EOL element+=ConfigurtaionElement?)* 
;

ConfigurtaionElement:
    Comment|ConfigItem|TCL
;

This means that its allowed starting symbols are one of the following:

Bang ("!")
Letter (('a'..'z')|('A'..'Z'))
special character combination ("%{")
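
As a rough illustration of that list, the three starting symbols can be written as a small stand-alone predicate. This is a sketch only: the class and method names are hypothetical, and it follows the list above literally (the ID rule in the full grammar also accepts a leading underscore, which is not covered here).

```java
public class ElementStart {

    /** Returns true if the line, ignoring leading spaces and tabs, starts like an element. */
    public static boolean canStartElement(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        if (i == line.length()) {
            return false; // blank line: no element at all
        }
        char c = line.charAt(i);
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        // Bang starts a comment, a letter starts a config item, "%{" starts TCL.
        return c == '!' || letter || line.startsWith("%{", i);
    }

    public static void main(String[] args) {
        System.out.println(canStartElement("!comment @ top level")); // true
        System.out.println(canStartElement("    4   }"));            // false
    }
}
```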

Without explicitly writing a rule to parse the invalid tokens, I don't know how to relax the grammar.

The problem is that the standard error recovery mechanism breaks out of the loop in the rule.

In the following examples, a configuration element can't start with a number.

example 1


!comment @ top level
block {
    !comment nested
    4   }
!comment @ top level
block {
    !comment nested
    4}
! comment @ top level
block {
    !comment nested
    4
    !This comment is outside the block
}
! no longer parsing

example 2


!comment @ top level
4
! no longer parsing

This is the current version of my grammar (still a work in progress, so I need to tidy and correct things, but it does work).

Note: the import is missing the http:// prefix, as I still can't post external links.

grammar uk.me.pizarro.editor.cfg

import "www.eclipse.org/emf/2002/Ecore" as ecore

generate cfg "www.pizarro.me.uk/editor/cfg"

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    element+=ConfigurtaionElement? (EOL element+=ConfigurtaionElement?)* 
;

ConfigurtaionElement:
    Comment|ConfigItem|TCL
;

Comment hidden(): 
    SYM_BANG text=RAW_COMMENT 
;

RAW_COMMENT hidden():
    =>( ESCAPED_EOL  |
        LETTERS      |
        DIGITS       |
        ESCAPED_CHAR |
        SYM_TAB      |
        SYM_SPACE    |
        SYM_BANG     |
        SYM_DQUOTE   |
        SYM_NUM      |
        SYM_DOLLAR   |
        SYM_PCT      |
        SYM_AMP      |
        SYM_SQUOTE   |
        SYM_OP       |
        SYM_CP       |
        SYM_AST      |
        SYM_PLUS     |
        SYM_COMMA    |
        SYM_MINUS    |
        SYM_DOT      |
        SYM_SLASH    |
        SYM_COLON    |
        SYM_SCOLON   |
        SYM_LT       |
        SYM_EQ       |
        SYM_GT       |
        SYM_QMARK    |
        SYM_AT       |
        SYM_OBRKT    |
        SYM_BSLASH   |
        SYM_CBRKT    |
        SYM_CARET    |
        SYM_USCORE   |
        SYM_ACCENT   |
        SYM_OBRACE   |
        SYM_VBAR     |
        SYM_CBRACE   |
        SYM_TILDE
    )*
;

ConfigItem:
    name=ID EOL* value=ConfigValue
;

ConfigValue:
    (Data|DataList)|TCL|Block
;

Block hidden(SYM_SPACE,SYM_TAB):
    (syntax+=SYM_OBRACE data=BlockInternal syntax+=SYM_CBRACE)
;

BlockInternal hidden(SYM_SPACE,SYM_TAB,EOL):
    Configuration
;

Data hidden():
    (data=RawData)| (syntax+=SYM_DQUOTE data=RawString syntax+=SYM_DQUOTE)
;

// Do lists have to be comma separated?
DataList hidden(SYM_SPACE,SYM_TAB): {DataList}
    syntax+=SYM_OP EOL* ( data+=Data EOL* (syntax+=SYM_COMMA EOL* data+=Data EOL*)* )? syntax+=SYM_CP
;

ID  hidden():
    (LETTERS|SYM_USCORE)=>(LETTERS|SYM_USCORE|DIGITS)*
;

// Bug Warning! This is probably missing allowed symbols
RawData hidden():
    =>(LETTERS|DIGITS|SYM_USCORE)+
;

RawString hidden():
    =>( EOL          |
        ESCAPED_EOL  |
        LETTERS      |
        DIGITS       |
        ESCAPED_CHAR |
        SYM_TAB      |
        SYM_SPACE    |
        SYM_BANG     |
//      SYM_DQUOTE   |
        SYM_NUM      |
        SYM_DOLLAR   |
        SYM_PCT      |
        SYM_AMP      |
        SYM_SQUOTE   |
        SYM_OP       |
        SYM_CP       |
        SYM_AST      |
        SYM_PLUS     |
        SYM_COMMA    |
        SYM_MINUS    |
        SYM_DOT      |
        SYM_SLASH    |
        SYM_COLON    |
        SYM_SCOLON   |
        SYM_LT       |
        SYM_EQ       |
        SYM_GT       |
        SYM_QMARK    |
        SYM_AT       |
        SYM_OBRKT    |
        SYM_BSLASH   |
        SYM_CBRKT    |
        SYM_CARET    |
        SYM_USCORE   |
        SYM_ACCENT   |
        SYM_OBRACE   |
        SYM_VBAR     |
        SYM_CBRACE   |
        SYM_TILDE
    )*
;

TCL hidden(): {TCL}
    '%{' line+=RAW_TCL (ESCAPED_EOL line+=RAW_TCL)* '}%'
;


RAW_TCL hidden():
    =>( LETTERS      |
        DIGITS       |
        ESCAPED_CHAR |
        SYM_TAB      |
        SYM_SPACE    |
        SYM_BANG     |
        SYM_DQUOTE   |
        SYM_NUM      |
        SYM_DOLLAR   |
        SYM_PCT      |
        SYM_AMP      |
        SYM_SQUOTE   |
        SYM_OP       |
        SYM_CP       |
        SYM_AST      |
        SYM_PLUS     |
        SYM_COMMA    |
        SYM_MINUS    |
        SYM_DOT      |
        SYM_SLASH    |
        SYM_COLON    |
        SYM_SCOLON   |
        SYM_LT       |
        SYM_EQ       |
        SYM_GT       |
        SYM_QMARK    |
        SYM_AT       |
        SYM_OBRKT    |
        SYM_BSLASH   |
        SYM_CBRKT    |
        SYM_CARET    |
        SYM_USCORE   |
        SYM_ACCENT   |
//      SYM_OBRACE   |
        SYM_VBAR     |
//      SYM_CBRACE   |
        SYM_TILDE
    )*
;

terminal ESCAPED_EOL   : '\\'EOL;
terminal ESCAPED_CHAR  : '\\'(' '..'~');
terminal EOL           : (SYM_CR? '\n');

terminal SYM_CR        : '\r';
terminal SYM_TAB       : '\t';

terminal DIGITS        : ('0'..'9')+;
terminal LETTERS       : ('a'..'z'|'A'..'Z')+;


terminal SYM_SPACE     : ' ';
terminal SYM_BANG      : '!';
terminal SYM_DQUOTE    : '"';
terminal SYM_NUM       : '#';
terminal SYM_DOLLAR    : '$';
terminal SYM_PCT       : '%';
terminal SYM_AMP       : '&';
terminal SYM_SQUOTE    : '\'';
terminal SYM_OP        : '(';
terminal SYM_CP        : ')';
terminal SYM_AST       : '*';
terminal SYM_PLUS      : '+';
terminal SYM_COMMA     : ',';
terminal SYM_MINUS     : '-';
terminal SYM_DOT       : '.';
terminal SYM_SLASH     : '/';
terminal SYM_COLON     : ':';
terminal SYM_SCOLON    : ';';
terminal SYM_LT        : '<';
terminal SYM_EQ        : '=';
terminal SYM_GT        : '>';
terminal SYM_QMARK     : '?';
terminal SYM_AT        : '@';
terminal SYM_OBRKT     : '[';
terminal SYM_BSLASH    : '\\';
terminal SYM_CBRKT     : ']';
terminal SYM_CARET     : '^';
terminal SYM_USCORE    : '_';
terminal SYM_ACCENT    : '`';
terminal SYM_OBRACE    : '{';
terminal SYM_VBAR      : '|';
terminal SYM_CBRACE    : '}';
terminal SYM_TILDE     : '~';

terminal ANY_OTHER     : .;

[Updated on: Thu, 06 June 2013 13:26]

Re: Custom Syntax Error Recovery [message #1062205 is a reply to message #1062193]

Thu, 06 June 2013 13:54

Alexander Nittka

Hi,

is the grammar complete in the sense that all language concepts are covered already (root is configuration with configuration elements which can be Comment or ConfigItem or TCL)? Or is this just a fraction of the language? At first glance, it looks way too complicated. Why all the terminal definitions? Why RawString, RawComment, RawTcl? Are you interested in the "internal semantic" of a string, comment, tcl?

Alex

Re: Custom Syntax Error Recovery [message #1062227 is a reply to message #1062205]

Thu, 06 June 2013 15:19

David Pizarro De La Iglesia

I will need the internal semantics of various components, e.g.

the TCL rule will be expanded to check the syntax (I will probably define this grammar in a separate file and import it)
the comments contain metadata that I will also want to extract

But I thought I would get the parser error recovery nailed down properly before I finish expanding the grammar and its functionality and then tidy it all up.

If you prefer, here is a sanitised version of the problem.
An ID must be alone on its line, one per line, but not every line has to have an ID.

grammar uk.me.pizarro.error.cfg

import "www.eclipse.org/emf/2002/Ecore" as ecore

generate cfg "www.pizarro.me.uk/error/cfg"

IDList hidden(WS): {IDList}
    element+=ID? (EOL element+=ID?)* 
;

ID  hidden():
    LETTERS=>(LETTERS|DIGITS)*
;

terminal EOL       : ('\r'? '\n');
terminal WS        : ('\t'|' ')+;

terminal DIGITS    : ('0'..'9')+;
terminal LETTERS   : ('a'..'z'|'A'..'Z')+;

terminal ANY_OTHER : .;

example scenario: missing EOF at '456'

ThisIDParses

ThisIDParses
456
ThisIDDoesNotParse

[Updated on: Thu, 06 June 2013 15:21]

Re: Custom Syntax Error Recovery [message #1062309 is a reply to message #1062227]

Fri, 07 June 2013 07:05

Alexander Nittka

Hi,

I still don't see the necessity of changing the default definitions for INT and ID, but this is another matter.

The cardinalities in the IDList definition are still problematic and I still think: allow numbers in the grammar and do a semantic validation.
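
The "allow it in the grammar, reject it in validation" idea can be sketched independently of Xtext. The code below is a plain-Java model under stated assumptions, not Xtext API: the class RelaxedIdList and both methods are hypothetical, phase 1 stands in for a relaxed grammar that accepts any non-blank line, and phase 2 stands in for a semantic check (in an Xtext project this would typically live in the language's validator class).

```java
import java.util.ArrayList;
import java.util.List;

public class RelaxedIdList {

    /** Phase 1: lenient parse. Every non-blank line becomes an element, whatever it starts with. */
    public static List<String> parse(String text) {
        List<String> elements = new ArrayList<>();
        for (String line : text.split("\r?\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                elements.add(trimmed);
            }
        }
        return elements;
    }

    /** Phase 2: validation. An ID must be a letter followed by letters or digits. */
    public static List<String> validate(List<String> elements) {
        List<String> issues = new ArrayList<>();
        for (String element : elements) {
            if (!element.matches("[A-Za-z][A-Za-z0-9]*")) {
                issues.add("'" + element + "' is not a valid ID");
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        String text = "ThisIDParses\n\nThisIDParses\n456\nThisIDDoesNotParse";
        List<String> elements = parse(text);
        System.out.println(elements.size() + " elements");
        System.out.println(validate(elements));
    }
}
```

Because the lenient phase never fails, the line after 456 is still part of the model, and the bad line is reported as an ordinary validation issue rather than a syntax error.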

Regarding your very first question. You can override the generated parser with your own parser implementation (which of course may extend it and just change a couple of methods).

Alex

Re: Custom Syntax Error Recovery [message #1062482 is a reply to message #1062309]

Sat, 08 June 2013 01:08

David Pizarro De La Iglesia

Thanks for the help and suggestions. Although it's not exactly what I wanted (i.e. I wanted to tweak the generation of the InternalCfg.g file), I have managed to get what I wanted working.

Please note that this code does smell...

i.e. it's not the correct or nice way of achieving the result, but it does work!

I defined a SyncParser rule and added it to my grammar.

Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
    element+=ConfigurtaionElement? errRec+=SyncParser (EOL element+=ConfigurtaionElement? errRec+=SyncParser)* 
;

// An empty rule (it consumes no tokens), used as a dummy placeholder
SyncParser: {SyncParser} ;

And no, I didn't mistype the SyncParser rule.

This generated the following fragment within the InternalCfg.g file.
I was really looking for a mechanism to define my own ANTLR grammar, but this will do until I can figure it out properly.

// Entry rule entryRuleSyncParser
entryRuleSyncParser returns [EObject current=null] 
	:
	{ newCompositeNode(grammarAccess.getSyncParserRule()); }
	 iv_ruleSyncParser=ruleSyncParser 
	 { $current=$iv_ruleSyncParser.current; } 
	 EOF 
;

// Rule SyncParser
ruleSyncParser returns [EObject current=null] 
    @init { enterRule(); 
    }
    @after { leaveRule(); }:
(
    {
        $current = forceCreateModelElement(
            grammarAccess.getSyncParserAccess().getSyncParserAction(),
            $current);
    }
)
;

Now, unfortunately, the two methods I would naturally have wanted to override in the InternalCfgParser.java file are both final.

public final EObject entryRuleSyncParser() throws RecognitionException {
    // generated code
}

public final EObject ruleSyncParser() throws RecognitionException {
    EObject current = null;

    enterRule(); 
            
    // more generated code
}

But luckily for me the enterRule method is not. This meant I could do this.

package uk.me.pizarro.parser;

import java.util.LinkedList;

import org.antlr.runtime.BitSet;
import org.antlr.runtime.NoViableAltException;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;
import org.antlr.runtime.Token;
import org.antlr.runtime.TokenStream;

import uk.me.pizarro.parser.antlr.internal.InternalCfgParser;
import uk.me.pizarro.services.CfgGrammarAccess;

public class MyInternalCfgParser extends InternalCfgParser {

    public MyInternalCfgParser(final TokenStream input) {
        super(input);
    }

    public MyInternalCfgParser(final TokenStream input, final RecognizerSharedState state) {
        super(input, state);
    }

    public MyInternalCfgParser(final TokenStream input,
            final CfgGrammarAccess grammarAccess) {
        super(input, grammarAccess);
    }

    @Override
    public void enterRule() {
        super.enterRule();
        // Run the custom recovery only when called from ruleSyncParser().
        // Frame 0 of the stack trace is this method; frame 1 is its caller.
        if ("ruleSyncParser".equals(new Throwable().getStackTrace()[1]
                .getMethodName())) {
            midRuleErrorRecovery();
        }
    }

    protected void midRuleErrorRecovery() {
        final LinkedList<RecognitionException> errors = new LinkedList<RecognitionException>();

        // Tokens that may legitimately follow at this point in the parse.
        final BitSet followSet = computeErrorRecoverySet().or(state.following[state._fsp]);

        final int mark = input.mark();

        // Swallow tokens until one that can legitimately follow turns up.
        while (!followSet.member(input.LA(1))) {
            if (input.LA(1) == Token.EOF) {
                // Reached the end of the input: undo the skipping and report
                // nothing here. rewind(mark) also releases the marker.
                input.rewind(mark);
                return;
            }
            // are the first three arguments of the NoViableAltException correct?
            // they appear to work.
            errors.add(new NoViableAltException("", 0, 0, input));
            input.consume();
        }
        for (final RecognitionException e : errors) {
            reportError(e);
        }
        input.release(mark);
    }
}

Not nice, as I had to use the stack trace to figure out who called the enterRule method, so that the in-rule error recovery is applied only where the SyncParser rule is explicitly used.
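
The caller check can be tried in isolation. The following is a minimal stand-alone sketch: GeneratedParser and CustomParser are hypothetical stand-ins for the generated and the custom parser (they are not Xtext classes), with final rule methods and a non-final hook as described above.

```java
public class CallerHookDemo {

    /** Stand-in for the generated parser: rule methods are final, the hook is not. */
    public static class GeneratedParser {
        public final void ruleSyncParser() {
            enterRule();
        }

        public final void ruleOther() {
            enterRule();
        }

        protected void enterRule() {
            // nothing to do in this stand-in
        }
    }

    /** Stand-in for the custom parser: reacts only when the hook is called from ruleSyncParser. */
    public static class CustomParser extends GeneratedParser {
        public int recoveries = 0;

        @Override
        protected void enterRule() {
            super.enterRule();
            // Frame 0 is this method, frame 1 is the method that called it.
            String caller = new Throwable().getStackTrace()[1].getMethodName();
            if ("ruleSyncParser".equals(caller)) {
                recoveries++; // the real code would start its error recovery here
            }
        }
    }

    public static void main(String[] args) {
        CustomParser parser = new CustomParser();
        parser.ruleOther();
        parser.ruleSyncParser();
        parser.ruleSyncParser();
        System.out.println("recoveries: " + parser.recoveries);
    }
}
```

The check depends on the method name as a string, so renaming the grammar rule silently disables it, and building a stack trace on every rule entry has a cost; both are reasons to treat this as a stopgap.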

All that was left was to get the rest of the code to use my new parser.

Add the following to the CfgRuntimeModule.java

    @Override
    public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
        return uk.me.pizarro.parser.MyCfgParser.class;
    }

and also define the following class

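package uk.me.pizarro.parser;

import org.eclipse.xtext.parser.antlr.XtextTokenStream;
// Generated classes; the CfgParser package is assumed from the InternalCfgParser import above.
import uk.me.pizarro.parser.antlr.CfgParser;
import uk.me.pizarro.parser.antlr.internal.InternalCfgParser;

public class MyCfgParser extends CfgParser {

    @Override
    protected InternalCfgParser createParser(XtextTokenStream stream) {
        return new MyInternalCfgParser(stream, getGrammarAccess());
    }
}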

And thus we have parser error recovery that does not break out of the parsing loop (the one created for the * or + cardinality).

All tips are welcome, so if you have an alternative way of achieving the same result, I would love to know.

Thanks again...

[Updated on: Sat, 08 June 2013 01:31]
