Re: Custom Syntax Error Recovery [message #1062193 is a reply to message #1062158] |
Thu, 06 June 2013 13:25 |
David Pizarro De La Iglesia Messages: 5 Registered: June 2013 |
Junior Member |
Hi, thanks for your suggestions.
The error I currently get is "missing EOF at '<symbol of the invalid token>'".
The configuration element is one of several items:
Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
element+=ConfigurtaionElement? (EOL element+=ConfigurtaionElement?)*
;
ConfigurtaionElement:
Comment|ConfigItem|TCL
;
This means that the allowed starting symbols of a configuration element are one of the following:
- Bang ("!")
- Letter (('a'..'z')|('A'..'Z'))
- Special character combination ("%{")
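As a rough illustration of that list, the check below tests whether a line could begin a configuration element. It is only a sketch: the class and method names are made up for this post, and it follows the three bullets literally, so it ignores that the ID rule further down also accepts a leading underscore.

```java
/** Hypothetical helper: mirrors the three allowed starting symbols listed above. */
final class StartSymbols {
    /** True if the line could begin a Comment, a ConfigItem or a TCL element. */
    static boolean canStartElement(String line) {
        if (line.isEmpty()) {
            return false;
        }
        char first = line.charAt(0);
        boolean bang = first == '!';
        boolean letter = (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
        boolean tcl = line.startsWith("%{");
        return bang || letter || tcl;
    }
}
```

A line such as "4 }" fails this check, which is exactly the situation the examples below run into.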
Without explicitly writing a rule to parse the invalid tokens, I don't know how to relax the grammar.
The problem is that the standard error recovery mechanism breaks out of the loop in the rule.
For the following examples, a configuration element can't start with a number
example 1
!comment @ top level
block {
!comment nested
4 }
!comment @ top level
block {
!comment nested
4}
! comment @ top level
block {
!comment nested
4
!This comment is outside the block
}
! no longer parsing
example 2
!comment @ top level
4
! no longer parsing
This is the current version of my grammar (work still in progress, so I still need to tidy and correct stuff, but it does work)
Note: the import and generate URIs are missing the http:// prefix, as I still can't post external links.
grammar uk.me.pizarro.editor.cfg
import "www.eclipse.org/emf/2002/Ecore" as ecore
generate cfg "www.pizarro.me.uk/editor/cfg"
Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
element+=ConfigurtaionElement? (EOL element+=ConfigurtaionElement?)*
;
ConfigurtaionElement:
Comment|ConfigItem|TCL
;
Comment hidden():
SYM_BANG text=RAW_COMMENT
;
RAW_COMMENT hidden():
=>( ESCAPED_EOL |
LETTERS |
DIGITS |
ESCAPED_CHAR |
SYM_TAB |
SYM_SPACE |
SYM_BANG |
SYM_DQUOTE |
SYM_NUM |
SYM_DOLLAR |
SYM_PCT |
SYM_AMP |
SYM_SQUOTE |
SYM_OP |
SYM_CP |
SYM_AST |
SYM_PLUS |
SYM_COMMA |
SYM_MINUS |
SYM_DOT |
SYM_SLASH |
SYM_COLON |
SYM_SCOLON |
SYM_LT |
SYM_EQ |
SYM_GT |
SYM_QMARK |
SYM_AT |
SYM_OBRKT |
SYM_BSLASH |
SYM_CBRKT |
SYM_CARET |
SYM_USCORE |
SYM_ACCENT |
SYM_OBRACE |
SYM_VBAR |
SYM_CBRACE |
SYM_TILDE
)*
;
ConfigItem:
name=ID EOL* value=ConfigValue
;
ConfigValue:
(Data|DataList)|TCL|Block
;
Block hidden(SYM_SPACE,SYM_TAB):
(syntax+=SYM_OBRACE data=BlockInternal syntax+=SYM_CBRACE)
;
BlockInternal hidden(SYM_SPACE,SYM_TAB,EOL):
Configuration
;
Data hidden():
(data=RawData)| (syntax+=SYM_DQUOTE data=RawString syntax+=SYM_DQUOTE)
;
// Do lists have to be comma separated?
DataList hidden(SYM_SPACE,SYM_TAB): {DataList}
syntax+=SYM_OP EOL* ( data+=Data EOL* (syntax+=SYM_COMMA EOL* data+=Data EOL*)* )? syntax+=SYM_CP
;
ID hidden():
(LETTERS|SYM_USCORE)=>(LETTERS|SYM_USCORE|DIGITS)*
;
// Bug Warning! This is probably missing allowed symbols
RawData hidden():
=>(LETTERS|DIGITS|SYM_USCORE)+
;
RawString hidden():
=>( EOL |
ESCAPED_EOL |
LETTERS |
DIGITS |
ESCAPED_CHAR |
SYM_TAB |
SYM_SPACE |
SYM_BANG |
// SYM_DQUOTE |
SYM_NUM |
SYM_DOLLAR |
SYM_PCT |
SYM_AMP |
SYM_SQUOTE |
SYM_OP |
SYM_CP |
SYM_AST |
SYM_PLUS |
SYM_COMMA |
SYM_MINUS |
SYM_DOT |
SYM_SLASH |
SYM_COLON |
SYM_SCOLON |
SYM_LT |
SYM_EQ |
SYM_GT |
SYM_QMARK |
SYM_AT |
SYM_OBRKT |
SYM_BSLASH |
SYM_CBRKT |
SYM_CARET |
SYM_USCORE |
SYM_ACCENT |
SYM_OBRACE |
SYM_VBAR |
SYM_CBRACE |
SYM_TILDE
)*
;
TCL hidden(): {TCL}
'%{' line+=RAW_TCL (ESCAPED_EOL line+=RAW_TCL)* '}%'
;
RAW_TCL hidden():
=>( LETTERS |
DIGITS |
ESCAPED_CHAR |
SYM_TAB |
SYM_SPACE |
SYM_BANG |
SYM_DQUOTE |
SYM_NUM |
SYM_DOLLAR |
SYM_PCT |
SYM_AMP |
SYM_SQUOTE |
SYM_OP |
SYM_CP |
SYM_AST |
SYM_PLUS |
SYM_COMMA |
SYM_MINUS |
SYM_DOT |
SYM_SLASH |
SYM_COLON |
SYM_SCOLON |
SYM_LT |
SYM_EQ |
SYM_GT |
SYM_QMARK |
SYM_AT |
SYM_OBRKT |
SYM_BSLASH |
SYM_CBRKT |
SYM_CARET |
SYM_USCORE |
SYM_ACCENT |
// SYM_OBRACE |
SYM_VBAR |
// SYM_CBRACE |
SYM_TILDE
)*
;
terminal ESCAPED_EOL : '\\'EOL;
terminal ESCAPED_CHAR : '\\'(' '..'~');
terminal EOL : (SYM_CR? '\n');
terminal SYM_CR : '\r';
terminal SYM_TAB : '\t';
terminal DIGITS : ('0'..'9')+;
terminal LETTERS : ('a'..'z'|'A'..'Z')+;
terminal SYM_SPACE : ' ';
terminal SYM_BANG : '!';
terminal SYM_DQUOTE : '"';
terminal SYM_NUM : '#';
terminal SYM_DOLLAR : '$';
terminal SYM_PCT : '%';
terminal SYM_AMP : '&';
terminal SYM_SQUOTE : '\'';
terminal SYM_OP : '(';
terminal SYM_CP : ')';
terminal SYM_AST : '*';
terminal SYM_PLUS : '+';
terminal SYM_COMMA : ',';
terminal SYM_MINUS : '-';
terminal SYM_DOT : '.';
terminal SYM_SLASH : '/';
terminal SYM_COLON : ':';
terminal SYM_SCOLON : ';';
terminal SYM_LT : '<';
terminal SYM_EQ : '=';
terminal SYM_GT : '>';
terminal SYM_QMARK : '?';
terminal SYM_AT : '@';
terminal SYM_OBRKT : '[';
terminal SYM_BSLASH : '\\';
terminal SYM_CBRKT : ']';
terminal SYM_CARET : '^';
terminal SYM_USCORE : '_';
terminal SYM_ACCENT : '`';
terminal SYM_OBRACE : '{';
terminal SYM_VBAR : '|';
terminal SYM_CBRACE : '}';
terminal SYM_TILDE : '~';
terminal ANY_OTHER : .;
[Updated on: Thu, 06 June 2013 13:26]
Re: Custom Syntax Error Recovery [message #1062227 is a reply to message #1062205] |
Thu, 06 June 2013 15:19 |
David Pizarro De La Iglesia Messages: 5 Registered: June 2013 |
Junior Member |
I will need the internal semantics of various components.
For example, the TCL rule will be expanded to check the syntax (I will probably define this grammar in a separate file and import it).
The comments also contain metadata that I will want to extract.
But I thought I would get the parser error recovery nailed down properly before I finish expanding the grammar and its functionality, and then tidy it all up.
If you prefer, here is a sanitised version of the problem.
An ID must exist alone, one per line, but not every line has to have an ID.
grammar uk.me.pizarro.error.cfg
import "www.eclipse.org/emf/2002/Ecore" as ecore
generate cfg "www.pizarro.me.uk/error/cfg"
IDList hidden(WS): {IDList}
element+=ID? (EOL element+=ID?)*
;
ID hidden():
LETTERS=>(LETTERS|DIGITS)*
;
terminal EOL : ('\r'? '\n');
terminal WS : ('\t'|' ')+;
terminal DIGITS : ('0'..'9')+;
terminal LETTERS : ('a'..'z'|'A'..'Z')+;
terminal ANY_OTHER : .;
Example scenario (the parser reports: missing EOF at '456'):
ThisIDParses
ThisIDParses
456
ThisIDDoesNotParse
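To make the failure mode concrete, here is a small hand-written stand-in for the IDList rule, written in plain Java rather than generated by Xtext. It is only a sketch under simplifying assumptions: each line is treated as a single token, the class name is invented, and the error text merely imitates the real message. It shows how the loop gives up at the first line that is neither an ID nor empty, so everything after it is never looked at.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for: IDList: element+=ID? (EOL element+=ID?)* ; */
final class ToyIdListParser {
    final List<String> ids = new ArrayList<>();
    String error; // stays null when every line was consumed

    static ToyIdListParser parse(String text) {
        ToyIdListParser result = new ToyIdListParser();
        // Splitting on EOL plays the role of the (EOL ID?)* loop.
        for (String raw : text.split("\r?\n", -1)) {
            String line = raw.trim(); // crude version of hidden(WS)
            if (line.isEmpty()) {
                continue; // ID? matched nothing, which is allowed
            }
            if (line.matches("[A-Za-z][A-Za-z0-9]*")) {
                result.ids.add(line);
                continue;
            }
            // Neither an ID nor an EOL: the loop ends here and the rule then
            // expects end of input. The whole line stands in for the
            // offending token, and the remaining lines are never parsed.
            result.error = "missing EOF at '" + line + "'";
            break;
        }
        return result;
    }
}
```

Feeding it the four lines above yields two IDs and the error for '456'; the last line is silently dropped, which is the behaviour I want to avoid.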
[Updated on: Thu, 06 June 2013 15:21]
Re: Custom Syntax Error Recovery [message #1062482 is a reply to message #1062309] |
Sat, 08 June 2013 01:08 |
David Pizarro De La Iglesia Messages: 5 Registered: June 2013 |
Junior Member |
Thanks for the help and suggestions. Although it's not exactly what I wanted (I wanted to tweak the generation of the InternalCfg.g file), I have managed to get it working.
Please note that this code does smell...
That is, it's not the correct or nice way of achieving the result, but it does work!
I defined a SyncParser rule and added it to my grammar.
Configuration hidden(SYM_SPACE,SYM_TAB): {Configuration}
element+=ConfigurtaionElement? errRec+=SyncParser (EOL element+=ConfigurtaionElement? errRec+=SyncParser)*
;
// A match all rule, used as a dummy placement
SyncParser: {SyncParser} ;
And no, I didn't mistype the SyncParser rule: it is deliberately empty, so it matches without consuming any input.
This generated the following fragment of code within the InternalCfg.g file.
I was really looking for a mechanism to define my own ANTLR grammar, but this will do until I can figure that out properly.
// Entry rule entryRuleSyncParser
entryRuleSyncParser returns [EObject current=null]
:
{ newCompositeNode(grammarAccess.getSyncParserRule()); }
iv_ruleSyncParser=ruleSyncParser
{ $current=$iv_ruleSyncParser.current; }
EOF
;
// Rule SyncParser
ruleSyncParser returns [EObject current=null]
@init { enterRule();
}
@after { leaveRule(); }:
(
{
$current = forceCreateModelElement(
grammarAccess.getSyncParserAccess().getSyncParserAction(),
$current);
}
)
;
Now, unfortunately, both of the methods I would naturally have wanted to override in the InternalCfgParser.java file are final.
public final EObject entryRuleSyncParser() throws RecognitionException {
// generated code
}
public final EObject ruleSyncParser() throws RecognitionException {
EObject current = null;
enterRule();
// more generated code
}
But luckily for me the enterRule method is not final. This meant I could do this:
package uk.me.pizarro.parser;

import java.util.LinkedList;

import org.antlr.runtime.BitSet;
import org.antlr.runtime.NoViableAltException;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.RecognizerSharedState;
import org.antlr.runtime.Token;
import org.antlr.runtime.TokenStream;

import uk.me.pizarro.parser.antlr.internal.InternalCfgParser;
import uk.me.pizarro.services.CfgGrammarAccess;

public class MyInternalCfgParser extends InternalCfgParser {

    public MyInternalCfgParser(final TokenStream input) {
        super(input);
    }

    public MyInternalCfgParser(final TokenStream input, final RecognizerSharedState state) {
        super(input, state);
    }

    public MyInternalCfgParser(final TokenStream input,
            final CfgGrammarAccess grammarAccess) {
        super(input, grammarAccess);
    }

    @Override
    public void enterRule() {
        // Keep whatever the inherited hook does before adding the custom step.
        super.enterRule();
        // Run the custom recovery only when the caller is the generated
        // ruleSyncParser method, i.e. at an explicit sync point in the grammar.
        if ("ruleSyncParser".equals(new Throwable().getStackTrace()[1]
                .getMethodName())) {
            midRuleErrorRecovery();
        }
    }

    protected void midRuleErrorRecovery() {
        final LinkedList<RecognitionException> errors = new LinkedList<RecognitionException>();
        // Tokens that may legally follow at this point in the parse.
        final BitSet followSet = computeErrorRecoverySet().or(state.following[state._fsp]);
        final int mark = input.mark();
        // Skip tokens until one from the follow set turns up.
        while (!followSet.member(input.LA(1))) {
            if (input.LA(1) == Token.EOF) {
                // Nothing to resynchronise on: undo the skipping and leave the
                // rest to the default error handling.
                input.rewind();
                return;
            }
            // Are the first three arguments of the NoViableAltException correct?
            // They appear to work.
            errors.add(new NoViableAltException("", 0, 0, input));
            input.consume();
        }
        // Report one error per skipped token.
        for (final RecognitionException e : errors) {
            reportError(e);
        }
        input.release(mark);
    }
}
It is not nice, as I had to use the stack trace to figure out who called the enterRule method, so that the in-rule error recovery is applied only where the SyncParser rule was explicitly placed.
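For anyone unfamiliar with the stack-trace trick, here it is in isolation, without any Xtext or ANTLR classes. This is just a sketch: the class, the counter and the ruleOther method are invented for the demonstration, and only the method names enterRule and ruleSyncParser match my parser. Note that the Javadoc for Throwable.getStackTrace allows a VM to omit frames in some circumstances, so the check is fragile by nature.

```java
/** Hypothetical demo of detecting the calling method from a stack trace. */
final class CallerCheckDemo {
    int recoveries = 0; // counts how often the "recovery" branch was taken

    void enterRule() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        // trace[0] is enterRule itself; trace[1] is whoever called it.
        if (trace.length > 1 && "ruleSyncParser".equals(trace[1].getMethodName())) {
            recoveries++; // stands in for midRuleErrorRecovery()
        }
    }

    void ruleSyncParser() {
        enterRule();
    }

    void ruleOther() {
        enterRule();
    }
}
```

Calling ruleOther leaves the counter untouched, while calling ruleSyncParser increments it. On Java 9 and later, StackWalker can provide the same caller information without creating a Throwable.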
All that was left was to get the rest of the code to use my new parser.
Add the following to CfgRuntimeModule.java:
@Override
public Class<? extends org.eclipse.xtext.parser.IParser> bindIParser() {
return uk.me.pizarro.parser.MyCfgParser.class;
}
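Also define the following class:
package uk.me.pizarro.parser;

import org.eclipse.xtext.parser.antlr.XtextTokenStream;
import uk.me.pizarro.parser.antlr.CfgParser;
import uk.me.pizarro.parser.antlr.internal.InternalCfgParser;

public class MyCfgParser extends CfgParser {
    @Override
    protected InternalCfgParser createParser(XtextTokenStream stream) {
        // Hand out the subclass that performs the mid-rule recovery.
        return new MyInternalCfgParser(stream, getGrammarAccess());
    }
}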
And thus we have parser error recovery without breaking out of the parsing loop (the one created for the * or + cardinality).
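The idea can be sketched without Xtext as a hand-written parser for the sanitised IDList grammar from my earlier post, with a sync step after every optional ID. Everything here is an assumption made for illustration: the class name, the string-based token kinds, the error wording, and a follow set hard-coded to EOL and end of input instead of being computed by ANTLR. Whitespace is simply dropped by the tokenizer, which is cruder than the grammar's hidden(WS) handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy parser for: IDList: ID? sync (EOL ID? sync)* ; */
final class SyncPointDemo {
    private static final Pattern TOKEN = Pattern.compile(
            "(?<letters>[A-Za-z]+)|(?<digits>[0-9]+)|(?<eol>\\r?\\n)|(?<ws>[ \\t]+)|(?<other>.)",
            Pattern.DOTALL);

    final List<String> ids = new ArrayList<>();
    final List<String> errors = new ArrayList<>();
    private final List<String[]> tokens = new ArrayList<>(); // {kind, text}
    private int pos = 0;

    static SyncPointDemo parse(String text) {
        SyncPointDemo p = new SyncPointDemo();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            if (m.group("ws") != null) {
                continue; // whitespace is dropped
            }
            String kind = m.group("letters") != null ? "LETTERS"
                    : m.group("digits") != null ? "DIGITS"
                    : m.group("eol") != null ? "EOL" : "OTHER";
            p.tokens.add(new String[] {kind, m.group()});
        }
        p.idList();
        return p;
    }

    private String kind() {
        return pos < tokens.size() ? tokens.get(pos)[0] : "EOF";
    }

    private void idList() {
        optionalId();
        sync();
        while (kind().equals("EOL")) {
            pos++; // consume the EOL
            optionalId();
            sync();
        }
    }

    /** ID: LETTERS (LETTERS|DIGITS)*, or nothing at all. */
    private void optionalId() {
        if (!kind().equals("LETTERS")) {
            return;
        }
        StringBuilder id = new StringBuilder();
        while (kind().equals("LETTERS") || kind().equals("DIGITS")) {
            id.append(tokens.get(pos++)[1]);
        }
        ids.add(id.toString());
    }

    /** Skip tokens until EOL or end of input, recording one error per token. */
    private void sync() {
        while (!kind().equals("EOL") && !kind().equals("EOF")) {
            errors.add("no viable alternative at '" + tokens.get(pos++)[1] + "'");
        }
    }
}
```

With the input First, Second, 456, Third (one per line) this collects all three IDs and a single error for '456', so the loop keeps going after the bad line instead of stopping there.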
All tips are welcome, so if you have an alternative way of achieving the same result, I would love to know.
Thanks again...
[Updated on: Sat, 08 June 2013 01:31]