How to define a greedy version of Xtext until (->)? [message #1781919] |
Wed, 14 February 2018 15:57 |
Nicolas Rouquette Messages: 40 Registered: December 2014 |
Member |
Xtext's until operator is useful for consuming everything between two tokens, for example, Scala-like raw strings:
terminal RAW_STRING_VALUE returns RawStringDataType: '"""' -> '"""';
Unfortunately, this is a non-greedy lexer rule as seen in the generated
Antlr rule:
RULE_RAW_STRING_VALUE : '"""' ( options {greedy=false;} : . )*'"""';
The same non-greedy behavior shows up in the generated lexer's Java code:
// $ANTLR start "RULE_RAW_STRING_VALUE"
public final void mRULE_RAW_STRING_VALUE() throws RecognitionException {
    try {
        int _type = RULE_RAW_STRING_VALUE;
        int _channel = DEFAULT_TOKEN_CHANNEL;
        // InternalOML.g:9081:23: ( '\"\"\"' ( options {greedy=false; } : . )* '\"\"\"' )
        // InternalOML.g:9081:25: '\"\"\"' ( options {greedy=false; } : . )* '\"\"\"'
        {
            match("\"\"\"");
            // InternalOML.g:9081:31: ( options {greedy=false; } : . )*
            loop36:
            do {
                int alt36=2;
                int LA36_0 = input.LA(1);
                if ( (LA36_0=='\"') ) {
                    int LA36_1 = input.LA(2);
                    if ( (LA36_1=='\"') ) {
                        int LA36_3 = input.LA(3);
                        if ( (LA36_3=='\"') ) {
                            alt36=2;
                        }
                        else if ( ((LA36_3>='\u0000' && LA36_3<='!')||(LA36_3>='#' && LA36_3<='\uFFFF')) ) {
                            alt36=1;
                        }
                    }
                    else if ( ((LA36_1>='\u0000' && LA36_1<='!')||(LA36_1>='#' && LA36_1<='\uFFFF')) ) {
                        alt36=1;
                    }
                }
                else if ( ((LA36_0>='\u0000' && LA36_0<='!')||(LA36_0>='#' && LA36_0<='\uFFFF')) ) {
                    alt36=1;
                }
                switch (alt36) {
                    case 1 :
                        // InternalOML.g:9081:59: .
                        {
                            matchAny();
                        }
                        break;
                    default :
                        break loop36;
                }
            } while (true);
            match("\"\"\"");
        }
        state.type = _type;
        state.channel = _channel;
    }
    finally {
    }
}
// $ANTLR end "RULE_RAW_STRING_VALUE"
Non-greedy is unfortunately not what's intuitively expected.
For example, the following ought to be legal raw strings:
"""1""""
"""2"""""
"""3""""""
"""4"""""""
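These ought to lex in such a way that the closing delimiter is the last three quotes of each literal, so the contents of the RawString would be 1", 2"", 3""" and 4"""" respectively.
That's not what happens: the non-greedy rule closes the token at the first """ it sees, and in fact the whole thing fails to lex properly.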
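To make the failure concrete, here is a small stand-alone Java sketch that imitates the non-greedy rule with a reluctant regular expression. It is only an approximation of what the generated ANTLR lexer does, not the lexer itself; the class name NonGreedyDemo and the method consumed are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NonGreedyDemo {
    // Reluctant ".*?" stops at the first closing delimiter, which is how
    // the generated ( options {greedy=false;} : . )* loop behaves.
    private static final Pattern NON_GREEDY =
            Pattern.compile("\"\"\"(.*?)\"\"\"", Pattern.DOTALL);

    /** Number of characters consumed from the start of input, or -1 if there is no match. */
    static int consumed(String input) {
        Matcher m = NON_GREEDY.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "\"\"\"1\"\"\"\"";  // the literal """1""""
        int end = consumed(input);
        System.out.println("token:    " + input.substring(0, end));
        System.out.println("leftover: " + input.substring(end));
    }
}
```

For """1"""" the token ends after seven characters and a single quote is left over for the lexer to deal with, which matches the problem described above.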
I would like a greedy version of the until operator, perhaps something like this:
terminal RAW_STRING_VALUE returns RawStringDataType: '"""' ->* '"""';
The idea would be to generate a greedy version of the Antlr grammar rule
somehow such that the lexer java code produced would be functionally
equivalent to the following:
// $ANTLR start "RULE_RAW_STRING_VALUE"
public final void mRULE_RAW_STRING_VALUE() throws RecognitionException {
    try {
        int _type = RULE_RAW_STRING_VALUE;
        int _channel = DEFAULT_TOKEN_CHANNEL;
        // InternalOML.g:9081:23: ( '\"\"\"' ( options {greedy=true; } : . )* '\"\"\"' )
        // InternalOML.g:9081:25: '\"\"\"' ( options {greedy=true; } : . )* '\"\"\"'
        {
            match("\"\"\"");
            // InternalOML.g:9081:31: ( options {greedy=true; } : . )*
            loop36:
            do {
                int alt36=2;
                int LA36_0 = input.LA(1);
                if ( (LA36_0=='\"') ) {
                    int LA36_1 = input.LA(2);
                    if ( (LA36_1=='\"') ) {
                        int LA36_3 = input.LA(3);
                        if ( (LA36_3=='\"') ) {
                            // greedy version of ->
                            int LA36_4 = input.LA(4);
                            if ( CharStream.EOF == LA36_4 || LA36_4 != '\"') {
                                alt36=2;
                            } else {
                                alt36=1;
                            }
                        }
                        else if ( ((LA36_3>='\u0000' && LA36_3<='!')||(LA36_3>='#' && LA36_3<='\uFFFF')) ) {
                            alt36=1;
                        }
                    }
                    else if ( ((LA36_1>='\u0000' && LA36_1<='!')||(LA36_1>='#' && LA36_1<='\uFFFF')) ) {
                        alt36=1;
                    }
                }
                else if ( ((LA36_0>='\u0000' && LA36_0<='!')||(LA36_0>='#' && LA36_0<='\uFFFF')) ) {
                    alt36=1;
                }
                switch (alt36) {
                    case 1 :
                        // InternalOML.g:9081:59: .
                        {
                            matchAny();
                        }
                        break;
                    default :
                        break loop36;
                }
            } while (true);
            match("\"\"\"");
        }
        state.type = _type;
        state.channel = _channel;
    }
    finally {
    }
}
// $ANTLR end "RULE_RAW_STRING_VALUE"
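The extra LA(4) check can be tried out without ANTLR. The following is a minimal stand-alone sketch of the same decision on a plain String; GreedyRawStringScanner and scan are hypothetical names used only here, and the method handles a single literal at the start of the input rather than a full token stream.

```java
final class GreedyRawStringScanner {
    private static final String DELIM = "\"\"\"";

    /**
     * Returns the content of the raw string that starts at index 0,
     * or null if the input does not start with a complete raw string.
     */
    static String scan(String input) {
        if (!input.startsWith(DELIM)) {
            return null;
        }
        int pos = DELIM.length();
        while (pos < input.length()) {
            boolean atDelim = input.startsWith(DELIM, pos);
            // Same idea as the LA(4) test: three quotes only close the
            // literal when no fourth quote follows them.
            boolean quoteFollows = pos + 3 < input.length()
                    && input.charAt(pos + 3) == '"';
            if (atDelim && !quoteFollows) {
                return input.substring(DELIM.length(), pos);
            }
            pos++;  // corresponds to matchAny()
        }
        return null;  // no closing delimiter found
    }

    public static void main(String[] args) {
        String[] samples = {
            "\"\"\"1\"\"\"\"",
            "\"\"\"2\"\"\"\"\"",
            "\"\"\"3\"\"\"\"\"\"",
            "\"\"\"4\"\"\"\"\"\"\""
        };
        for (String s : samples) {
            System.out.println(s + " -> " + scan(s));
        }
    }
}
```

With this rule the last three quotes of a run always act as the closing delimiter, and any quotes before them become part of the content.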
This problem isn't unique to my language.
In fact, it affects Xtend as shown below:
// Xtend code fails to parse!
class raw {
    static val foo1 = '''1''''
    static val foo2 = '''2'''''
    static val foo3 = '''3''''''
    static val foo4 = '''4'''''''
    def static void main(String[] args) {
        println(foo1)
        println(foo2)
        println(foo3)
        println(foo4)
    }
}
Scala has a similar syntax for raw strings; however, unlike Xtend's non-greedy raw strings, Scala's raw strings are greedy; e.g.:
object raw {
    val foo1 = """1""""
    val foo2 = """2"""""
    val foo3 = """3""""""
    val foo4 = """4"""""""
    def main(args: Array[String]): Unit = {
        System.out.println(foo1)
        System.out.println(foo2)
        System.out.println(foo3)
        System.out.println(foo4)
    }
}
When run, this prints each string with its trailing quotes kept as content:
1"
2""
3"""
4""""
Unless I missed something, the Xtext grammar language in 2.12 and 2.13 doesn't provide a way to define a greedy until lexer rule like the one shown above. Am I correct about this?
If so, is it reasonable to ask for a new feature to support greedy until in Xtext?
- Nicolas.
Re: How to define a greedy version of Xtext until (->)? [message #1781926 is a reply to message #1781921] |
Wed, 14 February 2018 16:32 |
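Or you can try to tame the lexer generation yourself: register a custom generator module in the workflow, bind your own AntlrGrammarGenerator in it, and post-process what compileLexer returns: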
Workflow {
    component = XtextGenerator {
        configuration = CustomGeneratorModule {
            project = StandardProjectConfig {
                ......
        language = StandardLanguage {
            name = "org.xtext.example.mydsl2.MyDsl"
            fileExtensions = "mydsl2"
            serializer = {
                generateStub = false
            }
            validator = {
                // composedCheck = "org.eclipse.xtext.validation.NamesAreUniqueValidator"
            }
            parserGenerator = {
                combinedGrammar = false
            }.....

import org.eclipse.xtext.xtext.generator.DefaultGeneratorModule;
import org.eclipse.xtext.xtext.generator.parser.antlr.AntlrGrammarGenerator;

public class CustomGeneratorModule extends DefaultGeneratorModule {
    public Class<? extends AntlrGrammarGenerator> bindAntlrGrammarGenerator() {
        return AntlrGrammarGenerator2.class;
    }
}

import org.eclipse.xtext.Grammar
import org.eclipse.xtext.xtext.generator.parser.antlr.AntlrGrammarGenerator
import org.eclipse.xtext.xtext.generator.parser.antlr.AntlrOptions

class AntlrGrammarGenerator2 extends AntlrGrammarGenerator {
    override protected CharSequence compileLexer(Grammar it, AntlrOptions options) {
        '''
        // custom
        «super.compileLexer(it, options)»
        '''
    }
}
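One way to use that hook might be to rewrite the generated rule text before it is written out. The plain-Java sketch below shows only the string manipulation, which could be called on the result of super.compileLexer(it, options).toString(). It assumes that the generated rule sits on a single line starting with RULE_RAW_STRING_VALUE, as in the Antlr rule quoted in the first post. The replacement rule in GREEDY_RULE is a hypothetical hand-written alternative that has not been tested against ANTLR 3, and LexerGrammarPatch and patch are illustrative names, not part of Xtext.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LexerGrammarPatch {
    // Assumption: the generated rule occupies exactly one line.
    private static final Pattern GENERATED_RULE =
            Pattern.compile("(?m)^RULE_RAW_STRING_VALUE\\s*:.*$");

    // Hypothetical replacement (untested with ANTLR 3): up to two quotes
    // followed by a non-quote may repeat, then the closing delimiter,
    // then any extra quotes that belong to the same token.
    static final String GREEDY_RULE =
            "RULE_RAW_STRING_VALUE : '\"\"\"' ( ('\"' ('\"')?)? ~('\"') )* '\"\"\"' ('\"')*;";

    /** Returns the grammar text with the generated rule swapped for GREEDY_RULE. */
    static String patch(String lexerGrammar) {
        return GENERATED_RULE.matcher(lexerGrammar)
                .replaceFirst(Matcher.quoteReplacement(GREEDY_RULE));
    }

    public static void main(String[] args) {
        String generated =
                "RULE_ID : 'a'..'z'+;\n"
              + "RULE_RAW_STRING_VALUE : '\"\"\"' ( options {greedy=false;} : . )*'\"\"\"';\n"
              + "RULE_WS : ' '+;";
        System.out.println(patch(generated));
    }
}
```

Note that with such a rule the extra quotes end up at the end of the token text, so the value converter would still have to strip exactly three quotes from each end to obtain the content.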
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de