Eclipse Community Forums
Unable to generate parser for a big grammar [message #1782358] Thu, 22 February 2018 10:37
Denis Nikiforov
Messages: 343
Registered: August 2013
Senior Member
Hi

I'm implementing a parser for a fairly complicated language. The original EBNF is about 6000 lines. So far, my Xtext grammar is only 1500 lines. Parser generation takes about 1 minute.

When I add one more rule to my grammar, parser generation does not finish even after 10+ minutes. The rule is trivial. I guess the problem is that the grammar is too big.

The parser generator process uses 30% of CPU and 500 MB of RAM. I run it with -Xmx1024m.

Could you suggest a solution?

Thanks!
Re: Unable to generate parser for a big grammar [message #1782359 is a reply to message #1782358] Thu, 22 February 2018 10:57
Christian Dietrich
Messages: 14661
Registered: July 2009
Senior Member
Did you set an ANTLR timeout?
Did you profile the generation?
How do you set the memory?


Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: Unable to generate parser for a big grammar [message #1782362 is a reply to message #1782359] Thu, 22 February 2018 11:05
Denis Nikiforov
I'm not experienced in Java development. Could you suggest how to set an antlr timeout and which Eclipse tool to use for profiling?

I created a "Run Configuration" in Eclipse for my MWE2 file. Opened an "Arguments" tab and set "VM arguments" to -Xmx1024m
Re: Unable to generate parser for a big grammar [message #1782364 is a reply to message #1782362] Thu, 22 February 2018 11:35
Christian Dietrich
For the conversion timeout:

parserGenerator = {
	antlrParam = "-Xconversiontimeout"
	antlrParam = "10000" // play with that value
}

inside the language section of the workflow.

Could you provide a reproducible example / artificial grammar?



Re: Unable to generate parser for a big grammar [message #1782365 is a reply to message #1782364] Thu, 22 February 2018 11:37
Christian Dietrich
P.S.: it might be that your grammar contains a specific construct that makes the generation take forever, e.g.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=489523


Re: Unable to generate parser for a big grammar [message #1782370 is a reply to message #1782365] Thu, 22 February 2018 12:50
Denis Nikiforov
I've got the error:
error(10):  internal error: org.antlr.tool.Grammar.createLookaheadDFA(Grammar.java:1279): could not even do k=1 for decision 33; reason: timed out (>10000ms)


There is only one rule in the generated InternalSQLParser.g, which contains 33 code blocks of the following form:
			otherlv_32=RightParenthesis
			{
				newLeafNode(otherlv_32, grammarAccess.getSequenceFunctionAccess().getRightParenthesisKeyword_4_8());
			}


I'll try to simplify this rule. I'm not sure that this is exactly the rule that breaks parser generation, but at least I now have a tool to debug the problem. I need some time to investigate it in more detail. Thanks!
Re: Unable to generate parser for a big grammar [message #1782372 is a reply to message #1782370] Thu, 22 February 2018 12:53
Christian Dietrich
did you try to increase the timeout even more?

Re: Unable to generate parser for a big grammar [message #1782378 is a reply to message #1782372] Thu, 22 February 2018 13:13
Denis Nikiforov
Oops, it seems that the decision number is unrelated to the number of rule branches...

Also, it seems that parser generation runs forever regardless of the timeout value if I enable these extra rules.

I've sent you my grammar by email.
Re: Unable to generate parser for a big grammar [message #1782407 is a reply to message #1782378] Thu, 22 February 2018 18:01
Christian Dietrich
This is a combination of an ambiguous grammar and a buggy serializer generator.
Whether the serializer generator stumbles over the ambiguous grammar, I cannot tell.

=> You should solve the ambiguities first and add stuff step by step to Predicate




Re: Unable to generate parser for a big grammar [message #1782408 is a reply to message #1782407] Thu, 22 February 2018 18:02
Christian Dietrich
And you may use e.g. ANTLRWorks to solve the ambiguities.
Unfortunately, the work to analyze this goes beyond what I can do in my spare time.


Re: Unable to generate parser for a big grammar [message #1782410 is a reply to message #1782408] Thu, 22 February 2018 18:24
Christian Dietrich
To skip the serializer and see the other errors:

// inside language
serializer = org.xtext.example.mydsl.NullSerializerFragment2 {
}


package org.xtext.example.mydsl;

import org.eclipse.xtext.xtext.generator.serializer.SerializerFragment2;

public class NullSerializerFragment2 extends SerializerFragment2 {
	
	@Override
	public void generate() {
		// do nothing
	}

}


Re: Unable to generate parser for a big grammar [message #1782415 is a reply to message #1782410] Thu, 22 February 2018 20:03
Denis Nikiforov
Thanks! I've got it. The problem appears during serializer generation, not parser generation.

There are a lot of rules in my grammar following the pattern:
CommonValueExpression returns ValueExpression:
    NumericValueExpression |
    CharacterValueExpression;



NumericValueExpression returns ValueExpression:
    AdditiveExpression;

AdditiveExpression returns ValueExpression:
    MultiplicativeExpression
    ({AdditionExpression.left=current} '+' right=MultiplicativeExpression)*
    ({SubtractionExpression.left=current} '-' right=MultiplicativeExpression)*;

MultiplicativeExpression returns ValueExpression:
    UnarySignExpression
    ({MultiplicationExpression.left=current} '*' right=UnarySignExpression)*
    ({DivisionExpression.left=current} '/' right=UnarySignExpression)*;

UnarySignExpression returns ValueExpression:
    NumericPrimary |
    {UnaryPlusExpression} '+' expr=UnarySignExpression |
    {UnaryMinusExpression} '-' expr=UnarySignExpression;

NumericPrimary returns ValueExpression:
    ValueExpressionPrimary |
    NumericValueFunction;



CharacterValueExpression:
    Concatenation;

Concatenation:
    CharacterFactor ({Concatenation.left=current} '||' right=CharacterFactor)*;

CharacterFactor:
    expr=CharacterPrimary ('COLLATE' collationName=CollationName)?;

CharacterPrimary:
    ValueExpressionPrimary |
    StringValueFunction;



ValueExpressionPrimary returns ValueExpression:
    '(' ValueExpression ')' |
    NonparenthesizedValueExpressionPrimary;

NonparenthesizedValueExpressionPrimary:
    UnsignedValueSpecification;

UnsignedValueSpecification:
    UnsignedLiteral;


ValueExpressionPrimary is reachable from CommonValueExpression through NumericValueExpression and CharacterValueExpression. I don't understand how to apply left-factoring or syntactic predicates for these rules.

The following rules are wrong:
CommonValueExpression returns ValueExpression:
    =>ValueExpressionPrimary |
    NumericValueExpression |
    CharacterValueExpression;

CommonValueExpression returns ValueExpression:
    =>NumericValueExpression |
    CharacterValueExpression;
Re: Unable to generate parser for a big grammar [message #1782451 is a reply to message #1782415] Fri, 23 February 2018 15:13
Denis Nikiforov
I think I should determine the precedence of numeric, string, ... operations and make a single hierarchy of rules.
For example, string expressions have a lower precedence. So CommonValueExpression must refer to the CharacterValueExpression rule. That one must refer to NumericValueExpression, which in turn must refer to ValueExpressionPrimary.
This will disambiguate the grammar.

The problem is that the precedence of operations is unclear in the original BNF grammar. It also differs between dialects of the original language. Oh, SQL is the most complicated language in the universe.
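
A minimal sketch of what such a single hierarchy could look like, reusing the rule names from the grammar above. It assumes that '||' binds loosest, then '+' and '-', then '*' and '/'; this ordering is an assumption and may not match a given SQL dialect. The rule name Primary is hypothetical, the COLLATE clause is left out for brevity, and NumericValueFunction and StringValueFunction are assumed to be defined elsewhere in the grammar. Operators of the same level are grouped into one alternative so that they can be mixed freely (a - b + c).

```xtext
// Sketch only: one chain of rules, lowest precedence first.
CommonValueExpression returns ValueExpression:
    Concatenation;

// Assumed to bind loosest.
Concatenation returns ValueExpression:
    AdditiveExpression
    ({Concatenation.left=current} '||' right=AdditiveExpression)*;

AdditiveExpression returns ValueExpression:
    MultiplicativeExpression
    (({AdditionExpression.left=current} '+' | {SubtractionExpression.left=current} '-')
        right=MultiplicativeExpression)*;

MultiplicativeExpression returns ValueExpression:
    UnarySignExpression
    (({MultiplicationExpression.left=current} '*' | {DivisionExpression.left=current} '/')
        right=UnarySignExpression)*;

UnarySignExpression returns ValueExpression:
    Primary |
    {UnaryPlusExpression} '+' expr=UnarySignExpression |
    {UnaryMinusExpression} '-' expr=UnarySignExpression;

// Hypothetical rule: the only place that refers to ValueExpressionPrimary,
// so it is no longer reachable through two competing paths.
Primary returns ValueExpression:
    ValueExpressionPrimary |
    NumericValueFunction |
    StringValueFunction;
```

With this layout every expression is parsed through the same chain, so the parser never has to choose between a numeric and a character branch for the same input. Whether an expression is actually numeric or a string would then have to be checked after parsing, for example in a validator.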