Xtext cannot handle parsing big files [message #1460315]
Mon, 03 November 2014 12:36
Lidia Gutu   Messages: 45   Registered: July 2013
Member
I posted this question a few months ago, and I am asking again because unfortunately I have not found any solution.
How can I create a DSL that is able to handle big files, about 10 MB?
How can I create a lightweight editor, or something similar?
With my current version, typing is very slow in files larger than 8 MB, especially at the end of the document. Opening these huge documents takes a long time; sometimes Eclipse freezes, or it runs into an OutOfMemoryError even with -Xmx1024m and -XX:MaxPermSize=512m.
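Before digging into the grammar, it is worth ruling out plain heap pressure. A sketch of eclipse.ini vmargs (the values are illustrative; -XX:MaxPermSize only applies to Java 7 and earlier, since PermGen was removed in Java 8):

```ini
-vmargs
-Xmx2048m
-XX:MaxPermSize=512m
```

If the OutOfMemoryError disappears with a bigger heap but typing stays slow, the problem is algorithmic rather than memory sizing.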
Is there a way to create a DSL that can handle this, or is it a bug in Xtext?
I have investigated this from different points of view:
1. Checked and changed the grammar.
2. Tried to create a lightweight editor by adding different modules in *.mwe2, where all memory-consuming features should be turned off, such as outline refresh and content assist.
3. Debugged the Xtext code and found the piece of code that causes the delay during typing; I was thinking of patching the Xtext plugins.
4. Customized my DSL plugins so that my DSL does not depend on JDT.
Nothing has helped so far!
----------------------- 1. GRAMMAR --------------------------------
I have a complex grammar that does not contain left-recursive rules, so it seems to be OK; only some parts are quite long. Should I split them somehow?
Some samples and screenshots are attached:
AndExpression returns ConstantExpression
: ShiftExpression ({AndExpression.left=current} "&" right=ShiftExpression)*
;
AdditiveExpression returns ConstantExpression
: MultiplicativeExpression ({AdditiveExpression.left=current} additionKind+=AdditionKind right=MultiplicativeExpression)*
;
EnumInt returns EnumDefinition
: "enum" "{" enumList+=EnumItem ("," enumList+=EnumItem)* "}"
name=DSLID/(withString?="string")?
;
FuntionParameter
: funtionKind=FunctionKind type=DSLType (byKey?="bykey")? (byDefinition?="bydefinition")?
(parametername=DSLRelaxedID)? (dimensions+=Dimension)* (noTrace?="notrace")? ";"
;
My grammar has 812 lines, and I do not get any warnings about left-recursive rules. What other optimizations can be applied to an Xtext grammar?
//--------------------- 2. MWE2 WORKFLOW --------------------------------
How can I create an Xtext workflow that avoids triggering the listeners and observers that consume time and memory in the main thread?
My last version used:
// Java API to access grammar elements (required by several other fragments)
fragment = grammarAccess.GrammarAccessFragment auto-inject {}
// generates Java API for the generated EPackages
fragment = ecore.EMFGeneratorFragment auto-inject {}
// Serializer 2.0
fragment = serializer.SerializerFragment auto-inject {
generateStub = false
}
// a custom ResourceFactory for use with EMF
fragment = resourceFactory.ResourceFactoryFragment auto-inject {
fileExtensions = file.extensions
}
// The antlr parser generator fragment.
fragment = parser.antlr.ex.rt.AntlrGeneratorFragment auto-inject {
options = {
ignoreCase = true
}
}
// generates a more lightweight Antlr parser and lexer tailored ...
fragment = parser.antlr.ex.ca.ContentAssistParserGeneratorFragment auto-inject {
options = {
ignoreCase = true
}
}
// java-based API for validation
fragment = validation.JavaValidatorFragment auto-inject {
// composedCheck = "org.eclipse.xtext.validation.ImportUriValidator"
composedCheck = "org.eclipse.xtext.validation.NamesAreUniqueValidator"
}
// scoping and exporting API
fragment = scoping.ImportNamespacesScopingFragment auto-inject {
ignoreCase = true
}
//HAMI: Use simple names
//fragment = exporting.QualifiedNamesFragment {}
fragment = builder.BuilderIntegrationFragment auto-inject {}
// generator API
fragment = generator.GeneratorFragment auto-inject {
generateMwe = true
generateJavaMain = true
// generatorStub = true
}
// formatter API
fragment = formatting.FormatterFragment auto-inject {}
// content assist API
fragment = contentAssist.JavaBasedContentAssistFragment auto-inject {}
// rename refactoring
fragment = refactoring.RefactorElementNameFragment auto-inject {}
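One low-risk experiment with the workflow above: regenerate with the UI-heavy fragments disabled and measure whether typing latency changes. A sketch, not a verified minimal configuration:

```mwe2
// Sketch: disable the UI-heavy fragments before regenerating -- keep the
// parser and validation, drop content assist, formatting and refactoring:
// fragment = formatting.FormatterFragment auto-inject {}
// fragment = contentAssist.JavaBasedContentAssistFragment auto-inject {}
// fragment = refactoring.RefactorElementNameFragment auto-inject {}
```

If latency is unchanged with these fragments removed, the bottleneck is below the UI services, in the lexing/parsing layer itself.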
//-------------------- 3. XTEXT plugins ------------------------------
I am using Xtext 2.5.2; I also tested with a newer version and didn't find any differences.
The issue happens in the XtextDocument class, in the fireDocumentChanged method:
@Override
protected void fireDocumentChanged(DocumentEvent event) {
tokenSource.updateStructure(event); // if I comment out this line, I do not have syntax coloring and typing is very fast
super.fireDocumentChanged(event);
}
If I keep 'tokenSource.updateStructure(event)', it populates the list of tokens, and many observers are triggered on every character typed; the whole token list is traversed (in my case it has 2,235,282 entries) by:
org.eclipse.xtext.ui.editor.syntaxcoloring.TokenScanner
org.eclipse.xtext.ui.editor.model.PartitionTokenScanner
org.eclipse.xtext.ui.editor.PresentationDamager.computeIntersection
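The cost pattern here, a full walk over a multi-million-entry token list on every keystroke, can be illustrated in isolation. The sketch below is a stand-in model, not Xtext API; it only shows why restricting the rescan to the damaged region changes the per-keystroke cost:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an editor's token list: each token covers [offset, offset + length).
// NOT Xtext API -- just a model of full vs. regional rescans after an edit.
public class TokenRescanSketch {
    record Token(int offset, int length) {}

    // Full rescan: touches every token on every keystroke -- O(n) per edit.
    static int fullRescan(List<Token> tokens) {
        int visited = 0;
        for (Token ignored : tokens) visited++;
        return visited;
    }

    // Regional rescan: touches only tokens overlapping the damaged region.
    // (With a sorted token list the overlap lookup could be a binary search;
    // a linear scan is kept here for clarity, counting only overlapping tokens.)
    static int regionalRescan(List<Token> tokens, int damageOffset, int damageLength) {
        int visited = 0;
        for (Token t : tokens) {
            boolean overlaps = t.offset() < damageOffset + damageLength
                    && damageOffset < t.offset() + t.length();
            if (overlaps) visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) tokens.add(new Token(i * 2, 2));

        System.out.println(fullRescan(tokens));            // every token, every keystroke
        System.out.println(regionalRescan(tokens, 10, 4)); // only the damaged region
    }
}
```

With 2.2 million tokens, anything that runs per keystroke must be proportional to the damage region, not the document, which is exactly what the observers listed above appear not to be doing here.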
Can someone point me in the right direction to continue my investigation?
Is there a way to handle this from *.mwe2, or should I customize my DSL plugins?
Could something be wrong in my grammar, given that it has such long constructions?
Thank you in advance
Lidia
[Updated on: Mon, 03 November 2014 12:49]
Re: Xtext cannot handle parsing big files [message #1460342 is a reply to message #1460315]
Mon, 03 November 2014 13:11
Ed Willink   Messages: 7655   Registered: July 2009
Senior Member
Hi
For the Essential OCL grammar which had many similarities to yours:
a) I converted the Xtext grammar to LALR so that I could see whether
backtracking was being used to resolve shift-reduce conflicts. (I
recently revised org.eclipse.ocl.examples.xtext2lpg and it now generates
conflict free LPG grammars from Xtext.) This exercise was very helpful
in identifying inferior grammar design.
In the case of multi-term expressions the backtracking could cause
moderate expressions to take many, many seconds to parse. After
restructuring, parsing is fast.
b) I avoided built-in precedence such as yours by instead just parsing a
list of interleaved expressions/operators which I built in to a tree
later, with the additional benefit that the grammar is extensible and
precedence is modeled rather than built-in.
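That flat-list-then-tree approach can be sketched in plain Java. Everything below is illustrative (no Xtext, OCL, or ANTLR API); the grammar would only produce the interleaved operand/operator lists, and precedence climbing builds the tree afterwards from a table:

```java
import java.util.List;
import java.util.Map;

// Sketch: the grammar parses a flat sequence operand[0] op[0] operand[1] op[1] ...
// Precedence lives in a table (so it is modeled, not baked into rules) and the
// tree is built afterwards by precedence climbing. Names are illustrative.
public class PrecedenceBuilder {
    // Higher number binds tighter; values chosen for illustration only.
    static final Map<String, Integer> PRECEDENCE = Map.of(
            "&", 1, "+", 2, "-", 2, "*", 3, "/", 3);

    static String build(List<String> operands, List<String> ops) {
        return climb(operands, ops, new int[]{0}, 0);
    }

    // Left-associative precedence climbing; the parenthesised string stands in
    // for an AST. pos[0] indexes operand i and its following operator i.
    static String climb(List<String> operands, List<String> ops, int[] pos, int minPrec) {
        String left = operands.get(pos[0]);
        while (pos[0] < ops.size() && PRECEDENCE.get(ops.get(pos[0])) >= minPrec) {
            String op = ops.get(pos[0]);
            pos[0]++;
            String right = climb(operands, ops, pos, PRECEDENCE.get(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        // a & b + c * d  ->  & binds loosest, * tightest
        System.out.println(build(List.of("a", "b", "c", "d"), List.of("&", "+", "*")));
    }
}
```

The parser rules stay trivial (one loop, no precedence ladder), so the generated ANTLR grammar has far less to backtrack over, and adding a new operator means adding a table entry rather than a rule.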
Your "ShiftExpression ({AndExpression.left=current} "&"
right=ShiftExpression)*" right recursion makes me very suspicious.
Ultimately, there is no substitute for instrumentation.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=401953 gives some clues as
to how I instrumented.
org.eclipse.ocl.examples.test.xtext.LoadTests.testLoad_Bug401953_essentialocl()
remains part of the test suite so that any regression to exponential
parsing time is detected.
Regards
Ed Willink
On 03/11/2014 12:36, Lidia Gutu wrote:
> [original post quoted in full; snipped]