Xtext cannot handle parsing big files [message #1460315]
Mon, 03 November 2014 12:36
Lidia Gutu   Messages: 45   Registered: July 2013
Member
I posted this question a few months ago, and I am asking again because unfortunately I have not found any solution.
How can I create a DSL that is able to handle big files, about 10 MB?
How can I create a lightweight editor, or something similar?
With my current version, typing is very slow in files larger than 8 MB, especially at the end of the document. Opening these huge documents takes a long time; sometimes Eclipse freezes, or it runs into an OutOfMemoryError even with -Xmx1024m and -XX:MaxPermSize=512m.
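Before digging into the grammar, it is worth ruling out plain heap pressure. A sketch of eclipse.ini vmargs (the values are illustrative; -XX:MaxPermSize only applies to Java 7 and earlier, since PermGen was removed in Java 8):

```ini
-vmargs
-Xmx2048m
-XX:MaxPermSize=512m
```

If the OutOfMemoryError disappears with a bigger heap but typing stays slow, the problem is algorithmic rather than memory sizing.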
Is there a way to create a DSL that can handle this, or is it a bug in Xtext?
I have investigated this from different points of view:
1. Checked and changed the grammar.
2. Tried to create a lightweight editor by adding different modules in *.mwe2, where all memory-consuming features should be turned off, such as outline refresh and content assist.
3. Debugged the Xtext code and found the piece of code that causes the delay during typing; I was thinking of patching the Xtext plugins.
4. Customized my DSL plugins so that my DSL does not depend on JDT.
Nothing has helped so far!
----------------------- 1. GRAMMAR --------------------------------
I have a complex grammar that does not contain left-recursive rules, so it seems to be OK; only some parts are quite long. Should I split them somehow?
Some samples and screenshots are attached:
AndExpression returns ConstantExpression
: ShiftExpression ({AndExpression.left=current} "&" right=ShiftExpression)*
;
AdditiveExpression returns ConstantExpression
: MultiplicativeExpression ({AdditiveExpression.left=current} additionKind+=AdditionKind right=MultiplicativeExpression)*
;
EnumInt returns EnumDefinition
: "enum" "{" enumList+=EnumItem ("," enumList+=EnumItem)* "}"
name=DSLID/(withString?="string")?
;
FuntionParameter
: funtionKind=FunctionKind type=DSLType (byKey?="bykey")? (byDefinition?="bydefinition")?
(parametername=DSLRelaxedID)? (dimensions+=Dimension)* (noTrace?="notrace")? ";"
;
My grammar has 812 lines, and I do not get any warnings about left-recursive rules. What other optimizations can be applied to an Xtext grammar?
//--------------------- 2. MWE2 WORKFLOW --------------------------------
How can I create an Xtext workflow that avoids triggering the listeners and observers that consume time and memory in the main thread?
My last version used:
// Java API to access grammar elements (required by several other fragments)
fragment = grammarAccess.GrammarAccessFragment auto-inject {}
// generates Java API for the generated EPackages
fragment = ecore.EMFGeneratorFragment auto-inject {}
// Serializer 2.0
fragment = serializer.SerializerFragment auto-inject {
generateStub = false
}
// a custom ResourceFactory for use with EMF
fragment = resourceFactory.ResourceFactoryFragment auto-inject {
fileExtensions = file.extensions
}
// The antlr parser generator fragment.
fragment = parser.antlr.ex.rt.AntlrGeneratorFragment auto-inject {
options = {
ignoreCase = true
}
}
// generates a more lightweight Antlr parser and lexer tailored ...
fragment = parser.antlr.ex.ca.ContentAssistParserGeneratorFragment auto-inject {
options = {
ignoreCase = true
}
}
// java-based API for validation
fragment = validation.JavaValidatorFragment auto-inject {
// composedCheck = "org.eclipse.xtext.validation.ImportUriValidator"
composedCheck = "org.eclipse.xtext.validation.NamesAreUniqueValidator"
}
// scoping and exporting API
fragment = scoping.ImportNamespacesScopingFragment auto-inject {
ignoreCase = true
}
//HAMI: Use simple names
//fragment = exporting.QualifiedNamesFragment {}
fragment = builder.BuilderIntegrationFragment auto-inject {}
// generator API
fragment = generator.GeneratorFragment auto-inject {
generateMwe = true
generateJavaMain = true
// generatorStub = true
}
// formatter API
fragment = formatting.FormatterFragment auto-inject {}
// content assist API
fragment = contentAssist.JavaBasedContentAssistFragment auto-inject {}
// rename refactoring
fragment = refactoring.RefactorElementNameFragment auto-inject {}
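One low-risk experiment with the workflow above: regenerate with the UI-heavy fragments disabled and measure whether typing latency changes. A sketch, not a verified minimal configuration:

```mwe2
// Sketch: disable the UI-heavy fragments before regenerating -- keep the
// parser and validation, drop content assist, formatting and refactoring:
// fragment = formatting.FormatterFragment auto-inject {}
// fragment = contentAssist.JavaBasedContentAssistFragment auto-inject {}
// fragment = refactoring.RefactorElementNameFragment auto-inject {}
```

If latency is unchanged with these fragments removed, the bottleneck is below the UI services, in the lexing/parsing layer itself.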
//-------------------- 3. XTEXT plugins ------------------------------
I am using Xtext 2.5.2; I also tested with a newer version and didn't find any differences.
The issue happens in the XtextDocument class, in the fireDocumentChanged method:
@Override
protected void fireDocumentChanged(DocumentEvent event) {
tokenSource.updateStructure(event); // if I comment out this line, I do not have syntax coloring and typing is very fast
super.fireDocumentChanged(event);
}
If I keep 'tokenSource.updateStructure(event)', it populates the list of tokens, and many observers are triggered on every character typed; the whole token list is traversed (in my case it has 2,235,282 entries) by:
org.eclipse.xtext.ui.editor.syntaxcoloring.TokenScanner
org.eclipse.xtext.ui.editor.model.PartitionTokenScanner
org.eclipse.xtext.ui.editor.PresentationDamager.computeIntersection
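The cost pattern here, a full walk over a multi-million-entry token list on every keystroke, can be illustrated in isolation. The sketch below is a stand-in model, not Xtext API; it only shows why restricting the rescan to the damaged region changes the per-keystroke cost:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an editor's token list: each token covers [offset, offset + length).
// NOT Xtext API -- just a model of full vs. regional rescans after an edit.
public class TokenRescanSketch {
    record Token(int offset, int length) {}

    // Full rescan: touches every token on every keystroke -- O(n) per edit.
    static int fullRescan(List<Token> tokens) {
        int visited = 0;
        for (Token ignored : tokens) visited++;
        return visited;
    }

    // Regional rescan: touches only tokens overlapping the damaged region.
    // (With a sorted token list the overlap lookup could be a binary search;
    // a linear scan is kept here for clarity, counting only overlapping tokens.)
    static int regionalRescan(List<Token> tokens, int damageOffset, int damageLength) {
        int visited = 0;
        for (Token t : tokens) {
            boolean overlaps = t.offset() < damageOffset + damageLength
                    && damageOffset < t.offset() + t.length();
            if (overlaps) visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) tokens.add(new Token(i * 2, 2));

        System.out.println(fullRescan(tokens));            // every token, every keystroke
        System.out.println(regionalRescan(tokens, 10, 4)); // only the damaged region
    }
}
```

With 2.2 million tokens, anything that runs per keystroke must be proportional to the damage region, not the document, which is exactly what the observers listed above appear not to be doing here.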
Can someone point me in the right direction to continue my investigation?
Is there a way to handle this from *.mwe2, or should I customize my DSL plugins?
Could something be wrong in my grammar, given that it has such long constructions?
Thank you in advance
Lidia
[Updated on: Mon, 03 November 2014 12:49]
Re: Xtext cannot handle parsing big files [message #1460342 is a reply to message #1460315]
Mon, 03 November 2014 13:11
Ed Willink   Messages: 7655   Registered: July 2009
Senior Member
Hi
For the Essential OCL grammar which had many similarities to yours:
a) I converted the Xtext grammar to LALR so that I could see whether
backtracking was being used to resolve shift-reduce conflicts. (I
recently revised org.eclipse.ocl.examples.xtext2lpg and it now generates
conflict free LPG grammars from Xtext.) This exercise was very helpful
in identifying inferior grammar design.
In the case of multi-term expressions the backtracking could cause
moderate expressions to take many, many seconds to parse. After
restructuring, parsing is fast.
b) I avoided built-in precedence such as yours by instead just parsing a
list of interleaved expressions/operators which I built in to a tree
later, with the additional benefit that the grammar is extensible and
precedence is modeled rather than built-in.
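That flat-list-then-tree approach can be sketched in plain Java. Everything below is illustrative (no Xtext, OCL, or ANTLR API); the grammar would only produce the interleaved operand/operator lists, and precedence climbing builds the tree afterwards from a table:

```java
import java.util.List;
import java.util.Map;

// Sketch: the grammar parses a flat sequence operand[0] op[0] operand[1] op[1] ...
// Precedence lives in a table (so it is modeled, not baked into rules) and the
// tree is built afterwards by precedence climbing. Names are illustrative.
public class PrecedenceBuilder {
    // Higher number binds tighter; values chosen for illustration only.
    static final Map<String, Integer> PRECEDENCE = Map.of(
            "&", 1, "+", 2, "-", 2, "*", 3, "/", 3);

    static String build(List<String> operands, List<String> ops) {
        return climb(operands, ops, new int[]{0}, 0);
    }

    // Left-associative precedence climbing; the parenthesised string stands in
    // for an AST. pos[0] indexes operand i and its following operator i.
    static String climb(List<String> operands, List<String> ops, int[] pos, int minPrec) {
        String left = operands.get(pos[0]);
        while (pos[0] < ops.size() && PRECEDENCE.get(ops.get(pos[0])) >= minPrec) {
            String op = ops.get(pos[0]);
            pos[0]++;
            String right = climb(operands, ops, pos, PRECEDENCE.get(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    public static void main(String[] args) {
        // a & b + c * d  ->  & binds loosest, * tightest
        System.out.println(build(List.of("a", "b", "c", "d"), List.of("&", "+", "*")));
    }
}
```

The parser rules stay trivial (one loop, no precedence ladder), so the generated ANTLR grammar has far less to backtrack over, and adding a new operator means adding a table entry rather than a rule.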
Your "ShiftExpression ({AndExpression.left=current} "&"
right=ShiftExpression)*" right recursion makes me very suspicious.
Ultimately, there is no substitute for instrumentation.
https://bugs.eclipse.org/bugs/show_bug.cgi?id=401953 gives some clues as
to how I instrumented.
org.eclipse.ocl.examples.test.xtext.LoadTests.testLoad_Bug401953_essentialocl()
remains part of the test suite so that any regression to exponential
parsing time is detected.
Regards
Ed Willink
On 03/11/2014 12:36, Lidia Gutu wrote:
> [original post quoted in full; snipped]