Eclipse Community Forums: TMF (Xtext) » How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar?


How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1776902]

Wed, 22 November 2017 10:57

Eclipse User

I am experimenting with using Xtext to parse XML, because I want to represent the AST as XML instead of plain text.

But I have run into a problem with converting the STRING terminal.

For instance, I want to parse this XML node:

    <MyEnum name = "TestEnum">
        <MyEnumLiteral name = "Unknown" value = "-1" />
        <MyEnumLiteral name = "First" value = "0"/>
        <MyEnumLiteral name = "Second" value = "1"/>
    </MyEnum>

I expect the value attribute to be an int.

The grammar is:

MyEnum:
    '<MyEnum' 'name' '=' name=STRING '>' 
        literals += MyEnumLiteral*
    '</MyEnum>'
;

MyEnumLiteral:
    '<MyEnumLiteral' 'name' '=' name=STRING  ('value' '=' value=STRING)?  '/>' 
;

After running the MWE2 workflow, the generated MyEnumLiteral contains two method signatures that I did not expect:

  String getValue(); // <---- It is String, but I expected int
  void setValue(String value);

So, how can I specify the underlying data type in the grammar?
Can the STRING terminal be converted to ecore::EInt automatically while parsing? (If the conversion fails, it is fine for the whole parse to fail and report an error.)

I was hoping Xtext provides a feature like {ecore::EInt}, for instance:

MyEnumLiteral:
    '<MyEnumLiteral' 'name' '=' name=STRING  ('value' '=' value={ecore::EInt} STRING)?  '/>'

Many thanks.

[Updated on: Wed, 22 November 2017 11:14] by Moderator

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1776903 is a reply to message #1776902]

Wed, 22 November 2017 11:09

Eclipse User

Is value=STRING there only because of the concrete syntax? If not, change it to value=INT.
If so, change it to:

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Model:
	value=STRING_THAT_IS_ACTUALLY_AN_INT
;

STRING_THAT_IS_ACTUALLY_AN_INT returns ecore::EInt:
	STRING
;

and implement a value converter:

//new

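import org.eclipse.xtext.common.services.DefaultTerminalConverters;
import org.eclipse.xtext.conversion.IValueConverter;
import org.eclipse.xtext.conversion.ValueConverter;
import org.eclipse.xtext.conversion.ValueConverterException;
import org.eclipse.xtext.nodemodel.INode;

import com.google.inject.Inject;

public class MyVC extends DefaultTerminalConverters {
	
	@Inject
	private MyINTValueConverter myINTValueConverter;
	
	// Registers the converter for the grammar rule of the same name.
	@ValueConverter(rule = "STRING_THAT_IS_ACTUALLY_AN_INT")
	public IValueConverter<Integer> STRING_THAT_IS_ACTUALLY_AN_INT() {
		return myINTValueConverter;
	}
	
	// TODO make this better
	public static class MyINTValueConverter implements IValueConverter<Integer> {

		// Text -> model: the input still carries its quotes, e.g. "-1" including the quote characters.
		@Override
		public Integer toValue(String string, INode node) throws ValueConverterException {
			if (string==null) {
				throw new ValueConverterException("Couldn't convert empty input to an int value.", node, null);
			}
			string = string.trim();
			// The shortest valid input is one digit between two quotes.
			if (string.length()<3) {
				throw new ValueConverterException("Couldn't convert '" + string + "' to an int value.", node, null);
			}
			try {
				// Drop the first and last character (the quotes) and parse the rest as base 10.
				int intValue = Integer.parseInt(string.substring(1, string.length()-1), 10);
				return Integer.valueOf(intValue);
			} catch (NumberFormatException e) {
				throw new ValueConverterException("Couldn't convert '" + string + "' to an int value.", node, e);
			}
		}

		// Model -> text: put the quotes back when the model is serialized.
		@Override
		public String toString(Integer value) throws ValueConverterException {
			return "\""+value+"\"";
		}
		
	}

}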

// adapt

class MyDslRuntimeModule extends AbstractMyDslRuntimeModule {
	
	override bindIValueConverterService() {
		MyVC
	}
	
}
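
The conversion step can also be tried outside Xtext. The following is a minimal sketch in plain Java, not the Xtext API: the class QuotedIntSketch and its methods toValue and toText are hypothetical names, it throws IllegalArgumentException where the converter above throws ValueConverterException, and it additionally checks that the text starts and ends with the same quote character (an assumption that goes beyond the converter above).

```java
class QuotedIntSketch {

    /** Strips one pair of matching quotes and parses the rest as a base-10 int. */
    static int toValue(String literal) {
        if (literal == null) {
            throw new IllegalArgumentException("No literal to convert.");
        }
        String s = literal.trim();
        // Assumption: accept either quote style, as long as both ends match.
        boolean quoted = s.length() >= 3
                && (s.charAt(0) == '"' || s.charAt(0) == '\'')
                && s.charAt(s.length() - 1) == s.charAt(0);
        if (!quoted) {
            throw new IllegalArgumentException("Not a quoted integer: " + s);
        }
        try {
            return Integer.parseInt(s.substring(1, s.length() - 1), 10);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a quoted integer: " + s, e);
        }
    }

    /** Inverse direction: wrap the number in double quotes again. */
    static String toText(int value) {
        return "\"" + value + "\"";
    }

    public static void main(String[] args) {
        // The three value attributes from the XML sample in the question.
        for (String literal : new String[] {"\"-1\"", "\"0\"", "\"1\""}) {
            System.out.println(literal + " -> " + toValue(literal));
        }
    }
}
```

Keeping toValue and toText as exact inverses matters in Xtext because the same converter is used both when parsing and when serializing the model back to text.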

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1776908 is a reply to message #1776903]

Wed, 22 November 2017 11:43

Eclipse User

It works.
Christian, thank you very much!

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777291 is a reply to message #1776908]

Tue, 28 November 2017 03:35

Eclipse User

Why not use EMF directly to parse XML?

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777306 is a reply to message #1777291]

Tue, 28 November 2017 04:57

Eclipse User

Hi

Indeed EMF will do the parser for free. Re-implementing XML parsing is not particularly easy.

However autogenerating a parser from its Ecore metamodel is a very promising research direction that should enable standard EMF models to be loaded significantly (perhaps 3-fold) quicker and facilitate non-standard in memory representations that can be very beneficial for model to model transformation. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=507391

I've had a first play with this and it wasn't as easy as I hoped. Custom lexers, arbitrary ordering of elements and the complexities of xmi:ids were not particularly compatible with conflict-free Xtext. But once I exploited my Xtext2LPG transformation, the resulting *.ecore grammar is about 350 states.

IIRC my play ended when I hit the extensibility challenge. Actual XML is eXtensible; a given Ecore metamodel is not, and so many of the benefits of an optimized autogenerated parser are undermined if it is extensible.

If you're interested in seeing where I got to, I can create you a ZIP of what is too preliminary for GIT.

Regards

Ed Willink

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777314 is a reply to message #1777306]

Tue, 28 November 2017 06:22

Eclipse User

Generating a parser seems to me more likely to be a relatively pointless exercise that, in the end, produces bloated byte code that does not perform significantly better than is possible with a generic approach. It's not as if a whole heck of a lot of work hasn't gone into making very fast SAX parsers. A generic approach still based on SAX could perform significantly better, as I experimented with in https://bugs.eclipse.org/bugs/show_bug.cgi?id=51210#c5 some time ago. The irony back then was that the groups complaining about performance at my employer at the time were not actually interested in improved performance; they were primarily interested in having a scapegoat for all their problems and were definitely not interested in being without one.

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777395 is a reply to message #1777291]

Wed, 29 November 2017 02:57

Eclipse User

Jan Koehnlein wrote on Tue, 28 November 2017 08:35

Why not use EMF directly to parse XML?

I know that Xtext can automatically generate the Ecore model (model/generated/MyDsl.ecore) for an Xtext grammar.
We can load an instance of that Ecore model either by registering the Ecore model or by embedding a schemaLocation in the instance file.

I use Xtext to parse XML for two main reasons:
(1) I want to be able to edit XML (as DSL code) in the Xtext-generated IDE. This makes the most of Xtext's validation and content assist.
Some XML editors (e.g. oXygen XML Editor) also support content assist, but not as well as the Xtext IDE (some features, such as scoping, may need an extra plugin).

(2) This experiment is only for my personal interest, not for a company project.
As I see it:
Xtext grammar == XML Schema / RELAX NG (compact syntax)
Xtext validation == Schematron
Xtext IDE == XML editor (e.g. oXygen XML Editor)
Xtext generator/interpreter == XSLT
editing DSL code == editing an XML instance as DSL AST nodes
So what I want to do is write two pieces of code, one for Xtext and one for XML, and compare the differences (maybe a third for Lisp expressions). This is all for learning purposes.

In this case, the generated MyDsl.ecore or MyDsl.xsd (.ecore can be converted to .xsd) is not very useful for me.
Also, the generated MyDsl.xsd depends strongly on Ecore, so it is not pure XSD. (As I mentioned before, I consider XSD a type 2 grammar.)
It also won't automatically generate a Schematron validation file for me.

[Updated on: Wed, 29 November 2017 10:28] by Moderator

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777401 is a reply to message #1777306]

Wed, 29 November 2017 03:15

Eclipse User

Ed Willink wrote on Tue, 28 November 2017 09:57

Hi

Indeed EMF will do the parser for free. Re-implementing XML parsing is not particularly easy.

However autogenerating a parser from its Ecore metamodel is a very promising research direction that should enable standard EMF models to be loaded significantly (perhaps 3-fold) quicker and facilitate non-standard in memory representations that can be very beneficial for model to model transformation. See https://bugs.eclipse.org/bugs/show_bug.cgi?id=507391

Hi, Willink,

In your context, what is the Ecore metamodel, and what is a parser generated from its Ecore metamodel?

In general, we define an Ecore model (an instance of the Ecore metamodel), not the Ecore metamodel itself.
The Ecore metamodel consists of classes such as EClass, EEnum, EAttribute, EReference, ...
Even in a DSL, we have MyClass, MyEnum, MyField, ...; those all belong to an Ecore model (an instance of the Ecore metamodel).

[Updated on: Wed, 29 November 2017 03:16] by Moderator

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777407 is a reply to message #1777401]

Wed, 29 November 2017 04:21

Eclipse User

Hi

Ecore is its own metamodel.

An autogenerated parser for Ecore has keywords such as "<xml", "<ePackage", "xmi" which a tokenizer converts direct to integers that a large state table can dispatch in accordance with the LALR analysis. The parser can therefore work using substrings, integers, states and tables using compile-time knowledge. How much speed advantage/size penalty this gives over SAX that builds an intermediate set of objects is something of a matter of guesswork and personal belief until a prototype has been built and instrumented.
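
The idea of turning keywords into integers and dispatching through a table can be illustrated on a much smaller scale. The following is a minimal sketch, not the LPG-generated parser and not LALR: it is a plain finite-state table for the MyEnumLiteral element from the first post, and every name in it (KeywordTableSketch, tokenId, accepts, the token and state numbers) is hypothetical. The input is assumed to be already split into lexemes.

```java
import java.util.Map;

class KeywordTableSketch {
    // Token ids: each keyword is mapped to a small integer once, up front.
    static final int OPEN = 0, NAME = 1, VALUE = 2, EQ = 3, STRING = 4, CLOSE = 5;
    static final Map<String, Integer> KEYWORDS = Map.of(
            "<MyEnumLiteral", OPEN, "name", NAME, "value", VALUE, "=", EQ, "/>", CLOSE);

    static final int ERROR = -1, ACCEPT = 8;

    // TRANSITIONS[state][token] = next state, or ERROR.
    static final int[][] TRANSITIONS = {
        // OPEN NAME VALUE EQ STRING CLOSE
        {  1,  -1,  -1,  -1,  -1,  -1 }, // 0: expect '<MyEnumLiteral'
        { -1,   2,  -1,  -1,  -1,  -1 }, // 1: expect 'name'
        { -1,  -1,  -1,   3,  -1,  -1 }, // 2: expect '='
        { -1,  -1,  -1,  -1,   4,  -1 }, // 3: expect STRING
        { -1,  -1,   5,  -1,  -1,   8 }, // 4: expect 'value' or '/>'
        { -1,  -1,  -1,   6,  -1,  -1 }, // 5: expect '='
        { -1,  -1,  -1,  -1,   7,  -1 }, // 6: expect STRING
        { -1,  -1,  -1,  -1,  -1,   8 }, // 7: expect '/>'
    };

    /** Maps a lexeme to its token id; double-quoted text counts as STRING. */
    static int tokenId(String lexeme) {
        Integer id = KEYWORDS.get(lexeme);
        if (id != null) {
            return id;
        }
        boolean quoted = lexeme.length() >= 2 && lexeme.startsWith("\"") && lexeme.endsWith("\"");
        return quoted ? STRING : ERROR;
    }

    /** Runs the table over the lexemes using only integer lookups. */
    static boolean accepts(String... lexemes) {
        int state = 0;
        for (String lexeme : lexemes) {
            int token = tokenId(lexeme);
            if (token == ERROR || state == ACCEPT) {
                return false; // unknown lexeme, or input after the element ended
            }
            state = TRANSITIONS[state][token];
            if (state == ERROR) {
                return false;
            }
        }
        return state == ACCEPT;
    }
}
```

A real LALR parser additionally keeps a stack and performs reductions; the sketch only shows why compile-time knowledge of the keywords lets the inner loop work on integers and arrays.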

Regards

Ed Willink

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777412 is a reply to message #1777407]

Wed, 29 November 2017 04:50

Eclipse User

Ed Willink wrote on Wed, 29 November 2017 09:21

Hi

Ecore is its own metamodel.

An autogenerated parser for Ecore has keywords such as "<xml", "<ePackage", "xmi" which a tokenizer converts direct to integers that a large state table can dispatch in accordance with the LALR analysis. The parser can therefore work using substrings, integers, states and tables using compile-time knowledge. How much speed advantage/size penalty this gives over SAX that builds an intermediate set of objects is something of a matter of guesswork and personal belief until a prototype has been built and instrumented.

Regards

Ed Willink

I understand what you said.
That is great!

Is the autogenerated parser generated from Xtext?
If it is, we can use the Xtext IDE to edit the Ecore model XML directly and make the most of Xtext's validation and content assist.

I am very interested. Could you create a ZIP for me to learn from?
serioadamo97@gmail.com
Many thanks.

[Updated on: Wed, 29 November 2017 04:52] by Moderator

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777432 is a reply to message #1777412]

Wed, 29 November 2017 06:05

Eclipse User

Hi

The relevant projects from my (OCL) workspace are contained in https://www.dropbox.com/s/dxyhgttksjhr1xy/Ecore2LPG.zip?dl=0

The content is what I find now after I stopped playing in April last year.

xtext2lpg is a half-developed tool that I have used a few times to enable me to check that an Xtext grammar is free of ambiguities. It converts a *.xtext file to a *KWLexer/Lexer/Parser.gi triple that can be processed by the LPG LALR parser. Currently grammar only; action code to do.

ecore2xtext is the auto-generated parser play. It converts a *.ecore file to a *.xtext parser that can then be converted to LPG files by xtext2lpg. Of course I use Ecore.ecore as the first test vehicle.

The other ecore2xtext.* projects are probably just added-value Xtext autogeneration that is not needed if Ecore=>Xtext=>LPG is a purely model-based activity. Ignore all *OCL* files.

The Ecore2Xtext*.gi files in org.eclipse.ocl.examples.xtext2lpg/lpg-gen probably represent the limits of my achievements. Possibly ready for action code auto-generation. IIRC something was not as smooth as I hoped. Possibly the number of bloated actions to avoid conflicts. Possibly an issue with extensibility. Possibly a need to study the String class carefully to see how to optimize the use of sub-strings without creating unnecessary String objects.

The "Generate LPG grammar for Ecore2Xtext.launch" is probably the intended way to do something.

To use LPG, you need to install it; not particularly easy. You might try GIT\org.eclipse.ocl\releng\org.eclipse.ocl.releng\psfs\ocl-lpg.psf . IIRC I had to manually edit the 32 bit Windows config into a 64 bit Windows config. Once you get it installed

GIT\org.eclipse.ocl\plugins\org.eclipse.ocl\.settings\LPG2 selected-file.launch

should allow you to select a *.gi file and just launch the external tool for it. I forget whether the console or the *.l file is more trustworthy when it doesn't work perfectly.

If after studying the ZIP contents, you want to take this further please email me directly ed_at_willink.me.uk. Could make for a good joint paper at BigMDE.

Regards

Ed Willink

Re: How to convert STRING terminal to ecore::EInt when parse XML using xtext grammar? [message #1777930 is a reply to message #1777432]

Wed, 06 December 2017 05:13

Eclipse User

Hi

You motivated me to have a further play. A mostly working alternative Ecore parser can be found in the ewillink/528050 branch of the Eclipse OCL GIT.

It appears that a non-standard metamodel-driven LALR parser can load an Ecore file perhaps two times faster than the standard EMF SAXParser. There are opportunities for further improvement but I doubt that a further factor of two is available.

BUT much more significant is the comparison between cold parsing (first use with a new JVM) and warm (after many hundreds of repeated usages). For both approaches, a cold parse is at least 50 times slower than warm.

Since the standard parser has diverse usage, it is likely to be warmed up anyway. A non-standard parser is therefore only worth consideration for huge files or many files with the same metamodel or a non-standard memory representation.
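
The cold versus warm effect can be observed with a few lines of Java. The following is a rough sketch under stated assumptions: it times the JDK's built-in SAX parser on a tiny XML string, not EMF's loader and not the LALR parser discussed here; ColdWarmSketch, timeOneParse and measure are hypothetical names; and the "cold" figure is only meaningful if this is the first XML parse in a fresh JVM. A serious measurement would use a benchmarking harness.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

class ColdWarmSketch {
    static final String XML =
            "<MyEnum name=\"TestEnum\">"
            + "<MyEnumLiteral name=\"Unknown\" value=\"-1\"/>"
            + "<MyEnumLiteral name=\"First\" value=\"0\"/>"
            + "</MyEnum>";

    /** Parses XML once with a fresh SAX parser and returns the elapsed nanoseconds. */
    static long timeOneParse() {
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all events; only the parsing cost is measured.
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
        } catch (Exception e) {
            throw new IllegalStateException("Parsing the sample failed", e);
        }
        return System.nanoTime() - start;
    }

    /** Returns {cold, warm}: the first parse, and the last of the repeated parses. */
    static long[] measure(int repeats) {
        long cold = timeOneParse();
        long warm = cold;
        for (int i = 0; i < repeats; i++) {
            warm = timeOneParse();
        }
        return new long[] {cold, warm};
    }

    public static void main(String[] args) {
        long[] t = measure(500);
        System.out.println("cold ns: " + t[0] + ", warm ns: " + t[1]);
    }
}
```

The absolute numbers depend on the machine and JVM, so the sketch prints them rather than asserting any particular ratio.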

Regards

Ed Willink
