Eclipse Community Forums: TMF (Xtext) » Extending grammars and runtime environments

Looking to find out how to extend a grammar, and create pre-defined runtime environment(s)

Extending grammars and runtime environments [message #661700]

Sat, 26 March 2011 11:59

Haravikk Kh'arr

Okay, I'm pretty new to Xtext, just started yesterday in fact, but have already made a lot more progress than I had trying to do all this manually, just wish the Eclipse plugin documentation mentioned Xtext so I could have found out about it sooner!

Anyway, I've built most of my grammar, but I have two main things that I'm unsure about:

Extending Grammars
Basically I've defined the language that I'm trying to work with, but I want to add some additional, editor-only keywords and features on top of it. Think along the lines of import declarations, pragmas and so on, which will exist in the editor but are not natively supported by the language I'm working with.

What is the best way to define such an extended grammar? And how/where do I place it within my project? I already have grammar for one file-extension, but I'd basically be adding a second grammar for a second extension, but want it to be included in the same project.

Also, when extending a grammar, is it possible to prevent the use of a construct, or will I need to create two separate branches in order to do so?

Semi-related, but is there also a way to quickly mark elements as deprecated? My editor is intended to replace another, so I'd like to support its features but deprecate them in favour of replacement constructs.

Pre-defined constructs/Runtime Environment(s)
In my grammar I have a construct called an event, but only specific events are supported. Now, instead of putting these events into the grammar itself (messy), I've opted to do the following:

MyScript:
	(eventHandlers+=EventHandler)+;

Event:
	'event' name=ID '(' (parameters+=Parameter (',' parameters+=Parameter)*)? ')' ';';
	
EventHandler:
	name=[Event] '(' (parameters+=Parameter (',' parameters+=Parameter)*)? ')' '{'
	
	'}';

As you can hopefully see, while a basic script can include EventHandlers it can't define its own Events. I'm hoping that it'll be possible for me to create an extended "runtime" grammar which will describe the file(s) where the Event types are actually defined.

I'm wondering if this is the best way to do this, as the grammar of my scripts shouldn't change, but the events (and pre-defined functions) may change, so a "runtime" environment seems the best way to handle it rather than trying to shoe-horn everything into my grammar, but I'm curious as to whether there's a better mechanism.

Also, I'm not sure yet but is it possible to specify that the parameters of an EventHandler should match the types of those defined by the Event you're using, or is there a preferred way to handle that? As you can see the Event and EventHandler definitions are practically identical, except that only the EventHandler has content.

Re: Extending grammars and runtime environments [message #661704 is a reply to message #661700]

Sat, 26 March 2011 13:04

Henrik Lindberg

On 3/26/11 12:59 PM, Haravikk wrote:
> Okay, I'm pretty new to Xtext, just started yesterday in fact, but have
> already made a lot more progress than I had trying to do all this
> manually, just wish the Eclipse plugin documentation mentioned Xtext so
> I could have found out about it sooner!
>
> Anyway, I've built most of my grammar, but I have two main things that
> I'm unsure about:
>
> Extending Grammars
> Basically I've defined the language that I'm trying to work with, but I
> want to add some additional, editor-only keywords and features on top of
> it. Think along the lines of import declarations, pragmas and so on,
> which will exist in the editor but are not natively supported by the
> language I'm working with.
>
What should happen when the user saves the text file? Do you need to
separate the extras from the basic, or do you want all the source text
in one file (i.e. in your slightly improved dialect of the language)?

> What is the best way to define such an extended grammar? And how/where
> do I place it within my project? I already have grammar for one
> file-extension, but I'd basically be adding a second grammar for a
> second extension, but want it to be included in the same project.
>
> Also, when extending a grammar, is it possible to prevent the use of a
> construct, or will I need to create two separate branches in order to
> do so?
>
Do you need both dialects i.e files xxx.a and xxx.b to be editable?

Guessing that you want both - I would:
- have two separate models a and b (external to the grammars)
- have two separate grammar projects a.dsl, and b.dsl, where the b.dsl
overrides and extends the rules of the a.dsl.
- write validators for the a.dsl/model in such a way that they are
sensitive to the actual language '.a' or '.b' (you don't want the
stricter rules for '.a' to kick in when editing '.b').
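
A minimal sketch of the third point, in plain Java rather than the Xtext validator API. The class name, the keyword list and the use of the file name to pick the dialect are all assumptions made for illustration; a real validator would inspect the resource being validated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: choose validation strictness from the file extension. */
final class DialectValidator {
    /** Constructs that only the extended '.b' dialect allows (assumed names). */
    private static final List<String> EXTENDED_ONLY = Arrays.asList("import", "pragma");

    static boolean isExtended(String fileName) {
        return fileName.endsWith(".b");
    }

    /** Returns one error per extended-only keyword used in a basic '.a' file. */
    static List<String> validate(String fileName, List<String> keywordsUsed) {
        List<String> errors = new ArrayList<>();
        if (isExtended(fileName)) {
            return errors; // the stricter '.a' rules do not apply to '.b' files
        }
        for (String keyword : keywordsUsed) {
            if (EXTENDED_ONLY.contains(keyword)) {
                errors.add("'" + keyword + "' is not allowed in " + fileName);
            }
        }
        return errors;
    }
}
```

The same validator class serves both dialects, so the shared rules are written only once.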

> Semi-related, but is there also a way to quickly mark elements as
> deprecated? My editor is intended to replace another, so I'd like to
> support its features but deprecate them in favour of replacement
> constructs.
>
There are different ways of doing this. You can handle deprecation
during validation - i.e. a Deprecated Warning (or Error if you like -
perhaps controlled via properties). This gives you the opportunity to
provide a quick-fix to replace the deprecated construct with how it is
supposed to be.

You can also handle deprecation with semantic styling if you want to
mark the text in some special way (overstrike, etc.) in addition to
showing a marker.

There is however nothing pre-defined for deprecation that I am aware of.
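
As a rough illustration of deprecation handled during validation, here is a sketch in plain Java. None of these names come from Xtext: `DeprecationCheck`, `Issue` and the construct names passed in are hypothetical, and the configurable severity stands in for the "controlled via properties" idea.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a deprecation check; not Xtext API. */
final class DeprecationCheck {
    enum Severity { WARNING, ERROR }

    /** One reported problem, carrying the text a quick fix could insert. */
    static final class Issue {
        final Severity severity;
        final String message;
        final String replacement;

        Issue(Severity severity, String message, String replacement) {
            this.severity = severity;
            this.message = message;
            this.replacement = replacement;
        }
    }

    private final Map<String, String> replacements = new LinkedHashMap<>();
    private final Severity severity;

    /** The severity could come from a user preference or property. */
    DeprecationCheck(Severity severity) {
        this.severity = severity;
    }

    /** Registers an old construct together with its preferred replacement. */
    DeprecationCheck deprecate(String oldConstruct, String newConstruct) {
        replacements.put(oldConstruct, newConstruct);
        return this;
    }

    /** Returns one issue for every use of a deprecated construct. */
    List<Issue> check(List<String> constructsUsed) {
        List<Issue> issues = new ArrayList<>();
        for (String used : constructsUsed) {
            String replacement = replacements.get(used);
            if (replacement != null) {
                issues.add(new Issue(severity,
                        "'" + used + "' is deprecated, use '" + replacement + "'",
                        replacement));
            }
        }
        return issues;
    }
}
```

Keeping the replacement text on the issue is what would let a quick fix rewrite the deprecated construct without looking anything up again.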

> Pre-defined constructs/Runtime Environment(s)
> In my grammar I have a construct called an event, but only specific
> events are supported. Now, instead of putting these events into the
> grammar itself (messy), I've opted to do the following:
>
> MyScript:
> (eventHandlers+=EventHandler)+;
>
> Event:
> 'event' name=ID '(' (parameters+=Parameter (',' parameters+=Parameter)*)?
> ')' ';';
>
> EventHandler:
> name=[Event] '(' (parameters+=Parameter (',' parameters+=Parameter)*)?
> ')' '{'
>
> '}';
>
> As you can hopefully see, while a basic script can include EventHandlers
> it can't define its own Events. I'm hoping that it'll be possible for me
> to create an extended "runtime" grammar which will describe the file(s)
> where the Event types are actually defined.
>
> I'm wondering if this is the best way to do this, as the grammar of my
> scripts shouldn't change, but the events (and pre-defined functions) may
> change, so a "runtime" environment seems the best way to handle it
> rather than trying to shoe-horn everything into my grammar, but I'm
> curious as to whether there's a better mechanism.
>
Separating them as you did is a good thing. If the supported events are
hard-coded in the grammar, the user only gets a syntax error for an
unknown event. When you handle this in a second step you can provide
more meaningful messages and more easily provide quick fixes.

There are two ways of handling the linking, either as you did via links
(which become non containment references in the model), and then you
need to present the model where all the predefined elements are
available to the runtime, or via validation of contained references (or
contained features depending on type). Which to use depends on how you
will be working with the models.

> Also, I'm not sure yet but is it possible to specify that the parameters
> of an EventHandler should match the types of those defined by the Event
> you're using, or is there a preferred way to handle that? As you can see
> the Event and EventHandler definitions are practically identical, except
> that only the EventHandler has content.
You can check this during validation, or control it via linking and
scoping (i.e. by controlling what the linker sees in a particular scope).
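
For the validation route, the check itself is small. The following is a sketch in plain Java with stand-in model classes: `Event`, `EventHandler` and the idea of comparing parameter types as strings are assumptions for illustration, not the classes Xtext would generate from the grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: compare a handler's parameters with its event. */
final class HandlerSignatureCheck {
    static final class Event {
        final String name;
        final List<String> parameterTypes;

        Event(String name, List<String> parameterTypes) {
            this.name = name;
            this.parameterTypes = parameterTypes;
        }
    }

    static final class EventHandler {
        final Event event; // the resolved cross-reference
        final List<String> parameterTypes;

        EventHandler(Event event, List<String> parameterTypes) {
            this.event = event;
            this.parameterTypes = parameterTypes;
        }
    }

    /** Returns an empty list when the handler matches the event signature. */
    static List<String> validate(EventHandler handler) {
        List<String> errors = new ArrayList<>();
        List<String> expected = handler.event.parameterTypes;
        List<String> actual = handler.parameterTypes;
        if (expected.size() != actual.size()) {
            errors.add(handler.event.name + " expects " + expected.size()
                    + " parameter(s), found " + actual.size());
            return errors;
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                errors.add("parameter " + (i + 1) + " of " + handler.event.name
                        + " must be " + expected.get(i) + ", found " + actual.get(i));
            }
        }
        return errors;
    }
}
```

Reporting each mismatching position separately gives the user a message that points at the exact parameter to change.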

Hope that helps.
Regards
- henrik

Re: Extending grammars and runtime environments [message #661707 is a reply to message #661704]

Sat, 26 March 2011 14:34

Haravikk Kh'arr

Thanks for the excellent reply! Some further questions (likely me being inexperienced, though I've worked through most examples I could find and read a fair bit of documentation, in addition to the trusty Google searches =):

Henrik Lindberg wrote on Sat, 26 March 2011 09:04

What should happen when the user saves the text file? Do you need to separate the extras from the basic, or do you want all the source text in one file (i.e. in your slightly improved dialect of the language)?

Say my basic files have an extension of .msl, and my extended syntax an extension of .msle. Both file-types should be editable using the appropriate grammar, but .msle files will be translated into a .msl build product for use outside the editor.

Henrik Lindberg wrote on Sat, 26 March 2011 09:04

Guessing that you want both - I would:
- have two separate models a and b (external to the grammars)
- have two separate grammar projects a.dsl, and b.dsl, where the b.dsl overrides and extends the rules of the a.dsl.
- write validators for the a.dsl/model in such a way that they are sensitive to the actual language '.a' or '.b' (you don't want the stricter rules for '.a' to kick in when editing '.b').

I'm not sure I quite follow the project structure you're proposing, are there examples of anything like this anywhere? Only tutorials and descriptions I seem to have found to follow are all for a single model, my end aim is to produce a single Eclipse plugin that handles my .msl, .msle, and any other types such as .mslr for a runtime perhaps, .mslm for modules etc.

Henrik Lindberg wrote on Sat, 26 March 2011 09:04

There are two ways of handling the linking, either as you did via links (which become non containment references in the model), and then you need to present the model where all the predefined elements are available to the runtime, or via validation of contained references (or contained features depending on type). Which to use depends on how you will be working with the models.

Well, part of my aim with my .msle type is to add modular construction of .msl files which don't themselves support imports. So I'll be providing the users with the ability to create modules (possibly with their own extension) with functions or events that can be joined together into a .msl file end product. I'm hoping I can handle both modules inclusion and runtime inclusion in a similar way, to avoid having to do too much duplicate work.

So a .msle file will be identical to a .msl file, except with the ability to declare directives and import bits of code, which will then be built into a plain .msl file by adding the imported code wherever appropriate, following directives to inline functions etc.

Henrik Lindberg wrote on Sat, 26 March 2011 09:04

You can check this during validation, or control it via linking and scoping (i.e. by controlling what the linker sees in a particular scope).

So the definition of EventHandler is correct in this case? Was mostly just hoping there might be an easy way to reduce the redundant parameters structure since it should match the corresponding Event that has been referenced, sounds simple enough, I've not gotten too far with the validator stuff yet, lots of fiddly grammar rules still to get right =)

Somewhat related, but I'm wondering if there's a best practice for defining type-safe grammar rules. For example, say I support variables of type integer and string, should I define a rule for each type of variable like:

Variable: StringVariable | IntegerVariable;
StringVariable: type='string' name=ID '=' stringValue=STRING ';';
IntegerVariable: type='integer' name=ID '=' integerValue=INT ';';

Or just a general purpose rule like:

type=('integer'|'string') name=ID '=' (integerValue=INT | stringValue=STRING) ';';

With stricter type-checking during validation? The first seems like it may be easy but overkill, while the other is more general-purpose but potentially unwieldy to work with. I'm curious if there is a particular method that works best?
It might be worthwhile mentioning that I have two different types of variable declaration, as global variables can only be a constant or already declared value (I skipped the referencing in the example), while a local variable can assign a calculation, function call etc. of the correct type.

Re: Extending grammars and runtime environments [message #661737 is a reply to message #661707]

Sun, 27 March 2011 00:09

Henrik Lindberg

On 3/26/11 3:34 PM, Haravikk wrote:
> Henrik Lindberg wrote on Sat, 26 March 2011 09:04
>> What should happen when the user saves the text file? Do you need to
>> separate the extras from the basic, or do you want all the source text
>> in one file (i.e. in your slightly improved dialect of the language)?
>
> Say my basic files have an extension of .msl, and my extended syntax an
> extension of .msle. Both file-types should be editable using the
> appropriate grammar, but .msle files will be translated into a .msl
> build product for use outside the editor.
>
ok, what I guessed.

> Henrik Lindberg wrote on Sat, 26 March 2011 09:04
>> Guessing that you want both - I would:
>> - have two separate models a and b (external to the grammars)
>> - have two separate grammar projects a.dsl, and b.dsl, where the b.dsl
>> overrides and extends the rules of the a.dsl.
>> - write validators for the a.dsl/model in such a way that they are
>> sensitive to the actual language '.a' or '.b' (you don't want the
>> stricter rules for '.a' to kick in when editing '.b').
>
> I'm not sure I quite follow the project structure you're proposing, are
> there examples of anything like this anywhere? Only tutorials and
> descriptions I seem to have found to follow are all for a single model,
> my end aim is to produce a single Eclipse plugin that handles my .msl,
> .msle, and any other types such as .mslr for a runtime perhaps, .mslm
> for modules etc.
>
When you start working on your grammar, it is recommended to start very
small and add feature by feature to the language. If you add everything
at once, you risk getting tons of "mysterious errors" that are hard to
trace back to the real cause. As you build things up, you probably
have just one project, the "myorg.msl" project, and you generate the msl
ecore model from the grammar (because it is more convenient while you
are working on the grammar). You later want to switch to an imported
model (you can copy the generated model as a starting point). In one of
my projects, for the Puppet Manifest Language ('.pp'), I have a
org.myorg.pp which is the ecore model and its implementation, and I then
have a myorg.pp.dsl which contains the xtext grammar. In addition I have
broken out the mwe2 workflow to a myorg.pp.dsl.generate as I don't want
the pp.dsl bundle to have mwe2 (and additional) dependencies since these
are not needed for my use cases.

(The .pp stuff is in the GitHub repository "cloudsmith/geppetto")

Now, when introducing the extended .msle, you could keep it in the .msl
bundle (same or different model/package), or create a new bundle where
you create the new msle model (it imports and extends the msl model).

You then also create an org.myorg.msle.dsl project with a grammar for
msle (importing the grammar from the msl.dsl project). As both of these
grammars use the same ecore model(s), there are no issues with two
separate generators generating the same model files.

The Eclipse b3 project uses a similar thing - it has two ecore models in
separate projects, but only one dsl project. (I learned a bit after b3,
and like the way things are named in my .pp project).

I hope that makes the structure I proposed clearer.

> Henrik Lindberg wrote on Sat, 26 March 2011 09:04
>> There are two ways of handling the linking, either as you did via
>> links (which become non containment references in the model), and then
>> you need to present the model where all the predefined elements are
>> available to the runtime, or via validation of contained references
>> (or contained features depending on type). Which to use depends on how
>> you will be working with the models.
>
> Well, part of my aim with my .msle type is to add modular construction
> of .msl files which don't themselves support imports. So I'll be
> providing the users with the ability to create modules (possibly with
> their own extension) with functions or events that can be joined
> together into a .msl file end product. I'm hoping I can handle both
> modules inclusion and runtime inclusion in a similar way, to avoid
> having to do too much duplicate work.
>
> So a .msle file will be identical to a .msl file, except with the
> ability to declare directives and import bits of code, which will then
> be built into a plain .msl file by adding the imported code wherever
> appropriate, following directives to inline functions etc.
>
Sounds reasonable, and the proposed structure would handle that. There
is a builder framework in Xtext, so having the .msl be generated from
the .msle does not seem very difficult.

> Henrik Lindberg wrote on Sat, 26 March 2011 09:04
>> You can check this during validation, or control it via linking and
>> scoping (i.e. by controlling what the linker sees in a particular scope).
>
> So the definition of EventHandler is correct in this case? Was mostly
> just hoping there might be an easy way to reduce the redundant
> parameters structure since it should match the corresponding Event that
> has been referenced, sounds simple enough, I've not gotten too far with
> the validator stuff yet, lots of fiddly grammar rules still to get right =)
>
I would say so yes. Using references / links, and controlling the
details with scoping is a good way to go.

> Somewhat related, but I'm wondering if there's a best practice for
> defining type-safe grammar rules. For example, say I support variables
> of type integer, and string, should I define a rule for each type of
> variable like:
> Variable: StringVariable | IntegerVariable;
> StringVariable: type='string' name=ID '=' stringValue=STRING ';';
> IntegerVariable: type='integer' name=ID '=' integerValue=INT ';';
> Or just a general purpose rule like:
> type=('integer'|'string') name=ID '='
> (integerValue=INT | stringValue=STRING) ';';
> With stricter type-checking during validation? The first seems like it
> may be easy but overkill, while the other is more general-purpose but
> potentially unwieldy to work with. I'm curious if there is a particular
> method that works best?

I tend to use something like

Literal returns Expression: LiteralString | LiteralInt | LiteralFloat |
.... ;

Where a Literal has a value, and the value is of a specific data type.

LiteralString : value = STRING ;

STRING naturally has a value converter that handles conversion from text
to/from a Java string (this is built into the default Xtext setup).

If you want INT, HEX, OCTAL, DATE, etc. you would create these ecore
datatypes and add the converters.

For variables, you typically have a Variable rule with a reference to
the type, and an expression as value. (And if you want to support type
inference, you make the type optional, and compute the type from the
expression - see below).

> It might be worthwhile mentioning that I have two different types of
> variable declaration, as global variables can only be a constant or
> already declared value (I skipped the referencing in the example), while
> a local variable can assign a calculation, function call etc. of the
> correct type.

This sounds like you need to support expressions, and have a type system
(if you want static type safety). There is a project that adds a type
system to Xtext. I wrote one for Eclipse b3 - it is not very difficult
to do if everything is declared and you don't need too much type
inference. (It gets difficult when you start dealing with generics and
type inference).

In my case, I started with a simple polymorphic dispatcher that
performed "type evaluation", i.e. instead of adding two values (say '2
+ 3') it computed the type (INT + INT = INT). I then used this type
evaluator during validation to check that types are ok. And I used it
for simple type inference (before I implemented a more versatile
inference solver) for expressions like "var x = 1 + max(1,2)" where it
is possible to make a simple inference that var x is of type INT since
everything is statically known/declared.
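
A sketch of such a type evaluator, under loud assumptions: it uses a plain `instanceof` chain in place of a polymorphic dispatcher, the expression classes are hypothetical stand-ins for a generated model, and treating STRING + STRING as STRING is a guess about the language rather than anything stated in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: compute the type of an expression, not its value. */
final class TypeEvaluator {
    enum Type { INT, STRING }

    interface Expr { }

    static final class IntLiteral implements Expr {
        final int value;
        IntLiteral(int value) { this.value = value; }
    }

    static final class StringLiteral implements Expr {
        final String value;
        StringLiteral(String value) { this.value = value; }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    static final class Call implements Expr {
        final String function;
        final Expr[] args;
        Call(String function, Expr... args) { this.function = function; this.args = args; }
    }

    /** Declared return types of the predefined functions. */
    private final Map<String, Type> functionTypes = new HashMap<>();

    TypeEvaluator declare(String function, Type returnType) {
        functionTypes.put(function, returnType);
        return this;
    }

    Type typeOf(Expr e) {
        if (e instanceof IntLiteral) return Type.INT;
        if (e instanceof StringLiteral) return Type.STRING;
        if (e instanceof Add) {
            Add add = (Add) e;
            Type left = typeOf(add.left);
            Type right = typeOf(add.right);
            if (left != right) {
                throw new IllegalArgumentException("cannot add " + left + " and " + right);
            }
            return left; // INT + INT = INT (and, by assumption, STRING + STRING = STRING)
        }
        if (e instanceof Call) {
            Call call = (Call) e;
            for (Expr arg : call.args) typeOf(arg); // surface type errors inside arguments
            Type declared = functionTypes.get(call.function);
            if (declared == null) {
                throw new IllegalArgumentException("unknown function " + call.function);
            }
            return declared;
        }
        throw new IllegalArgumentException("unsupported expression");
    }
}
```

With `max` declared as returning INT, the expression `1 + max(1,2)` evaluates to type INT, which is the simple inference described for `var x`.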

If you have a complex expression language, or want/need access to java
in your language, then the Xbase language is probably something you want
to take a close look at. If not, maybe you can find some useful things
in Eclipse b3 (which also integrates with java btw, but also defines
"system functions" written in java). Eclipse b3 is a project at Eclipse
(under modelling).

For the difference between global variables and local, it is probably
easier to treat them the same way and validate that variables in a
global context "are constants". Do you support constant expressions (e.g.
"this is a" + "concatenation", 1 << 8 | 2, etc.)? If so, you can write a
constant evaluator using polymorphic dispatch that performs evaluation (and
checks that expressions are indeed constant as a side effect) - again, I
wrote one for Eclipse b3 that you could take a look at for ideas.
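
To make the constant evaluator idea concrete, here is a sketch in plain Java. It is not the Eclipse b3 implementation: the expression classes, the three supported operators and the use of an exception to signal "not constant" are all assumptions chosen to keep the example short, and an `instanceof` chain again stands in for polymorphic dispatch.

```java
/** Hypothetical sketch: evaluate constant expressions, reject everything else. */
final class ConstantEvaluator {
    interface Expr { }

    static final class Literal implements Expr {
        final Object value; // an Integer or a String
        Literal(Object value) { this.value = value; }
    }

    static final class Binary implements Expr {
        final String op;
        final Expr left;
        final Expr right;
        Binary(String op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    /** Stands in for anything whose value is only known at run time. */
    static final class Call implements Expr {
        final String function;
        Call(String function) { this.function = function; }
    }

    static final class NotConstantException extends RuntimeException {
        NotConstantException(String message) { super(message); }
    }

    /** Evaluates the expression, failing if any part of it is not constant. */
    static Object evaluate(Expr e) {
        if (e instanceof Literal) return ((Literal) e).value;
        if (e instanceof Binary) {
            Binary b = (Binary) e;
            Object left = evaluate(b.left);
            Object right = evaluate(b.right);
            if (left instanceof String && right instanceof String && b.op.equals("+")) {
                return (String) left + (String) right;
            }
            if (left instanceof Integer && right instanceof Integer) {
                int x = (Integer) left;
                int y = (Integer) right;
                switch (b.op) {
                    case "+": return x + y;
                    case "<<": return x << y;
                    case "|": return x | y;
                    default: break;
                }
            }
            throw new IllegalArgumentException("cannot apply '" + b.op + "' to these operands");
        }
        throw new NotConstantException("not a constant expression");
    }

    /** The "side effect" check: constant if and only if evaluation succeeds. */
    static boolean isConstant(Expr e) {
        try {
            evaluate(e);
            return true;
        } catch (NotConstantException ex) {
            return false;
        }
    }
}
```

A validator for global variables could call `isConstant` on the initializer and report an error when it returns false, while local variables skip the check.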

Hope that helps you.
Regards
- henrik
