From my experience with Xtext, I have collected a few tips that I would like to share. I believe they are reasonable.
These tips aim to help you write the grammar of a language in Xtext. If you already have a meta-model of the language, these tips will not be useful to you. If you
* have textual syntax of the language, and
* do not have the meta-model of the language, and
* want to generate the meta-model from the grammar in Xtext
these tips may help you to generate a "good" meta-model.
Here "good" means that the meta-model describes the concepts of the language clearly and precisely, and that the AST generated by the parser is convenient to access.
The principles are
* each EClass has more than one attribute
* each alternative creates a distinct EClass
The rules are
* absorption law
* division law
1. absorption law
This law implements principle 1.
If a parser rule meets the following two conditions, this law can be applied:
* The EClass generated by the rule has only one attribute
* The call to the rule is not an alternative in the calling rule
Take the grammar below as an example
Entity1:
'entity1' name = ID contents = Contents
;
Entity2:
'entity2' name = ID contents = Contents
;
Contents: {Contents}
elements += Greeting *
;
The parser rule "Contents" generates an EClass "Contents" that has only one attribute, "elements". Also, the call to this rule is not an alternative. So, according to the absorption law, the grammar can be rewritten in the following way
Entity1:
'entity1' name = ID elements += Greeting *
;
Entity2:
'entity2' name = ID elements += Greeting *
;
The calling rules "Entity1" and "Entity2" absorb the parser rule "Contents". In effect, the EClass "Contents" is a redundant type in the meta-model. Without this type, the AST generated by the parser is more concise.
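To show what "more concise" means for code that reads the AST, here is a small Java sketch. It does not use the classes that Xtext generates. The classes "Entity1Before", "Entity1After", "Contents" and "Greeting" are hand-written stand-ins that I assume have roughly the shape of the generated model, and the two helper methods are only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class AbsorptionSketch {
    // Hypothetical stand-in for the EClass of the Greeting rule (its content is not shown in the grammar).
    static class Greeting {}

    // Shape before absorption: Entity1 -> contents -> elements.
    static class Contents {
        final List<Greeting> elements = new ArrayList<>();
    }

    static class Entity1Before {
        String name;
        final Contents contents = new Contents();
    }

    // Shape after absorption: Entity1 -> elements.
    static class Entity1After {
        String name;
        final List<Greeting> elements = new ArrayList<>();
    }

    // Before: the client has to walk through the intermediate Contents node.
    static int countBefore(Entity1Before entity) {
        return entity.contents.elements.size();
    }

    // After: the elements are reachable directly from the entity.
    static int countAfter(Entity1After entity) {
        return entity.elements.size();
    }

    public static void main(String[] args) {
        Entity1Before before = new Entity1Before();
        before.contents.elements.add(new Greeting());

        Entity1After after = new Entity1After();
        after.elements.add(new Greeting());

        System.out.println(countBefore(before) + " " + countAfter(after));
    }
}
```

Both methods return the same number. The only difference is the extra hop through "contents", which every client of the first model has to write.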
But if the call to the rule is an alternative in the calling rule, the absorption law cannot be applied. For example,
AbstractTerminal returns AbstractElement:
Keyword |
RuleCall |
ParenthesizedElement |
// Keyword and RuleCall are used inside of Assignable terminal
// As we do not want to be able to write name==>ID
// the rules for Keyword and RuleCall have been copied
PredicatedKeyword |
PredicatedRuleCall |
// We have to make this one explicit since the ParenthesizedElement does not
// create an object but we have to set the predicated flag
// TODO: As soon as we have an own element for parenthesized elements with
// cardinality, we should refactor this part of the grammar
PredicatedGroup
;
Keyword :
value=STRING
;
The parser rule "Keyword" generates an EClass "Keyword" with only one attribute, "value". But the calling rule "AbstractTerminal" cannot absorb the rule "Keyword", because the call to "Keyword" is an alternative.
2. division law
This law implements principle 2. If every alternative in a parser rule creates the same EClass, the alternatives should be divided.
Take the following grammar as an example.
AbstractTerminal returns AbstractElement:
value=STRING
| rule=[AbstractRule]
| predicated?='=>' value=STRING
| predicated?='=>' rule=[AbstractRule]
;
The parser rule "AbstractTerminal" generates an EClass "AbstractElement" that has three attributes: "value", "rule" and "predicated". This EClass is difficult for readers of the meta-model to understand, because the three attributes appear together although no single alternative sets all of them. We should make each alternative return a different EClass, that is, we should divide the EClass "AbstractElement". There are two ways to achieve this.
One is to use a new parser rule for each alternative.
Applying this approach, we get the following grammar.
AbstractTerminal returns AbstractElement:
Keyword |
RuleCall |
PredicatedKeyword |
PredicatedRuleCall
;
Keyword :
value=STRING
;
RuleCall :
rule=[AbstractRule]
;
PredicatedKeyword returns Keyword:
predicated?='=>' value=STRING
;
PredicatedRuleCall returns RuleCall:
predicated?='=>' rule=[AbstractRule]
;
Or we can add an action to each alternative. This is illustrated below.
AbstractTerminal returns AbstractElement:
{Keyword}value=STRING
| {RuleCall}rule=[AbstractRule]
| {PredicatedKeyword}predicated?='=>' value=STRING
| {PredicatedRuleCall}predicated?='=>' rule=[AbstractRule]
;
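To make the benefit for AST clients visible, here is a small Java sketch that compares the two shapes. Again, these are not the classes that Xtext generates. "MergedElement" is a hand-written stand-in for the undivided EClass, and the four small classes are stand-ins that I assume resemble the result of the action variant. The cross-reference "rule" is simplified to a plain string holding the rule name.

```java
class DivisionSketch {
    // Before division: one class carries the attributes of every alternative.
    static class MergedElement {
        String value;        // set only by the keyword alternatives
        String rule;         // set only by the rule-call alternatives (simplified to a name)
        boolean predicated;  // set only by the '=>' alternatives
    }

    // After division: one class per alternative, all sharing a common supertype.
    interface AbstractElement {}

    static class Keyword implements AbstractElement {
        final String value;
        Keyword(String value) { this.value = value; }
    }

    static class RuleCall implements AbstractElement {
        final String rule;
        RuleCall(String rule) { this.rule = rule; }
    }

    static class PredicatedKeyword implements AbstractElement {
        final String value;
        PredicatedKeyword(String value) { this.value = value; }
    }

    static class PredicatedRuleCall implements AbstractElement {
        final String rule;
        PredicatedRuleCall(String rule) { this.rule = rule; }
    }

    // Merged shape: the client must guess the alternative from which attributes are set.
    static String describeMerged(MergedElement e) {
        String prefix = e.predicated ? "predicated " : "";
        if (e.value != null) return prefix + "keyword " + e.value;
        if (e.rule != null) return prefix + "rule call " + e.rule;
        throw new IllegalArgumentException("neither value nor rule is set");
    }

    // Divided shape: the type of the node already says which alternative was parsed.
    static String describeDivided(AbstractElement e) {
        if (e instanceof Keyword) return "keyword " + ((Keyword) e).value;
        if (e instanceof RuleCall) return "rule call " + ((RuleCall) e).rule;
        if (e instanceof PredicatedKeyword) return "predicated keyword " + ((PredicatedKeyword) e).value;
        if (e instanceof PredicatedRuleCall) return "predicated rule call " + ((PredicatedRuleCall) e).rule;
        throw new IllegalArgumentException("unknown element type");
    }

    public static void main(String[] args) {
        MergedElement merged = new MergedElement();
        merged.value = "entity";
        merged.predicated = true;
        System.out.println(describeMerged(merged));
        System.out.println(describeDivided(new PredicatedKeyword("entity")));
    }
}
```

Both calls produce the same description. With the merged class the meaning is hidden in a combination of attribute values, while with the divided classes it is carried by the type of the node.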
Applying the two laws will make the meta-model generated from your grammar more concise. That is my experience.