Eclipse Community Forums: TMF (Xtext) » Proposal - Adding internalization support to Xtext

Proposal - Adding internalization support to Xtext [message #984647]

Wed, 14 November 2012 21:47

Boris Brodski

Messages: 112
Registered: July 2009

Senior Member

Hi,

I would like to discuss a new feature request before officially submitting it.
I am looking forward to your comments.

Regards,
Boris

Motivation
==========

While improving the Jnario test framework (http://jnario.org/), we came across a requirement that, as far as we can tell, cannot be implemented without extending Xtext. The requirement is to internationalize keywords that are part of complex terminal rules. The following example illustrates the problem:

--- Feature.xtext ---
terminal SCENARIO_TEXT: "Scenario:" MNL;
terminal fragment MNL: !('\r'|'\n')* NL;
---------------------

Source: https://github.com/bmwcarit/Jnario/blob/master/plugins/org.jnario.feature/src/org/jnario/feature/Feature.xtext

A language fragment to parse:

---------------------
Scenario: Here is a description of our scenario
---------------------

The Goal: Add support for a large number of different spoken languages. Translated to German this could be:

---------------------
// language: de

...

Szenario: Hier wird unser Szenario beschrieben
---------------------

More details on this special problem can be found here: https://github.com/bmwcarit/Jnario/issues/23

If we aren't mistaken, this is a violation of the Xtext motto:

"We aim to make simple things simple and complex things possible" (Alan Kay)

Proposal
========

The Xtext framework should learn the concept of internationalization. Here is how it could be done.

1. Extend Xtext.xtext
- Add a "language" keyword to define a list of IETF language tags. For example: "language en, de"
- Add a new syntax construct to define keyword placeholders: %{name}. For example:

--- MyGrammer.xtext ---
language en, de

Greeting:
"%{hello}" name=ID '!';
-----------------------

Note the double quotes around %{}. This allows us to define complex keywords, like "%{package}-%{private}".

2. Support multiple property files with the translations:

--- MyGrammer.properties ---
hello=Hello
----------------------------

--- MyGrammer-de.properties ---
hello=Moin
-------------------------------

The property file without an IETF language tag (in our example, "MyGrammer.properties") is the default property file. It should contain
translations for all placeholders. Missing translations could be replaced with the placeholder name, and
a warning could be issued.

3. Extend the .g generation step as follows:
- For each defined non-default language:
- Read the corresponding .properties file
- Replace all %{} placeholders with the translated values for the current language, falling back to the default translation if necessary
- Generate the .g grammar and run the ANTLR generator
- Rename the generated lexer (add the IETF language tag to the name or package)
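
The placeholder replacement and fallback rules from steps 2 and 3 can be sketched in plain Java. This is only an illustration under assumptions: the class PlaceholderExpander and its methods are hypothetical and not part of Xtext, the translations are passed in as simple maps (they could be loaded from the .properties files), and quotes inside translated values are not escaped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Xtext: expands %{name} placeholders for one language. */
class PlaceholderExpander {
    // A placeholder is "%{", an identifier, and "}".
    private static final Pattern PLACEHOLDER =
            Pattern.compile("%\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    private final Map<String, String> defaults;   // e.g. content of MyGrammer.properties
    private final Map<String, String> localized;  // e.g. content of MyGrammer-de.properties
    private final List<String> warnings = new ArrayList<>();

    PlaceholderExpander(Map<String, String> defaults, Map<String, String> localized) {
        this.defaults = defaults;
        this.localized = localized;
    }

    /** Localized value, else default value, else the placeholder name plus a warning. */
    String resolve(String name) {
        if (localized.containsKey(name)) {
            return localized.get(name);
        }
        if (defaults.containsKey(name)) {
            return defaults.get(name);
        }
        warnings.add("No translation for placeholder '" + name + "'");
        return name;
    }

    /** Replaces every placeholder in the grammar text with its resolved value. */
    String expand(String grammarText) {
        Matcher m = PLACEHOLDER.matcher(grammarText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in translations literal
            m.appendReplacement(out, Matcher.quoteReplacement(resolve(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    List<String> warnings() {
        return warnings;
    }

    public static void main(String[] args) {
        PlaceholderExpander de = new PlaceholderExpander(
                Map.of("hello", "Hello"), Map.of("hello", "Moin"));
        System.out.println(de.expand("Greeting: \"%{hello}\" name=ID '!';"));
    }
}
```

The expanded text would then be handed to the existing grammar generation once per language; how that hooks into the Xtext workflow is left open here.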

4. Add a new interface to determine the locale for each resource

public interface ILocaleProvider {
String determineLocale(Resource resource, ...);
}

5. Introduce a LexerDelegator (extends Lexer) that determines the language of the resource (by calling ILocaleProvider.determineLocale()) and then delegates all
subsequent calls to the localized lexer generated and renamed in step 3.
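
To make steps 4 and 5 more concrete, here is a minimal sketch of the delegation idea. It makes several assumptions: a "lexer" is modelled as a plain function from source text to token names (a stand-in for a generated ANTLR lexer), the locale is read from a "// language: xx" header comment like the one in the German example, and the locale is determined from the source text rather than from a Resource. The class name DelegatingLexerSketch is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the delegation idea. A "lexer" here is just a function
 * from source text to token names; it stands in for a generated ANTLR lexer.
 */
class DelegatingLexerSketch implements Function<String, List<String>> {
    // Matches a header line such as "// language: de".
    private static final Pattern LANGUAGE_HEADER = Pattern.compile(
            "^\\s*//\\s*language:\\s*([A-Za-z]{2,3}(?:-[A-Za-z0-9]+)*)\\s*$",
            Pattern.MULTILINE);

    private final Map<String, Function<String, List<String>>> lexersByLocale;
    private final String defaultLocale;

    DelegatingLexerSketch(Map<String, Function<String, List<String>>> lexersByLocale,
                          String defaultLocale) {
        if (!lexersByLocale.containsKey(defaultLocale)) {
            throw new IllegalArgumentException("No lexer for default locale " + defaultLocale);
        }
        this.lexersByLocale = lexersByLocale;
        this.defaultLocale = defaultLocale;
    }

    /** Plays the role of ILocaleProvider: reads the locale from the header comment. */
    String determineLocale(String source) {
        Matcher m = LANGUAGE_HEADER.matcher(source);
        return m.find() ? m.group(1) : defaultLocale;
    }

    /** Picks the lexer for the detected locale, or the default lexer if none is registered. */
    @Override
    public List<String> apply(String source) {
        Function<String, List<String>> lexer = lexersByLocale.getOrDefault(
                determineLocale(source), lexersByLocale.get(defaultLocale));
        return lexer.apply(source);
    }
}
```

Falling back to the default lexer for an unknown locale is one possible design choice; reporting an error instead would be equally reasonable.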

Re: Proposal - Adding internalization support to Xtext [message #984849 is a reply to message #984647]

Thu, 15 November 2012 01:31

Henrik Lindberg

Messages: 2509
Registered: July 2009

Senior Member

On 2012-14-11 22:47, Boris Brodski wrote:
> Hi,
>
>
> I would like to discuss a new feature request before officially
> submitting it.
> I am looking forward to your comments.
>
Not so sure about that... ;) The only comment I can think of at first is:
"The road to hell is paved with good intentions".

A long time ago, I was exposed to a rule engine with NL traits like these
- with total chaos as the result (copy pasting rules in different
languages, rewrites from one human language to another, merge hell,
sysadmins reading and getting error messages they did not understand, etc).

That got resolved by a decision to always use English everywhere...

So - pick which hell you are heading for :)

Nevertheless...

You can probably do this without any changes to Xtext itself, by
generating your own lexers (as you propose, simply name the keywords
with some prefix, and copy the generated lexer.g while merging in the
actual keyword text).

You can already use your own external lexer in the mwe workflow, so some
variation on this theme is probably doable.

Note that there is more than one lexer being generated (for highlighting).

In cloudsmith / geppetto @ github, I do use an external lexer, and I
have modified the way it is instantiated, as I needed a wrapper for the
lexer (for other purposes than yours). The same approach would work for
what you are trying to achieve, I think.

A different approach is to include all of the keywords at the same time
(unless different languages make this ambiguous), then modify code
completion to filter out keywords that are not in the intended target
human language, and validate that only one language is used throughout a
file (i.e. the lexer/grammar is polyglot in itself).
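
The "only one language per file" validation of the polyglot approach could look roughly like this. It is a sketch under assumptions: the token list and the keyword-to-language table are supplied by the caller, each keyword belongs to exactly one language (the ambiguous case mentioned above is not handled), and the class name SingleLanguageCheck is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator for a polyglot grammar: all keywords must share one language. */
class SingleLanguageCheck {

    /** Returns the languages of all keywords found among the tokens, in order of appearance. */
    static Set<String> languagesUsed(List<String> tokens, Map<String, String> keywordLanguage) {
        Set<String> used = new LinkedHashSet<>();
        for (String token : tokens) {
            String language = keywordLanguage.get(token);
            if (language != null) { // tokens that are not keywords are ignored
                used.add(language);
            }
        }
        return used;
    }

    /** A file is consistent if its keywords come from at most one language. */
    static boolean isConsistent(List<String> tokens, Map<String, String> keywordLanguage) {
        return languagesUsed(tokens, keywordLanguage).size() <= 1;
    }

    public static void main(String[] args) {
        Map<String, String> keywords = Map.of(
                "Scenario:", "en", "Given", "en",
                "Szenario:", "de", "Angenommen", "de");
        System.out.println(isConsistent(List.of("Scenario:", "Angenommen"), keywords));
    }
}
```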

You could also try to reduce the number of keywords in the actual
language to a minimum, and then do a second level of
interpretation/parsing/transformation (but this naturally depends on the
language).

- henrik

Re: Proposal - Adding internalization support to Xtext [message #985293 is a reply to message #984849]

Thu, 15 November 2012 09:30

Boris Brodski

Messages: 112
Registered: July 2009

Senior Member

Hello Henrik,

thank you for the hints. I will look into it.

> Long time ago, I was exposed to a rule engine with NL traits like these
> - with total chaos as the result (copy pasting rules in different
> languages, rewrites from one human language to another, merge hell,
> sysadmins reading and getting error messages they did not understand, etc).
>
> That got resolved by a decision to always use English everywhere...

I know what you mean. Normally, I'm the first one to say "English only".
Localized programming languages are evil. But many DSLs have to be localized in order to gain the necessary acceptance from customers (who may not speak English at all). There are always domain-specific terms, words or phrases that must be reflected in a good DSL.

So if you agree with me that a DSL may contain non-English keywords, then you may also agree that (in case of success) the time will come when you need to internationalize your DSL.

In my particular case this is necessary, since the entire idea of Cucumber (http://cukes.info/) is to implement a BRDSL (Business Readable Domain Specific Language). This means that the keywords should be fully integrated into the spoken language and thus must be translated.

Please take a look at this to get an idea: http://jnario.org/org/jnario/feature/documentation/IntroducingJnarioFeaturesSpec.html

Regards,
Boris

Re: Proposal - Adding internalization support to Xtext [message #985546 is a reply to message #985293]

Thu, 15 November 2012 13:38

Aaron Digulla

Messages: 258
Registered: July 2009
Location: Switzerland

Senior Member

Hi Boris,

I understand exactly where you're heading and why. But users sometimes ask for features which cause more pain than they can possibly imagine. That's why OO languages allow you to use IDs in any language but the keywords are English only. Examples:

class Haus, int zähler, etc.

This means the code can be read in any country as long as you accept that you have to learn a "slang" to understand it.

So you should aim for a DSL which uses fixed keywords but where these can be hidden from the "end user" by defining customized "types" in the native language.

Problems you will face with localized keywords:

* Some keywords will translate into several words in a foreign language.
* Some languages use characters that aren't allowed in keywords (like the apostrophe in French or the inverted question mark (¿) in Spanish).
* It makes the result unusable in any other country in the world - the person writing the code usually doesn't care but all other people won't like it one bit.
* It only works if you never consume any code from anyone else because if you do, you will suddenly have the language mix that you wanted to avoid in the first place. So even if you could use native language keywords, you will eventually have to write a DSL which contains words from a foreign language.
* It's something that looks reasonable. Eventually, you will have written enough code to understand that it was stupid to begin with but at that time, you will have so much existing code that you can't change it anymore.
* Some people think you can have an automatic translation, i.e. where "When" is replaced with "Wenn" when you display an English DSL in German. That also doesn't work because other languages want to put the keyword in a different position in the sentence. So even though you could replace the keyword, the resulting sentence is only valid in English.

Not convinced? Have your DSL translated into Chinese, Japanese, French, German and Portuguese and let some native speakers have a look at the result and ask them if they still think it's a good idea.

Re: Proposal - Adding internalization support to Xtext [message #985589 is a reply to message #985546]

Thu, 15 November 2012 14:18

Sebastian Benz

Messages: 6
Registered: March 2011

Junior Member

Aaron,

you are right that translating a DSL usually doesn't make sense. However, in this context it actually does make sense and is already used successfully (see github.com/cucumber/cucumber/tree/master/examples/i18n for example translations). The reason is that the DSL combines free text with fixed keywords, which makes language-specific keywords even more important.

Cheers,

Sebastian

Re: Proposal - Adding internalization support to Xtext [message #985626 is a reply to message #985589]

Thu, 15 November 2012 14:52

Andreas Brieg

Messages: 48
Registered: November 2012

Member

I think that localizing a DSL is a good idea, but I don't think that it's a job for Xtext itself. The editor is probably a much better place to do localization. You just need to transform back and forth between the grammar and a localized model of the grammar when viewing or modifying your DSL sources. This would also handle the problem of copying parts of the sources: the editor would copy the original grammar sources to the clipboard and would receive original grammar sources when pasting.

Re: Proposal - Adding internalization support to Xtext [message #985665 is a reply to message #985589]

Thu, 15 November 2012 16:12

Aaron Digulla

Messages: 258
Registered: July 2009
Location: Switzerland

Senior Member

Sebastian Benz wrote on Thu, 15 November 2012 15:18

However, in this context it actually makes sense and is already used successfully

And the result is grammatically correct in all languages? Is "When" always the first word in all languages in the world?

Re: Proposal - Adding internalization support to Xtext [message #985733 is a reply to message #984647]

Thu, 15 November 2012 21:05

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

Sorry, but to be honest this sounds like a feature I would never
prioritize high enough to actually implement it.

I18n to that degree may exist in commercial tools, manufactured by big
companies with a lot of native speakers of different languages. IMHO the
cost is in no sensible relation to the effort.

Does a crippled sentence with wrong grammar, order of words, etc. in
your mother tongue really feel more natural or better than a mix of
English and your language built according to some apparently formal rules?

Re: Proposal - Adding internalization support to Xtext [message #985856 is a reply to message #985733]

Fri, 16 November 2012 12:47

Boris Brodski

Messages: 112
Registered: July 2009

Senior Member

> Sorry, but to be honest this sounds like a feature I would never
> prioritize high enough to actually implement it.

I planned to implement this feature myself, once it has been discussed and approved.
(I will definitely ask a couple of technical questions, though.)

> I18n to that degree may exist in commercial tools, manufactured by big
> companies with a lot of native speakers of different languages. IMHO the
> cost is in no sensible relation to the effort.

This is a very advanced feature indeed. And even if very few DSLs
make use of it, it's still a good marketing argument ("... and complex things possible").

> Does a crippled sentence with wrong grammar, order of words, etc. in
> your mother tongue really feel more natural or better than a mix of
> English and your language built according to some apparently formal rules?

There are definitely many DSLs out there that can't be internationalized at all.
I think this depends on the domain and the design of the language. It will
also often be the case that a given DSL is localized only to a couple of languages
of a single language group.

Also, for simple languages like LOGO, internationalization should be simple.
LOGO in particular has been localized many times to teach kids. In the case of Jnario
or Cucumber this also works very nicely (https://github.com/cucumber/gherkin/blob/master/lib/gherkin/i18n.json).

If we design a real DSL (to be read by customers), it's nice just to have such an option.
In all such DSLs I have designed so far I used non-English keywords. Since those
DSLs were used within a single country, this wasn't a problem. But if such a DSL
gets popular within an international company, this will become a limitation.

In order to address the problem with the order of the words in the sentence, we could also support something like this:

AssertExpr:
  %{=en expr1=Expression "should" "be" expr2=Expression        }
  %{=de expr1=Expression "soll"        expr2=Expression "sein" }
;

// language: en
1 + 2 should be 3

// language: de
1 + 2 soll 3 sein
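
The intent of the per-language alternatives above is that both word orders end up in the same model element. The following sketch imitates that with one regular expression per language instead of an Xtext grammar; it is only a stand-in for the proposed syntax, and the names LocalizedAssertParser and AssertExpr (as a Java class) are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for the localized AssertExpr rule, using regular expressions. */
class LocalizedAssertParser {

    /** Language-independent result: both word orders produce the same two operands. */
    static final class AssertExpr {
        final String expr1;
        final String expr2;

        AssertExpr(String expr1, String expr2) {
            this.expr1 = expr1;
            this.expr2 = expr2;
        }
    }

    // One pattern per language; the named groups mark where the operands sit.
    private static final Map<String, Pattern> PATTERNS = Map.of(
            "en", Pattern.compile("^(?<expr1>.+?)\\s+should\\s+be\\s+(?<expr2>.+)$"),
            "de", Pattern.compile("^(?<expr1>.+?)\\s+soll\\s+(?<expr2>.+?)\\s+sein$"));

    /** Returns the parsed assertion, or empty if the language or the line is not recognized. */
    static Optional<AssertExpr> parse(String language, String line) {
        Pattern pattern = PATTERNS.get(language);
        if (pattern == null) {
            return Optional.empty();
        }
        Matcher m = pattern.matcher(line.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new AssertExpr(m.group("expr1"), m.group("expr2")));
    }
}
```

With this sketch, parse("en", "1 + 2 should be 3") and parse("de", "1 + 2 soll 3 sein") both yield the operands "1 + 2" and "3".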

Re: Proposal - Adding internalization support to Xtext [message #986157 is a reply to message #985856]

Mon, 19 November 2012 08:52

Ingo Meyer

Messages: 162
Registered: July 2009

Senior Member

Hi,

I would suggest a different approach, which we decided is the best if you need I18N:
use a different grammar for each language, all connected to the same metamodel.

As already mentioned, different languages will also have a different ordering of words, a phrase that reads well in one language may be terribly wrong in another, they have different characters, etc.

1. A totally different grammar per language will help you with that and can be improved by a native speaker independently of the others.
2. You will not introduce too many keywords into one big grammar, which is never a good idea (you have to escape too many things via "^"), or hit the 64k limit!
3. Xtext gives you an easy environment for maintaining multiple grammars without too much additional work.
4. You can "translate" into another human language by just opening the file with the corresponding editor or changing the file extension.

What about that?

~Ingo

Re: Proposal - Adding internalization support to Xtext [message #988907 is a reply to message #986157]

Mon, 03 December 2012 16:31

Aaron Digulla

Messages: 258
Registered: July 2009
Location: Switzerland

Senior Member

@Ingo: I like this approach.

We use specialized models that are filled from Xtext/EMF models in many places because this creates a layer of independence between the business code (which doesn't need to be aware of Xtext/EMF limitations/features) and the DSL implementation.

The basic idea is that you create a custom model which contains all the information that you need. This can be EMF based or plain POJOs.

Then you write a converter which builds the custom model from an Xtext model. Your DSL has a "DslSwitch" class that helps greatly in this process.

It might even be possible to use a custom EMF model and hook that to your grammar (so all grammars will use the same underlying model and only the position and text of the keywords will change).
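
As a rough illustration of the converter idea, here is a sketch with plain POJOs. It assumes nothing about the real generated classes: ParsedScenario is a hypothetical stand-in for whatever the parser produces (it is not an Xtext or EMF type), and Scenario is a hypothetical business-side model that no longer carries any keyword.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical converter from a parser-side model to a grammar-independent POJO. */
class ScenarioConverter {

    /** Stand-in for a parser-generated model element (not an Xtext or EMF type). */
    interface ParsedScenario {
        String getKeyword();      // e.g. "Scenario:" or "Szenario:"
        String getTitle();
        List<String> getSteps();
    }

    /** Business model: immutable and free of language-specific keywords. */
    static final class Scenario {
        final String title;
        final List<String> steps;

        Scenario(String title, List<String> steps) {
            this.title = title;
            this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        }
    }

    /** Copies only the language-independent information; the keyword is dropped. */
    static Scenario convert(ParsedScenario parsed) {
        return new Scenario(parsed.getTitle().trim(), parsed.getSteps());
    }
}
```

Because the business code only sees Scenario, it does not matter which localized grammar produced the parsed element.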

Regards,

A. Digulla
