Eclipse Community Forums: TMF (Xtext) » Umlauts and UTF-8 BOM


Umlauts and UTF-8 BOM [message #1690231]

Wed, 25 March 2015 20:53

Hendrik Motza

Messages: 7
Registered: October 2014

Junior Member

Hi,

I have to write a grammar for an existing DSL which uses UTF-8 files and can contain any umlauts (also outside of quoted strings).

1. There are too many umlauts across the different languages to list them individually. Is there no way to allow any letter, similar to \w in Java regular expressions?

2. When a DSL file that begins with the UTF-8 BOM is opened, this results in a parser exception. Is there a way to ignore the BOM preamble?

Thx in advance!
DataWorm


Re: Umlauts and UTF-8 BOM [message #1690250 is a reply to message #1690231]

Thu, 26 March 2015 06:21

Christian Dietrich

Messages: 14699
Registered: July 2009

Senior Member

For (1): no, the only way is to define ranges '\uFIRST'..'\uSECOND'.

For (2): you may comment on https://bugs.eclipse.org/bugs/show_bug.cgi?id=390308

Twitter : @chrdietrich
Blog : https://www.dietrich-it.de


Re: Umlauts and UTF-8 BOM [message #1690251 is a reply to message #1690250]

Thu, 26 March 2015 06:24

Christian Dietrich

Messages: 14699
Registered: July 2009

Senior Member

P.S.: Maybe you can additionally add U+FEFF to the grammar.


Re: Umlauts and UTF-8 BOM [message #1690278 is a reply to message #1690250]

Thu, 26 March 2015 09:53

Hendrik Motza

Messages: 7
Registered: October 2014

Junior Member

'\u...' is recognized as a string and not as a character. But the idea of a range was simple and helpful. I solved it this way:
terminal fragment LETTER: ('a'..'z' | 'A'..'Z' | 'À'..'ÿ');
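
A quick way to see what the range 'À'..'ÿ' actually covers is to walk it in plain Java. This is only an illustrative sketch, not part of the original post; the class name RangeCheck and the method nonLetters are made up. It lists the code points between U+00C0 and U+00FF that Character.isLetter rejects.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeCheck {
    // Collects the characters in [first, last] that Java does not classify as letters.
    static List<Character> nonLetters(char first, char last) {
        List<Character> result = new ArrayList<>();
        for (int c = first; c <= last; c++) {
            if (!Character.isLetter(c)) {
                result.add((char) c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // U+00C0 is 'À' and U+00FF is 'ÿ', the bounds used in the LETTER fragment.
        for (char c : nonLetters('\u00C0', '\u00FF')) {
            System.out.printf("U+%04X is not a letter%n", (int) c);
        }
    }
}
```

The two code points reported are U+00D7 (multiplication sign) and U+00F7 (division sign), so a stricter grammar could split the range around them if those symbols should not count as letters.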

I have seen that bug report, including the date it was opened, and that its state is still marked as NEW. Is Xtext still under development, or why do they ignore such old unhandled bugs?

I thought someone might have found another working solution for this, like the one Christian suggested. I will give it another try!

[Updated on: Thu, 26 March 2015 10:48]


Re: Umlauts and UTF-8 BOM [message #1738620 is a reply to message #1690278]

Thu, 21 July 2016 13:26

Nils B.

Messages: 10
Registered: July 2016

Junior Member

I also need to ignore the BOM. Is there a way to do this now?


Re: Umlauts and UTF-8 BOM [message #1738657 is a reply to message #1738620]

Thu, 21 July 2016 19:19

Hendrik Motza

Messages: 7
Registered: October 2014

Junior Member

If there is one, I haven't found it yet! :-(


Re: Umlauts and UTF-8 BOM [message #1738892 is a reply to message #1738657]

Mon, 25 July 2016 18:37

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

What do you mean by "ignoring the BOM"?

The BOM is usually handled by Eclipse classes, such as org.eclipse.ui.editors.text.FileDocumentProvider.setDocumentContent(IDocument, IEditorInput, String).

---
Get professional support from the Xtext committers at www.typefox.io


Re: Umlauts and UTF-8 BOM [message #1738894 is a reply to message #1738892]

Mon, 25 July 2016 18:57

Hendrik Motza

Messages: 7
Registered: October 2014

Junior Member

When using Xtext on a UTF-8 file with a BOM, the BOM is treated as part of the content that my Xtext grammar has to parse. So when I try to load a file that starts with a BOM, the editor tells me the DSL starts with invalid content.

To avoid such error messages, we would like to ignore the BOM if it exists at the start of the document. I tried to add the BOM to my grammar (and then simply ignore this part of my grammar), but so far I have not managed to match that character sequence with a grammar rule...

Thanks for your hint regarding the corresponding Eclipse class, but for now I have no clue how to modify or influence it, because all these calls to Eclipse classes are made by the Xtext SDK... :-/
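
One generic way to make a leading BOM invisible to a parser is to drop it before the text is handed over. The following is a minimal sketch in plain Java, assuming the raw input stream can be reached before it is read; where to hook this into the Xtext SDK is not shown, and BomSkipper and skipUtf8Bom are hypothetical names, not Xtext API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

public class BomSkipper {
    // Returns a stream positioned after a leading UTF-8 BOM (bytes EF BB BF), if present.
    // If there is no BOM, the bytes that were peeked at are pushed back unchanged.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3); // requires Java 9 or later
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        // Sample input: a BOM followed by some made-up DSL text.
        byte[] withBom = "\uFEFFmodel Foo".getBytes(StandardCharsets.UTF_8);
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Run as is, this prints the sample text without the BOM. Input that does not start with a BOM, including input shorter than three bytes, passes through untouched.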


Re: Umlauts and UTF-8 BOM [message #1738932 is a reply to message #1738894]

Tue, 26 July 2016 08:26

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

You must have customized something else, as by default you never see the BOM inside the document, i.e. you never have to and you never can deal with it in the grammar.


Re: Umlauts and UTF-8 BOM [message #1738933 is a reply to message #1738932]

Tue, 26 July 2016 08:27

Jan Koehnlein

Messages: 760
Registered: July 2009
Location: Hamburg

Senior Member

The hint could give you a good start for debugging. Usually Xtext classes inherit from Eclipse classes and you have to find out which ones and where they are created and then hook in your changes by dependency injection.


Re: Umlauts and UTF-8 BOM [message #1739031 is a reply to message #1738932]

Wed, 27 July 2016 05:32

Ed Merks

Messages: 33187
Registered: July 2009

Senior Member

I didn't test it, but perhaps the workspace doesn't think/know the
encoding is UTF-8 and isn't expecting the BOM at the start.
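
The effect of a wrong encoding guess can be shown with a small Java sketch (an illustration only, not tested against Xtext or the Eclipse workspace; the class name BomDecoding is made up). The same three BOM bytes decode to the single character U+FEFF under UTF-8, but to three separate visible characters under ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomDecoding {
    // The UTF-8 byte order mark (EF BB BF) followed by the ASCII text "ab".
    static final byte[] BYTES = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b'};

    // Decodes the sample bytes with the given charset.
    static String decode(Charset charset) {
        return new String(BYTES, charset);
    }

    public static void main(String[] args) {
        String utf8 = decode(StandardCharsets.UTF_8);
        String latin1 = decode(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length());              // 3: BOM character + "ab"
        System.out.println(latin1.length());            // 5: three stray characters + "ab"
        System.out.println(utf8.charAt(0) == '\uFEFF'); // true
    }
}
```

Note that Java's own UTF-8 decoder keeps the BOM as a character, so even with the correct encoding something has to remove it before the content reaches a parser.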

On 26.07.2016 10:26, Jan Koehnlein wrote:
> You must have customized something else, as by default you never see
> the BOM inside the document, i.e. you never have to and you never can
> deal with it in the grammar.

Ed Merks
Professional Support: https://www.macromodeling.com/

