Codepage problem (Code generated the first time is stored using the wrong codepage (also in xtend))
Codepage problem [message #1791363] Thu, 28 June 2018 11:58
Thomas Kohler
Hello,

We have two (codepage?) problems when artefacts such as Java code are generated for the first time, both from an Xtext/Xbase-based DSL and from native Xtend.

First Problem (in our DSL):

To allow our customers to write DSL code in their native language we use UTF-8 encoding in all related files.

In Eclipse we changed the Content Type of all *.java, *.xtend, *.ext, *.xpt and also our DSL *.4gl files to UTF-8.
In our xtext grammar we override the following lexical rule:

[...]
@Override 
terminal ID:
	'^'? ID_START_CHAR ID_CHAR*;

terminal fragment ID_START_CHAR
	: 'A'..'Z'
	| 'a'..'z'
	| '_'
	| '\u00C0'..'\u00D6'
	| '\u00D8'..'\u00F6'
	| '\u00F8'..'\u02FF'
	| '\u0370'..'\u037D'
	| '\u037F'..'\u1FFF'
	| '\u200C'..'\u200D'
	| '\u2070'..'\u218F'
	| '\u2C00'..'\u2FEF'
	| '\u3001'..'\uD7FF'
	| '\uF900'..'\uFDCF'
	| '\uFDF0'..'\uFFFD'
	// ignores | ['\u10000-'\uEFFFF]
	;

terminal fragment ID_CHAR
	: ID_START_CHAR
	| '0'..'9'
	| '\u00B7'
	| '\u0300'..'\u036F'
	| '\u203F'..'\u2040'
	;
[...]


This allows the usage of all characters which are also allowed in (UTF-8) Java as identifiers in the DSL.
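A quick way to check whether a given DSL name is also acceptable to Java is sketched below. This is only an illustration, not part of the original setup: the class and method names are made up, and keywords are not handled. Note that the ranges in the grammar above appear to match the XML name character ranges, so it may be worth checking generated names against Java's own rule rather than assuming the two sets are identical.

```java
public class IdentifierCheck {
    // Returns true if s is syntactically a Java identifier
    // (reserved words such as "class" are NOT rejected here).
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Escapes are used so this file does not depend on its own encoding.
        System.out.println(isJavaIdentifier("WithUmlaut\u00C4")); // name with A-umlaut
        System.out.println(isJavaIdentifier("stra\u00DFe"));      // name with eszett
        System.out.println(isJavaIdentifier("1abc"));             // starts with a digit
    }
}
```

The first two names are accepted and the third is rejected, because a Java identifier must not start with a digit.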

In the Model Workflow we defined the following parts to ensure UTF-8:

[...]
Workflow {
	component = XtextGenerator {
		configuration = {
            [...]
			code = {
				encoding = "utf-8"
				[...]
			}
		}
		language = StandardLanguage {
            [...]
			fileExtensions = "4gl"
            [...]
		}
	}
}


When we write a new DSL file resource, it will be stored as UTF-8. The generated java source code will also be encoded in UTF-8 as mentioned in the mwe2 file.

Our problem occurs exactly when an artifact is generated and stored for the first time (no file existed before).
In this case the special characters end up in the wrong codepage, or are replaced with question marks ('?') when they are 'too exotic'.

But if we then make a zero change in the DSL file (e.g. adding and removing a blank), which marks it dirty so that it can be saved again (using the Eclipse editor generated by Xtext for our DSL), the generated artefact file will be correct(ed).
The same happens when we trigger "Project / Clean..." on the Eclipse project containing the DSL file.

Example in DSL:

package example {
	BoRoot WithUmlautÄ {
		
	}
}


Initially generated Code:

public interface WithUmlaut� extends BoRoot {
}


Code generated when the File already existed before:

public interface WithUmlautÄ extends BoRoot {
}


In the first (wrong) example the letter Ä was encoded as the single byte C4 (ISO 8859-1); in the second (working) version it was encoded as the two bytes C3 84 (UTF-8).
Deleting the generated Java file and saving the DSL file again also reproduces the error.
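The byte values mentioned above can be reproduced with plain Java, independent of Eclipse. The following is a minimal sketch (class and method names are made up for illustration) that encodes Ä in both charsets and then shows what a UTF-8 reader makes of the single-byte form:

```java
import java.nio.charset.StandardCharsets;

public class UmlautBytes {
    // Render bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A-umlaut, written as an escape so this file is encoding-independent.
        String umlaut = "\u00C4";

        byte[] latin1 = umlaut.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = umlaut.getBytes(StandardCharsets.UTF_8);
        System.out.println("ISO-8859-1: " + hex(latin1));
        System.out.println("UTF-8:      " + hex(utf8));

        // A lone C4 byte is not valid UTF-8, so decoding it as UTF-8
        // yields the replacement character U+FFFD.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println("misread as U+FFFD: " + misread.equals("\uFFFD"));
    }
}
```

This matches the symptom in the initially generated file: the bytes were written in a single-byte codepage but are read back as UTF-8.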

Second Problem (xtend):

We have a similar problem using native Xtend together with Git. When we write an Xtend class with code using e.g. the German eszett (ß) like:

class Test {
	def test() {
		val straße = ""
	}
}


Saving the class for the first time works without problems. When we commit this Xtend class to Git and another team member pulls the code for the first time, the Java classes generated by Xtend have the same codepage problem. Here, too, the zero change and save triggers the Xtend generator, but now on an existing file, and the codepage problems disappear.
In this case the "Project / Clean" use case did not corrupt the generated Java artefacts, but deleting the Java artefact and saving the Xtend file again causes the same error as in our DSL.


Any ideas what the problem could be?

Thanks,
Tom
Re: Codepage problem [message #1791572 is a reply to message #1791363] Mon, 02 July 2018 15:02
Christian Dietrich
can you please provide a complete example?
i assume that at some place the encoding providers are not working correctly.
did you debug WorkspaceEncodingProvider and org.eclipse.xtext.builder.EclipseResourceFileSystemAccess2.getEncoding(IFile)
to find out where the wrong encoding comes from?


Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: Codepage problem [message #1791585 is a reply to message #1791572] Mon, 02 July 2018 16:11
Thomas Kohler
Hello Christian, thank you for your reply.

To see the problem you just need the current Eclipse Photon IDE for Java and DSL Developers. Run it and create a new workspace. Then go to Window/Preferences/General/Content Types, open the "Text" node, select "Java Source File", enter UTF-8 in the field "Default encoding", and press Update. Do the same with "Xtend File".

Now create a simple Java class like Adresse.java:

package test;

public class Adresse {

	private String straße;
	
	public String getStraße() {
		return straße;
	}
	
	public void setStraße(String straße) {
		this.straße = straße;
	}
}


and a Xtend class like Test.xtend:

package test

class Test {

	def test() {
		val adresse = new Adresse
		adresse.straße = "test"
	}
}


Everything is OK now: the Java source, the Xtend source, and the Java code generated from Xtend are all UTF-8.

Now just clean the project (menu Project/Clean) and the freshly generated Java source from Xtend is encoded as Cp1252 (the global default of the container, e.g. the workspace). Because Java assumes UTF-8 (as set for *.java files before), it can't compile the Java file. The error is also (correctly) propagated back to the Xtend file.

If you make a zero change to the Xtend source and save again, the generated file is correctly encoded as UTF-8 again.

I think this demonstrates the main problem, which also occurs in our own DSLs using the same generation process.

regards, Tom
Re: Codepage problem [message #1791588 is a reply to message #1791585] Mon, 02 July 2018 16:15
Christian Dietrich
What are the encoding settings of your project and workspace?

Re: Codepage problem [message #1791590 is a reply to message #1791588] Mon, 02 July 2018 16:25
Christian Dietrich
i could not reproduce that.
the generated file has the encoding of the workspace/project
and the stuff is correctly escaped

package sample

class Sampke {
	
	def static void main(String[] args) {
		Util.äöüß
	}
	
}


package sample;

import sample.Util;

@SuppressWarnings("all")
public class Sampke {
  public static void main(final String[] args) {
    Util.\u00e4\u00f6\u00fc\u00df();
  }
}




or when using cp1252 and not US-ASCII

package sample;

import org.eclipse.xtext.xbase.lib.InputOutput;
import sample.Util;

@SuppressWarnings("all")
public class Sampke {
  public static void main(final String[] args) {
    Util.äöüß();
    InputOutput.<Integer>println(Integer.valueOf(1));
  }
}



// in cp1252
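The difference between the two generated files above (escapes with US-ASCII, literal characters with cp1252) can be illustrated in plain Java. This is only a sketch of the general idea, not the code Xtend actually uses; the class and method names are made up, and it assumes the JDK ships the windows-1252 charset, which mainstream JDKs do:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class EscapeUnmappable {
    // Replace every char the target charset cannot encode
    // with a Java Unicode escape; keep all other chars as they are.
    static String escapeFor(String source, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : source.toCharArray()) {
            if (encoder.canEncode(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The four German special characters from the example, as escapes.
        String name = "\u00e4\u00f6\u00fc\u00df";

        // US-ASCII cannot represent any of them, so all four are escaped.
        System.out.println(escapeFor(name, StandardCharsets.US_ASCII));

        // windows-1252 can represent all of them, so the name is unchanged.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(escapeFor(name, cp1252).equals(name));
    }
}
```

With US-ASCII as the target, the output consists of four six-character escapes, matching the first generated file; with cp1252 the name passes through unchanged, matching the second.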


[Updated on: Mon, 02 July 2018 16:37]

Re: Codepage problem [message #1791591 is a reply to message #1791590] Mon, 02 July 2018 16:30
Christian Dietrich
=> i never get a utf-8 file for the generated java

Re: Codepage problem [message #1791592 is a reply to message #1791591] Mon, 02 July 2018 16:41
Christian Dietrich
the problem is the content types.
please file a bug against
github.com/eclipse/xtext-eclipse


Re: Codepage problem [message #1791593 is a reply to message #1791592] Mon, 02 July 2018 16:48
Christian Dietrich
workaround:

subclass org.eclipse.xtext.builder.EclipseResourceFileSystemAccess2, bind the subclass, and override getEncoding(IFile)

so that it uses org.eclipse.core.runtime.Platform.getContentTypeManager().findContentTypeFor(file.getName()).getDefaultCharset()
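The lookup order this workaround aims for can be modelled without any Eclipse dependency. The sketch below is not the Eclipse or Xtext API: the class, the extension-to-charset map and the container charset field are hypothetical stand-ins for the content type settings and the workspace/project encoding. The fallback to the container encoding when no content type charset is configured is also an assumption, not something stated in the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class EncodingLookupSketch {
    // Hypothetical stand-in for the content type settings:
    // file extension -> default charset.
    private final Map<String, String> contentTypeCharsets = new HashMap<>();
    // Hypothetical stand-in for the workspace/project (container) encoding.
    private final String containerCharset;

    public EncodingLookupSketch(String containerCharset) {
        this.containerCharset = containerCharset;
    }

    public void setContentTypeCharset(String extension, String charset) {
        contentTypeCharsets.put(extension, charset);
    }

    // Prefer the charset configured for the file's content type;
    // fall back to the container charset when none is configured.
    public String resolve(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot >= 0 ? fileName.substring(dot + 1) : "";
        String fromContentType = contentTypeCharsets.get(extension);
        return fromContentType != null ? fromContentType : containerCharset;
    }

    public static void main(String[] args) {
        EncodingLookupSketch lookup = new EncodingLookupSketch("Cp1252");
        lookup.setContentTypeCharset("java", "UTF-8");
        System.out.println(lookup.resolve("Test.java")); // content type setting wins
        System.out.println(lookup.resolve("notes.txt")); // falls back to the container
    }
}
```

In a real override the lookup works on the file name only, so it would not need the file to exist yet, which is the case that fails in the reports above. A null check on both the content type and its default charset is probably advisable there.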

