Codepage problem [message #1791363] |
Thu, 28 June 2018 11:58 |
Thomas Kohler Messages: 3 Registered: June 2018 |
Junior Member |
Hello,
We have two (codepage?) problems that occur when artefacts such as Java code are generated for the first time from an Xtext/Xbase-based DSL, and also with plain Xtend.
First Problem (in our DSL):
To allow our customers to write DSL code in their native language we use UTF-8 encoding in all related files.
In Eclipse we set the default encoding of the content types for *.java, *.xtend, *.ext, *.xpt and our DSL's *.4gl files to UTF-8.
In our xtext grammar we override the following lexical rule:
[...]
@Override
terminal ID:
'^'? ID_START_CHAR ID_CHAR*;
terminal fragment ID_START_CHAR
: 'A'..'Z'
| 'a'..'z'
| '_'
| '\u00C0'..'\u00D6'
| '\u00D8'..'\u00F6'
| '\u00F8'..'\u02FF'
| '\u0370'..'\u037D'
| '\u037F'..'\u1FFF'
| '\u200C'..'\u200D'
| '\u2070'..'\u218F'
| '\u2C00'..'\u2FEF'
| '\u3001'..'\uD7FF'
| '\uF900'..'\uFDCF'
| '\uFDF0'..'\uFFFD'
// ignores the supplementary range U+10000..U+EFFFF
;
terminal fragment ID_CHAR
: ID_START_CHAR
| '0'..'9'
| '\u00B7'
| '\u0300'..'\u036F'
| '\u203F'..'\u2040'
;
[...]
This lets the DSL accept as identifiers all the characters that Java also accepts in identifiers (in UTF-8 encoded source).
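As a quick way to check a candidate name against Java's own identifier rules, here is a minimal sketch. The class and method names are hypothetical and not part of Xtext; it relies only on `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart` from the JDK, does not check reserved words, and does not reproduce the grammar ranges above.

```java
class IdentifierCheck {
    /**
     * Returns true if every code point of name is legal at its position
     * in a Java identifier (hypothetical helper; keywords are not checked).
     */
    public static boolean isJavaIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so supplementary characters are handled as one unit.
        for (int i = Character.charCount(first); i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this file independent of its own encoding.
        System.out.println(isJavaIdentifier("WithUmlaut\u00C4")); // true
        System.out.println(isJavaIdentifier("stra\u00DFe"));      // true
        System.out.println(isJavaIdentifier("1abc"));             // false
    }
}
```

Running the same check over names taken from a DSL file would show whether the grammar and Java agree for the characters actually in use.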
In the MWE2 workflow we configured the following to ensure UTF-8:
[...]
Workflow {
    component = XtextGenerator {
        configuration = {
            [...]
            code = {
                encoding = "utf-8"
                [...]
            }
        }
        language = StandardLanguage {
            [...]
            fileExtensions = "4gl"
            [...]
        }
    }
}
When we write a new DSL file, it is stored as UTF-8. The generated Java source code will also be encoded in UTF-8, as configured in the mwe2 file.
Our problem occurs exactly when an artefact is generated and stored for the first time (no file existed before).
In this case the special characters end up in the wrong encoding, or are replaced with question marks ('?') when they are 'too exotic'.
But if we afterwards make a zero-change in the DSL file (e.g. adding and removing a blank), which marks it dirty so that it can be saved again (using the Eclipse editor generated by Xtext for our DSL), the generated artefact file is correct(ed).
The same happens when we trigger "Project / Clean..." on the Eclipse project containing the DSL file.
Example in DSL:
package example {
BoRoot WithUmlautÄ {
}
}
Initially generated Code:
public interface WithUmlaut� extends BoRoot {
}
Code generated when the File already existed before:
public interface WithUmlautÄ extends BoRoot {
}
In the first (wrong) example the letter Ä was encoded as the single byte C4 (ISO 8859-1); in the second (working) version it was encoded as the two bytes C3 84 (UTF-8).
Deleting the generated Java file and saving the DSL file again also reproduces the error.
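The byte values above, the replacement character in the broken output, and the question marks can all be reproduced outside Eclipse. The following is a minimal sketch using only the JDK; the class and helper names are hypothetical, and it says nothing about where in the Xtext builder the wrong encoding is chosen.

```java
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 84". */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Encodes text as ISO 8859-1 and reads the bytes back as if they were UTF-8. */
    public static String misreadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlaut = "\u00C4"; // the letter Ä
        System.out.println(hex(umlaut.getBytes(StandardCharsets.ISO_8859_1))); // C4
        System.out.println(hex(umlaut.getBytes(StandardCharsets.UTF_8)));      // C3 84

        // A lone C4 is not valid UTF-8, so the decoder substitutes U+FFFD.
        String misread = misreadAsUtf8("WithUmlaut\u00C4 extends");
        System.out.println(misread.indexOf('\uFFFD') >= 0); // true

        // A character that ISO 8859-1 cannot represent is encoded as '?' (3F).
        System.out.println(hex("\u4E2D".getBytes(StandardCharsets.ISO_8859_1))); // 3F
    }
}
```

This matches the symptoms described: characters that exist in the single-byte encoding show up as invalid bytes when read as UTF-8, and characters that do not exist in it are lost as '?' at write time.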
Second Problem (xtend):
We have a similar problem using plain Xtend together with Git. When we write an Xtend class whose code uses e.g. the German sharp s (ß), like:
class Test {
    def test() {
        val straße = ""
    }
}
When the class is saved for the first time, it works without problems. When we commit this Xtend class to Git and another team member pulls the code for the first time, the Java classes generated by Xtend have the same codepage problem. Here too, a zero-change and save triggers the Xtend generator, but now on an existing file, and the codepage problems disappear.
In this case "Project / Clean..." did not corrupt the generated Java artefacts, but deleting the Java artefact and saving the Xtend file again causes the same error as in our DSL.
Any ideas what the problem could be?
Thanks,
Tom
Re: Codepage problem [message #1791572 is a reply to message #1791363] |
Mon, 02 July 2018 15:02 |
can you please provide a complete example?
i assume that somewhere the encoding providers are not working correctly.
did you debug WorkspaceEncodingProvider and org.eclipse.xtext.builder.EclipseResourceFileSystemAccess2.getEncoding(IFile)
to find out where the wrong encoding comes from?
Twitter : @chrdietrich
Blog : https://www.dietrich-it.de
Re: Codepage problem [message #1791585 is a reply to message #1791572] |
Mon, 02 July 2018 16:11 |
Thomas Kohler Messages: 3 Registered: June 2018 |
Junior Member |
Hello Christian, thank you for your reply.
To see the problem you just need the current Eclipse Photon IDE for Java and DSL Developers. Start it and create a new workspace. Then go to Window/Preferences/General/Content Types, open the "Text" node, select "Java Source File", enter UTF-8 in the "Default encoding" field and press Update. Do the same for "Xtend File".
Now create a simple Java class like Adresse.java:
package test;

public class Adresse {
    private String straße;

    public String getStraße() {
        return straße;
    }

    public void setStraße(String straße) {
        this.straße = straße;
    }
}
and an Xtend class like Test.xtend:
package test

class Test {
    def test() {
        val adresse = new Adresse
        adresse.straße = "test"
    }
}
Everything is OK at this point: the Java source, the Xtend source and the Java code generated from Xtend are all UTF-8.
Now just clean the project (menu Project/Clean): the freshly generated Java source from Xtend is now encoded as Cp1252 (the global default of the container, e.g. the workspace). Because Java assumes UTF-8 (as set for *.java files before), it cannot compile the Java file. The error is also (correctly) propagated back to the Xtend file.
If you make a zero-change to the Xtend source and save again, the generated file is correctly encoded as UTF-8 again.
I think this demonstrates the main problem, which also occurs in custom DSLs using the same generation process.
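One way to confirm which of the generated files are affected is to test their bytes with a strict UTF-8 decoder. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, and ISO 8859-1 stands in for Cp1252 because both encode ß as the single byte DF. It only detects the symptom and does not show how the Java compiler itself reads the file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true only if the bytes form well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String line = "private String stra\u00DFe;"; // contains ß

        // Written as UTF-8: ß becomes C3 9F, which decodes cleanly.
        System.out.println(isValidUtf8(line.getBytes(StandardCharsets.UTF_8)));      // true

        // Written in a single-byte encoding: ß becomes a lone DF, which is malformed UTF-8.
        System.out.println(isValidUtf8(line.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

In practice the byte array would come from the generated file, for example via `java.nio.file.Files.readAllBytes`, checked once right after Project/Clean and once after the zero-change save.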
regards, Tom
Re: Codepage problem [message #1791590 is a reply to message #1791588] |
Mon, 02 July 2018 16:25 |
i could not reproduce that.
the generated file has the encoding of the workspace/project
and the characters are correctly escaped:
package sample

class Sampke {
    def static void main(String[] args) {
        Util.äöüß
    }
}

package sample;

import sample.Util;

@SuppressWarnings("all")
public class Sampke {
    public static void main(final String[] args) {
        Util.\u00e4\u00f6\u00fc\u00df();
    }
}
or, when using cp1252 and not us-ascii:
package sample;

import org.eclipse.xtext.xbase.lib.InputOutput;
import sample.Util;

@SuppressWarnings("all")
public class Sampke {
    public static void main(final String[] args) {
        Util.äöüß();
        InputOutput.<Integer>println(Integer.valueOf(1));
    }
}
// in cp1252
[Updated on: Mon, 02 July 2018 16:37]