Codepage problem [message #1791363]
Thu, 28 June 2018 07:58
Eclipse User
Hello,
We have two (codepage?) problems when artefacts such as Java code are generated for the first time from an Xtext/Xbase-based DSL, and also with plain Xtend.
First Problem (in our DSL):
To allow our customers to write DSL code in their native language we use UTF-8 encoding in all related files.
In Eclipse we set the default encoding of the content types for all *.java, *.xtend, *.ext, *.xpt and also our DSL *.4gl files to UTF-8.
In our xtext grammar we override the following lexical rule:
[...]
@Override
terminal ID:
    '^'? ID_START_CHAR ID_CHAR*;

terminal fragment ID_START_CHAR
    : 'A'..'Z'
    | 'a'..'z'
    | '_'
    | '\u00C0'..'\u00D6'
    | '\u00D8'..'\u00F6'
    | '\u00F8'..'\u02FF'
    | '\u0370'..'\u037D'
    | '\u037F'..'\u1FFF'
    | '\u200C'..'\u200D'
    | '\u2070'..'\u218F'
    | '\u2C00'..'\u2FEF'
    | '\u3001'..'\uD7FF'
    | '\uF900'..'\uFDCF'
    | '\uFDF0'..'\uFFFD'
    // ignores | ['\u10000-'\uEFFFF]
    ;

terminal fragment ID_CHAR
    : ID_START_CHAR
    | '0'..'9'
    | '\u00B7'
    | '\u0300'..'\u036F'
    | '\u203F'..'\u2040'
    ;
[...]
This allows identifiers in the DSL to use all the characters that are also allowed in identifiers in (UTF-8) Java sources.
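For anyone who wants to cross-check the grammar ranges against what Java itself accepts, here is a minimal sketch. The class name DslIdCheck and its method names are made up for illustration; the ranges are copied from the two terminal fragments above, and Character.isJavaIdentifierStart / Character.isJavaIdentifierPart are the standard JDK checks. Like the grammar, it only looks at single char values, and it does not check Java keywords.

```java
final class DslIdCheck {
    // Inclusive ranges copied from the ID_START_CHAR fragment
    private static final int[][] START_RANGES = {
        {'A', 'Z'}, {'a', 'z'}, {'_', '_'},
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x02FF},
        {0x0370, 0x037D}, {0x037F, 0x1FFF}, {0x200C, 0x200D},
        {0x2070, 0x218F}, {0x2C00, 0x2FEF}, {0x3001, 0xD7FF},
        {0xF900, 0xFDCF}, {0xFDF0, 0xFFFD}
    };

    // Ranges that ID_CHAR allows in addition to ID_START_CHAR
    private static final int[][] EXTRA_RANGES = {
        {'0', '9'}, {0x00B7, 0x00B7}, {0x0300, 0x036F}, {0x203F, 0x2040}
    };

    private static boolean inRanges(char c, int[][] ranges) {
        for (int[] r : ranges) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isIdStart(char c) {
        return inRanges(c, START_RANGES);
    }

    static boolean isIdChar(char c) {
        return isIdStart(c) || inRanges(c, EXTRA_RANGES);
    }

    /** True if s matches the rule '^'? ID_START_CHAR ID_CHAR*. */
    static boolean matchesDslId(String s) {
        int i = s.startsWith("^") ? 1 : 0;
        if (i >= s.length() || !isIdStart(s.charAt(i))) {
            return false;
        }
        for (i++; i < s.length(); i++) {
            if (!isIdChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True if every char of s is acceptable to Java as an identifier char (keywords are not checked). */
    static boolean isJavaIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // \u00C4 is the letter A with diaeresis, escaped so this file's own encoding does not matter
        String name = "WithUmlaut\u00C4";
        System.out.println(matchesDslId(name) + " " + isJavaIdentifier(name));
    }
}
```

Looping both checks over the grammar ranges should show whether the grammar accepts any character that Java would reject in a generated identifier.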
In the Model Workflow we defined the following parts to ensure UTF-8:
[...]
Workflow {
    component = XtextGenerator {
        configuration = {
            [...]
            code = {
                encoding = "utf-8"
                [...]
            }
        }
        language = StandardLanguage {
            [...]
            fileExtensions = "4gl"
            [...]
        }
    }
}
When we write a new DSL file, it is stored as UTF-8. The generated Java source code should also be encoded in UTF-8, as configured in the MWE2 file.
Our problem occurs exactly when a generated artefact is generated and stored for the first time (no file existed before).
In this case the special characters are written in the wrong codepage, or replaced with question marks ('?') when they are 'too exotic'.
But if we afterwards make a zero change in the DSL file (e.g. adding and removing a blank), which marks it dirty so that it can be saved again (using the Eclipse editor generated by Xtext for our DSL), the generated artefact file is corrected.
The same happens when we trigger "Project / Clean..." on the Eclipse project containing the DSL file.
Example in DSL:
package example {
    BoRoot WithUmlautÄ {
    }
}
Initially generated Code:
public interface WithUmlaut� extends BoRoot {
}
Code generated when the File already existed before:
public interface WithUmlautÄ extends BoRoot {
}
In the first (wrong) example the letter Ä was encoded as the single byte C4 (ISO 8859-1), in the second (working) version as the two bytes C3 84 (UTF-8).
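The byte values above can be reproduced with a few lines of plain Java. This is only a sketch to illustrate the symptom, not part of the generator; the class name UmlautBytes and the helper hex are made up for the example.

```java
import java.nio.charset.StandardCharsets;

final class UmlautBytes {
    /** Formats bytes as space-separated upper-case hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes the given bytes as UTF-8, substituting U+FFFD for malformed input. */
    static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00C4 is the letter A with diaeresis, escaped so this file's own encoding does not matter
        String umlaut = "\u00C4";
        byte[] latin1 = umlaut.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = umlaut.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO 8859-1: " + hex(latin1));
        System.out.println("UTF-8:      " + hex(utf8));

        // A file written as ISO 8859-1 but read as UTF-8 shows the replacement character
        System.out.println("misread as UTF-8 gives U+FFFD: "
                + readAsUtf8(latin1).equals("\uFFFD"));
    }
}
```

The last line corresponds to the corrupted character visible in the initially generated interface name.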
Deleting the generated Java file and saving the DSL file again also causes the error.
Second Problem (xtend):
We have a similar problem using plain Xtend together with Git. When we write an Xtend class with code using e.g. the German sharp s (ß), like:
class Test {
    def test() {
        val straße = ""
    }
}
When the class is saved for the first time, it works without problems. When we commit this Xtend class to Git and another team member pulls the code for the first time, the Java classes generated by Xtend have the same codepage problem. Here, too, a zero change and save triggers the Xtend generator (but now on an existing file), and the codepage problems disappear.
In this case the "Project / Clean" use case did not corrupt the generated Java artefacts, but deleting the Java artefact and saving the Xtend file again causes the same error as in our DSL.
Any ideas what the problem could be?
Thanks,
Tom
Re: Codepage problem [message #1791585 is a reply to message #1791572]
Mon, 02 July 2018 12:11
Eclipse User
Hello Christian, thank you for your reply.
To see the problem you just need the current Photon "Eclipse IDE for Java and DSL Developers". Run it and create a new workspace. Then go to Window/Preferences/General/Content Types, open the "Text" node, select "Java Source File", enter UTF-8 in the field "Default Encoding", and press Update. Do the same with "Xtend File".
Now create a simple Java class like Adresse.java:
package test;

public class Adresse {
    private String straße;

    public String getStraße() {
        return straße;
    }

    public void setStraße(String straße) {
        this.straße = straße;
    }
}
and an Xtend class like Test.xtend:
package test

class Test {
    def test() {
        val adresse = new Adresse
        adresse.straße = "test"
    }
}
Everything is OK now: the Java source, the Xtend source, and the Java code generated from Xtend are all UTF-8.
Now just clean the project (menu Project/Clean). The freshly generated Java source from Xtend is now encoded as Cp1252 (the global default of the container, e.g. the workspace). Because the Java compiler assumes UTF-8 (as set for *.java files before), it can't compile the Java file. The error is also (correctly) propagated back to the Xtend file.
If you make a zero change to the Xtend source and save again, the generated file is correctly encoded as UTF-8 again.
I think this demonstrates the main problem, which also occurs in custom DSLs that use the same generation process.
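To find affected files after a clean build, a strict UTF-8 check can help. The following is a minimal sketch using only the JDK; the class name Utf8Check is hypothetical, and ISO 8859-1 stands in for Cp1252 here because both encode ß as the single byte DF.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8Check {
    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown on the first byte sequence that is not legal UTF-8
            return false;
        }
    }

    public static void main(String[] args) {
        // \u00DF is the German sharp s, escaped so this file's own encoding does not matter
        String source = "private String stra\u00DFe;";
        byte[] asUtf8 = source.getBytes(StandardCharsets.UTF_8);
        byte[] asLatin1 = source.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes valid:       " + isValidUtf8(asUtf8));
        System.out.println("single-byte bytes valid: " + isValidUtf8(asLatin1));
    }
}
```

In practice the bytes would come from Files.readAllBytes(...) on the generated sources (for Xtend, typically the xtend-gen folder), so that wrongly encoded files can be listed before the compiler trips over them.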
regards, Tom
Re: Codepage problem [message #1791593 is a reply to message #1791592]
Mon, 02 July 2018 12:48
Eclipse User
Workaround:
Subclass org.eclipse.xtext.builder.EclipseResourceFileSystemAccess2, override getEncoding(IFile), and bind the subclass.
In the override, use org.eclipse.core.runtime.Platform.getContentTypeManager().findContentTypeFor(file.getName()).getDefaultCharset()
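One thing to watch in such an override: as far as I know, both findContentTypeFor(...) and getDefaultCharset() can return null (no matching content type, or no default encoding set), so falling back to the original behaviour seems advisable. The following is only a JDK-only model of that lookup order, not Xtext or Eclipse API; the class EncodingFallback, the method resolve and the map of per-extension defaults are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

final class EncodingFallback {
    /**
     * Picks the charset for a generated file: the content-type default for its
     * extension if one is configured, otherwise the container default.
     */
    static String resolve(String fileName,
                          Map<String, String> contentTypeDefaults,
                          String containerDefault) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot >= 0 ? fileName.substring(dot + 1) : "";
        String byContentType = contentTypeDefaults.get(extension);
        // null models "no content type found" or "no default charset defined"
        return byContentType != null ? byContentType : containerDefault;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the Content Types preference page
        Map<String, String> defaults = new HashMap<>();
        defaults.put("java", "UTF-8");

        // Content-type default wins over the workspace default
        System.out.println(resolve("Test.java", defaults, "Cp1252"));
        // No content-type default configured: container default is used
        System.out.println(resolve("notes.txt", defaults, "Cp1252"));
    }
}
```

In the real subclass the fallback branch would presumably delegate to super.getEncoding(file) instead of a fixed string.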