Specify character encoding for the Eclipse console [message #992937]
Sat, 22 December 2012 22:01
Roger Howell (Junior Member, Messages: 4, Registered: December 2012)
Howdy all,
I am having trouble reading non-Latin characters in from the console (Java, Eclipse Indigo Service Release 2).
I have created a "Short, Self Contained, Correct (Compilable), Example" (SSCCE) to demonstrate the issue (including comments on what I believe is happening at each stage in the code):
import java.io.*;

public class ReadInChineseCharactersSSCCE {
    public static void main(String[] args) {
        try {
            boolean isRunning = true;
            // Raw bytes arriving from the console
            InputStream inputStream = System.in;
            // Decodes those bytes into characters using the named encoding (UTF-8 here) rather than the platform default
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
            // Adds buffering and readLine(), which returns one line of input as a String
            BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);
            // Raw bytes going out to the console
            OutputStream outputStream = System.out;
            // Encodes characters into bytes using the named encoding (UTF-8 here)
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
            // Adds buffering on top of the writer; flush() pushes the buffered text out
            BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);
            while (isRunning) {
                System.out.println(); // force extra newline
                System.out.print("> ");
                // Read one line of text (as a String); null means end of input
                String userInput_asString = input_BufferedReader.readLine();
                if (userInput_asString == null) {
                    break;
                }
                // To output a line of text:
                String outputToUser_fromString_englishFromCode = "foo"; // outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
                output_BufferedWriter.flush();
                System.out.println(); // force extra newline
                String outputToUser_fromString_ChineseFromCode = "之謂甚"; // outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
                output_BufferedWriter.flush();
                System.out.println(); // force extra newline
                String outputToUser_fromString_userSupplied = userInput_asString; // outputs correctly when given English text, garbled when given Chinese text
                output_BufferedWriter.write(outputToUser_fromString_userSupplied);
                output_BufferedWriter.flush();
                System.out.println(); // force extra newline
            }
        } catch (Exception e) {
            // Report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}
Sample output:
> 之謂甚
foo
之謂甚
之謂ç"š
> oaea
foo
之謂甚
oaea
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;
>
Presuming that the characters are displayed "correctly" (matching what I see in the console), you will note that the issue is not with the output of non-Latin characters (they display fine when hard-coded into the source). The issue is only seen when reading in Chinese characters.
I can observe the value being set to the "garbled" string by setting a breakpoint and stepping through. What is output matches what I see being set. Manually changing the variable value in the debugger has the same effect as hard-coding the value into the source (it is displayed correctly).
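The garbled tail in the sample output resembles what UTF-8 bytes look like when something decodes them with a single-byte Windows code page. The sketch below only illustrates that pattern and is not a confirmed diagnosis: the class name `MojibakeSketch` and its methods are made up for this example, and `windows-1252` is an assumed stand-in for whichever charset is really involved (the sketch also assumes the JRE provides that charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the SSCCE above.
public class MojibakeSketch {
    // Assumption: windows-1252 stands in for the charset applied by mistake.
    static final Charset ASSUMED_WRONG_CHARSET = Charset.forName("windows-1252");

    // Encode the text as UTF-8, then decode those bytes with the wrong charset.
    static String garble(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, ASSUMED_WRONG_CHARSET);
    }

    // Reverse the mistake: recover the original bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(ASSUMED_WRONG_CHARSET);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u4E4B\u8B02\u751A"; // the three characters used in the SSCCE
        String garbled = garble(original);
        System.out.println("characters before: " + original.length());
        System.out.println("characters after:  " + garbled.length());
        System.out.println("round trip restores original: " + repair(garbled).equals(original));
    }
}
```

Each of the three Chinese characters occupies three bytes in UTF-8, so a single-byte decoder turns each one into three unrelated characters. If the same mechanism is at work in the console, the damage happens before the String ever reaches the writer.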
I have set the character encoding of the files / run configuration / buffered reader / buffered writer and a whole bunch of other places to UTF-8. I am able to reproduce the issue in a new workspace by setting up a new Java project, setting any/all character encoding settings to UTF-8 and running the above SSCCE.
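Because the encoding can be set in several places, it may help to ask the running JVM what it actually ended up with. This is a small sketch using only the standard library; the class name `EncodingReport` is made up for this example, and `sun.jnu.encoding` is an implementation-specific property that may be absent on some JVMs (it then shows as null).

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of the SSCCE above.
public class EncodingReport {
    // Collect the encoding-related settings visible to the running JVM.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name()
                + ", sun.jnu.encoding=" + System.getProperty("sun.jnu.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Launching this with the same run configuration as the SSCCE shows whether the UTF-8 setting reached the launched JVM, as opposed to only the editor or the workspace.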
Am I making a fundamental error in the way I understand what is happening in the code? Is there something in the code that I am missing / not understanding correctly? Is this potentially a bug with the Eclipse console / IDE? How would I go about diagnosing this / what extra information is needed?
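One way to narrow this down is to bypass the readers entirely and look at the raw bytes the console hands to `System.in`. The following is a diagnostic sketch rather than a fix; `ConsoleByteDump` and `toHex` are names invented for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic for illustration; not part of the SSCCE above.
public class ConsoleByteDump {
    // Format the first 'length' bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes, int length) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Reference value: what U+751A looks like when encoded as UTF-8.
        byte[] expected = "\u751A".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes of U+751A: " + toHex(expected, expected.length));

        System.out.print("> ");
        byte[] buffer = new byte[256];
        // Read raw bytes with no decoding step in between.
        int count = System.in.read(buffer);
        if (count < 0) {
            System.out.println("(no input)");
            return;
        }
        System.out.println("bytes received: " + toHex(buffer, count));
    }
}
```

If typing the same character produces the reference bytes (followed by the line terminator), the console is sending plain UTF-8 and the problem probably lies elsewhere. A different byte sequence would suggest that the console encodes input with a charset other than the one the reader expects.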
Kind regards