Specify character encoding for the Eclipse console [message #992937] |
Sun, 23 December 2012 03:01 |
Roger Howell Messages: 4 Registered: December 2012 |
Junior Member |
|
|
Howdy all,
I am having trouble reading non-Latin characters from the console (Java / Eclipse Indigo Service Release 2).
I have created a "Short, Self Contained, Correct (Compilable), Example" (SSCCE) to demonstrate the issue (including comments of what I believe is happening at each stage in the code):
import java.io.*;

public class ReadInChineseCharactersSSCCE {

    public static void main(String[] args) {
        try {
            boolean isRunning = true;

            //Raw flow of input data (bytes) from the console
            InputStream inputStream = System.in;
            //Decodes the bytes into characters using the specified encoding (the default encoding is used if none is given)
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
            //Adds buffering and readLine(), which returns a whole line as a String
            BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);

            //Raw flow of output data (bytes) to the console
            OutputStream outputStream = System.out;
            //Encodes characters into bytes using the specified encoding
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
            //Adds buffering on top of the writer
            BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);

            while (isRunning) {
                System.out.println(); //force extra newline
                System.out.print("> ");

                //To read in a line of text (as a String):
                String userInput_asString = input_BufferedReader.readLine();
                if (userInput_asString == null) {
                    break; //end of input; write(null) would throw a NullPointerException
                }

                //To output a line of text:
                String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
                output_BufferedWriter.flush();

                System.out.println(); //force extra newline

                String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
                output_BufferedWriter.flush();

                System.out.println(); //force extra newline

                String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
                output_BufferedWriter.write(outputToUser_fromString_userSupplied);
                output_BufferedWriter.flush();

                System.out.println(); //force extra newline
            }
        }
        catch (Exception e) {
            //Print the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}
Sample output:
> 之謂甚
foo
之謂甚
之謂ç"š
> oaea
foo
之謂甚
oaea
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;
>
Assuming the characters above are displayed correctly for you (that is, they match what I see in the console), you will note that the issue is not with the output of non-Latin characters (they display fine when hard-coded into the source). The issue only appears when reading in Chinese characters.
I can observe the value being set to the "garbled" string by setting a breakpoint and stepping through. What is output matches what I see being set. Manually changing the variable value in the debugger has the same effect as hard-coding the value into the source (it is displayed correctly).
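For illustration only: the garbled tail in the sample output resembles what the UTF-8 bytes of 甚 look like when they are decoded with a single-byte Windows code page. The sketch below assumes windows-1252 as that code page (the thread does not say which encoding is actually involved), and the class and method names are hypothetical. It does not explain why the first two characters come through intact.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with another charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 stands in for whatever encoding the console uses.
        Charset assumedConsoleCharset = Charset.forName("windows-1252");

        // Unicode escape for the character 甚, so the source file encoding does not matter.
        String shen = "\u751a";

        // Prints: E7 94 9A
        System.out.println(toHex(shen.getBytes(StandardCharsets.UTF_8)));

        // Prints the code points of the three characters produced by the wrong decode:
        // U+00E7, U+201D, U+0161
        String garbled = misdecode(shen, assumedConsoleCharset);
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

One character turning into three is the usual sign that a multi-byte UTF-8 sequence was decoded one byte at a time somewhere on the input path.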
I have set the character encoding of the files / run configuration / buffered reader / buffered writer and a whole bunch of other places to UTF-8. I am able to reproduce the issue in a new workspace by setting up a new Java project, setting any/all character encoding settings to UTF-8 and running the above SSCCE.
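One way to check whether those settings actually reach the launched program is to print what the JVM itself reports. This is a minimal sketch with a hypothetical class name; it only uses the standard `file.encoding` system property and `Charset.defaultCharset()`, and it says nothing about how the console view encodes typed input.

```java
import java.nio.charset.Charset;

class EncodingReport {

    // Builds a one-line summary of the encodings the running JVM reports.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If this reports UTF-8 while typed Chinese input is still garbled, the mismatch is more likely on the console side than in the program.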
Am I making a fundamental error in the way I understand what is happening in the code? Is there something in the code that I am missing / not understanding correctly? Is this potentially a bug with the Eclipse console / IDE? How would I go about diagnosing this / what extra information is needed?
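As a possible diagnostic, the raw bytes arriving on System.in can be dumped before any Reader decodes them. The sketch below is an assumption about how one might do this, not something taken from the bug report, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class RawInputDump {

    // Reads bytes up to and including the first '\n' (or end of stream)
    // and returns them as space-separated upper-case hex pairs.
    static String readLineAsHex(InputStream in) {
        StringBuilder hex = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (hex.length() > 0) {
                    hex.append(' ');
                }
                hex.append(String.format("%02X", b));
                if (b == '\n') {
                    break;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.print("> ");
        System.out.println(readLineAsHex(System.in));
    }
}
```

Typing 甚 and seeing `E7 94 9A` followed by the line terminator would mean the console is sending UTF-8; any other byte sequence would point to a different encoding, or to a conversion happening before the bytes reach the program.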
Kind regards
|
|
|
Re: Specify character encoding for the Eclipse console [message #997601 is a reply to message #992937] |
Tue, 08 January 2013 12:58 |
Dani Megert Messages: 3802 Registered: July 2009 |
Senior Member |
|
|
On 23.12.2012 12:56, Roger Howell wrote:
> Howdy all,
>
> I am experiencing issues with reading in non-Latin characters into the
> console (Java / Indigo service release 2).
Please try Juno SR1 or newer. I suspect you are running into
https://bugs.eclipse.org/382257
Dani
|
|
|
Re: Specify character encoding for the Eclipse console [message #998678 is a reply to message #997601] |
Thu, 10 January 2013 13:50 |
Roger Howell Messages: 4 Registered: December 2012 |
Junior Member |
|
|
Hi Dani,
Good spot! That bug does indeed look like the bug I am running into. The bad news is that it does not appear to resolve my issue - I am seeing what appear to be identical results in Juno SR1 to those I describe above.
From a fresh download this afternoon [1] I initially tested the above example in the existing workspace/project that I had created previously to test this issue. This did not work.
So, to try and identify if it is a configuration issue in the existing workspace, I then created a new workspace/new project/new class etc (including amending the character encodings in Eclipse), using the code from the first post within the new class. This also did not work.
Any chance you could run the above code in Juno SR1 to help identify whether this is a problem on my end, else confirm that this issue is reproducible?
Cheers for your help!
Roger
[1] - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-jee-juno-SR1-win32-x86_64.zip
[Updated on: Thu, 10 January 2013 13:58]
|
|
|
|
Re: Specify character encoding for the Eclipse console [message #998744 is a reply to message #998734] |
Thu, 10 January 2013 15:56 |
Roger Howell Messages: 4 Registered: December 2012 |
Junior Member |
|
|
Apologies, I did not explicitly state this.
Expected Output (user input via console = "mixed input - English: fubar; Chinese: 之謂甚;"):
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;
Actual (buggy) Output (user input via console = "mixed input - English: fubar; Chinese: 之謂甚;"):
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;
The first line is the input line, containing "> " and any user input.
The second line is an example of a "normal" string being output by Java.
The third line is an example of printing out a hard-coded value with Chinese characters.
The fourth and final line is Java spitting out whatever it is that you entered as user-input on line 1.
[Updated on: Thu, 10 January 2013 15:58]