Eclipse Community Forums
Specify character encoding for the Eclipse console (Is this programmer error or an Eclipse bug?) [message #992937] Sat, 22 December 2012 22:01
Roger Howell
Howdy all,

I am having trouble reading non-Latin characters from the console (Java, Eclipse Indigo Service Release 2).


I have created a "Short, Self Contained, Correct (Compilable), Example" (SSCCE) to demonstrate the issue (including comments of what I believe is happening at each stage in the code):

import java.io.*;

public class ReadInChineseCharactersSSCCE {

	public static void main(String[] args) {    
	    try 
	    {
	        boolean isRunning = true;

	        //Raw byte stream of input data from the console
	        InputStream inputStream = System.in;
	        //Decodes the bytes into characters using the specified encoding (the platform default is used if none is given)
	        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
	        //Buffers the decoded characters and adds readLine() for reading a whole line as a String
	        BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);


	        //Raw byte stream of output data to the console
	        OutputStream outputStream = System.out;
	        //Encodes characters into bytes using the specified encoding
	        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
	        //Buffers the characters so they reach the underlying stream in larger chunks
	        BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);



	        while(isRunning) {
	            System.out.println();//force extra newline
	            System.out.print("> ");

	            //To read in a line of text (as a String); readLine() returns null at end of input
	            String userInput_asString = input_BufferedReader.readLine();
	            if (userInput_asString == null) {
	                break;
	            }

	            //To output a line of text:
	            String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
	            output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	            String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
	            output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	            String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
	            output_BufferedWriter.write(outputToUser_fromString_userSupplied);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	        }
	    }
	    catch (Exception e) {
	        //Report the failure instead of silently swallowing it
	        e.printStackTrace();
	    }
	}
}


Sample output:
> 之謂甚
foo
之謂甚
之謂ç"š

> oaea
foo
之謂甚
oaea

> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;

> 



Presuming that the characters are displayed "correctly" (matching what I see in the console), you will note that the issue is not with the output of non-Latin characters (they display fine when hard-coded into the source). The issue is only seen when reading in Chinese characters.

I can observe the value being set to the "garbled" string by setting a breakpoint and stepping through. What is output matches what I see being set. Manually changing the variable value in the debugger has the same effect as hard-coding the value into the source (it is displayed correctly).
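
One way to test a hypothesis about this kind of garbling is to reproduce it deliberately. The sketch below encodes a string as UTF-8 and then decodes the same bytes with a single-byte charset. The choice of windows-1252 is an assumption made for illustration, and the class and method names are hypothetical; nothing in this post confirms that this is what the console actually does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those same bytes with a different charset.
    static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available in this JVM
        // (it is not one of the charsets every JVM is required to support).
        Charset assumed = Charset.forName("windows-1252");

        // "\u751A" is the third character of the hard-coded test string in the SSCCE.
        String garbled = misdecode("\u751A", assumed);

        // Print code points rather than glyphs so the result does not depend on the console encoding.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf("U+%04X%n", (int) garbled.charAt(i));
        }
    }
}
```

Comparing the printed code points with the code points of the garbled variable in the debugger would show whether the two patterns match; if they do not, a different charset or a different point in the pipeline is probably involved.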


I have set the character encoding of the files / run configuration / buffered reader / buffered writer and a whole bunch of other places to UTF-8. I am able to reproduce the issue in a new workspace by setting up a new Java project, setting any/all character encoding settings to UTF-8 and running the above SSCCE.
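
Settings made in the IDE do not necessarily reach the launched JVM, so it may be worth printing what the running program itself reports. A minimal sketch, assuming only the standard `file.encoding` system property and `Charset.defaultCharset()`; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;

public class EncodingReport {

    // Collects the encoding-related values that the running JVM actually sees.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running this with the same run configuration as the SSCCE shows whether the UTF-8 setting took effect for that launch. It says nothing about how the console view encodes the text typed into it.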

Am I making a fundamental error in the way I understand what is happening in the code? Is there something in the code that I am missing / not understanding correctly? Is this potentially a bug with the Eclipse console / IDE? How would I go about diagnosing this / what extra information is needed?
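
One piece of extra information that could help narrow this down is the raw bytes that arrive on `System.in` before any decoding takes place. The sketch below is a hypothetical helper (the names are not from any library) that reads one chunk of input and prints it as hex, so the output is plain ASCII and cannot itself be garbled:

```java
import java.io.IOException;
import java.io.InputStream;

public class RawInputDump {

    // Formats the first `length` bytes as space-separated upper-case hex.
    static String toHex(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = System.in;
        byte[] buffer = new byte[256];

        // Reads whatever the console delivers in one chunk (typically one line), with no decoding applied.
        int count = in.read(buffer);
        if (count > 0) {
            System.out.println(toHex(buffer, count));
        }
    }
}
```

If the console sends UTF-8, typing 甚 should produce `E7 94 9A` followed by the line terminator bytes. Any other byte sequence would suggest that the console encodes input differently from what the `InputStreamReader` in the SSCCE expects.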

Kind regards
 