Eclipse Community Forums
Specify character encoding for the Eclipse console (Is this programmer error or an Eclipse bug?)
Specify character encoding for the Eclipse console [message #992937] Sun, 23 December 2012 03:01
Roger Howell
Messages: 4
Registered: December 2012
Junior Member
Howdy all,

I am experiencing issues with reading non-Latin characters from the console (Java / Indigo service release 2).


I have created a "Short, Self Contained, Correct (Compilable), Example" (SSCCE) to demonstrate the issue (including comments of what I believe is happening at each stage in the code):

import java.io.*;

public class ReadInChineseCharactersSSCCE {

	public static void main(String[] args) {
	    try
	    {
	        boolean isRunning = true;

	        //Raw flow of input bytes from the console
	        InputStream inputStream = System.in;
	        //Decodes the bytes into characters using the specified encoding (UTF-8) rather than the platform default
	        InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
	        //Adds buffering and readLine(), which returns one line of input as a String
	        BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);


	        //Raw flow of output bytes to the console
	        OutputStream outputStream = System.out;
	        //Encodes characters into bytes using the specified encoding (UTF-8)
	        OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
	        //Adds buffering to the writer; flush() pushes the text through to the console
	        BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);



	        while(isRunning) {
	            System.out.println();//force extra newline
	            System.out.print("> ");
	            System.out.flush();//make sure the prompt is visible before blocking on input

	            //To read in a line of text (as a String); null means the input has ended
	            String userInput_asString = input_BufferedReader.readLine();
	            if (userInput_asString == null) {
	                isRunning = false;
	                continue;
	            }

	            //To output a line of text:
	            String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
	            output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	            String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
	            output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	            String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
	            output_BufferedWriter.write(outputToUser_fromString_userSupplied);
	            output_BufferedWriter.flush();

	            System.out.println();//force extra newline

	        }
	    }
	    catch (IOException e) {
	        //Report I/O errors (including an unsupported encoding name) instead of silently ignoring them
	        e.printStackTrace();
	    }
	}
}


Sample output:
> 之謂甚
foo
之謂甚
之謂ç"š

> oaea
foo
之謂甚
oaea

> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;

> 



Presuming that the characters are displayed "correctly" (matching what I see in the console), you will note that the issue is not with the output of non-Latin characters (they display fine when hard-coded into the source). The issue is only seen when reading in Chinese characters.

I can observe the value being set to the "garbled" string by setting a breakpoint and stepping through. What is output matches what I see being set. Manually changing the variable value in the debugger has the same effect as hard-coding the value into the source (it is displayed correctly).
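For what it is worth, the garbled tail is consistent with a byte-level mix-up: 甚 (U+751A) is the three bytes E7 94 9A in UTF-8, and decoding those bytes one character per byte as windows-1252 yields ç”š (the forum renders the middle curly quote as a plain "). The sketch below is an illustration added here, not part of the original post. It assumes windows-1252 as the wrong decoder, `utf8BytesReadAsWindows1252` is a hypothetical helper, and it does not explain why 之謂 came through intact.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: encode text as UTF-8, then (wrongly) decode
    // those bytes as windows-1252, which maps each byte to one character.
    static String utf8BytesReadAsWindows1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // "\u751A" is 甚, written as an escape so the source file encoding does not matter
        String garbled = utf8BytesReadAsWindows1252("\u751A");
        // Print code points only, so the console's own encoding cannot distort the result
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

It prints U+00E7, U+201D and U+0161, i.e. ç, ” and š.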


I have set the character encoding of the files / run configuration / buffered reader / buffered writer and a whole bunch of other places to UTF-8. I am able to reproduce the issue in a new workspace by setting up a new Java project, setting any/all character encoding settings to UTF-8 and running the above SSCCE.

Am I making a fundamental error in the way I understand what is happening in the code? Is there something in the code that I am missing / not understanding correctly? Is this potentially a bug with the Eclipse console / IDE? How would I go about diagnosing this / what extra information is needed?
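One way to tell whether the characters are already wrong when readLine() returns (an input-decoding problem) or only look wrong on the way out is to print, in plain ASCII, the code points of each line read, together with the encoding the launched JVM actually uses. This is a minimal diagnostic sketch that was not suggested in the thread; `ConsoleInputDiagnostics` and `describeCodePoints` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleInputDiagnostics {

    // Hypothetical helper: list every code point of a string as U+XXXX,
    // so the exact characters produced by the decoder are visible.
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // What the launched JVM really uses, as opposed to what the IDE settings say
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            // ASCII-only output, so the console's output encoding cannot garble it
            System.out.println(describeCodePoints(line));
        }
    }
}
```

Typing 之謂甚 should produce U+4E4B U+8B02 U+751A; anything else means the text was already mis-decoded before the program wrote it out.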

Kind regards
Re: Specify character encoding for the Eclipse console [message #997601 is a reply to message #992937] Tue, 08 January 2013 12:58
Dani Megert
Messages: 3801
Registered: July 2009
Senior Member
On 23.12.2012 12:56, Roger Howell wrote:
> Howdy all,
>
> I am experiencing issues with reading in non-Latin characters into the
> console (Java / Indigo service release 2).
Please try Juno SR1 or newer. I suspect you run into
https://bugs.eclipse.org/382257 .

Dani
Re: Specify character encoding for the Eclipse console [message #998678 is a reply to message #997601] Thu, 10 January 2013 13:50
Roger Howell
Messages: 4
Registered: December 2012
Junior Member
Hi Dani,

Good spot! That bug does indeed look like the one I am running into. The bad news is that upgrading does not appear to resolve my issue: I am seeing what appear to be results in Juno SR1 identical to those I described above. :(


Using a fresh download this afternoon [1], I first tested the above example in the existing workspace/project that I had created previously to test this issue. This did not work.

So, to try and identify if it is a configuration issue in the existing workspace, I then created a new workspace/new project/new class etc (including amending the character encodings in Eclipse), using the code from the first post within the new class. This also did not work.


Any chance you could run the above code in Juno SR1 to help identify whether this is a problem on my end, else confirm that this issue is reproducible?

Cheers for your help!
Roger



[1] - http://www.eclipse.org/downloads/download.php?file=/technology/epp/downloads/release/juno/SR1/eclipse-jee-juno-SR1-win32-x86_64.zip


[Updated on: Thu, 10 January 2013 13:58]


Re: Specify character encoding for the Eclipse console [message #998734 is a reply to message #998678] Thu, 10 January 2013 15:23
Dani Megert
Messages: 3801
Registered: July 2009
Senior Member
On 10.01.2013 14:50, Roger Howell wrote:
> Any chance you could run the above code in Juno SR1 to help identify
> whether this is a problem on my end, else confirm that this issue is
> reproducible?
The code reads input. What exactly do I have to type there and what
would be your expected output?

Dani
Re: Specify character encoding for the Eclipse console [message #998744 is a reply to message #998734] Thu, 10 January 2013 15:56
Roger Howell
Messages: 4
Registered: December 2012
Junior Member
Apologies, I did not explicitly state this.

Expected Output (user input via console = "mixed input - English: fubar; Chinese: 之謂甚;"):
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;


Actual (buggy) Output (user input via console = "mixed input - English: fubar; Chinese: 之謂甚;"):
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂ç"š;


The first line is the input line, containing "> " and any user input.
The second line is an example of a "normal" string being output by Java.
The third line is an example of printing out a hard-coded value with Chinese characters.
The fourth and final line is Java spitting out whatever it is that you entered as user-input on line 1.
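One way to approach the "programmer error or Eclipse bug" question is to take the console out of the picture. The sketch below, which is not from the thread (`RoundTripCheck` and `roundTrip` are hypothetical names), feeds the expected input line as genuine UTF-8 bytes through the same InputStreamReader/BufferedReader and OutputStreamWriter/BufferedWriter chain as the SSCCE, using in-memory streams. If the text comes back unchanged, the Java code itself is not what garbles the characters, which points at the bytes the console delivers on System.in.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Hypothetical helper: push one line through the same reader/writer chain
    // as the SSCCE, but with in-memory streams instead of System.in/System.out.
    static String roundTrip(String line) throws IOException {
        byte[] inputBytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(inputBytes), StandardCharsets.UTF_8));
        String read = reader.readLine();

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8));
        writer.write(read);
        writer.flush();

        // Decode what was written, to compare it with the original text
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "mixed input - English: fubar; Chinese: 之謂甚;" written with escapes
        String input = "mixed input - English: fubar; Chinese: \u4E4B\u8B02\u751A;";
        System.out.println(roundTrip(input).equals(input) ? "round trip OK" : "round trip FAILED");
    }
}
```

Running it prints "round trip OK", since UTF-8 encoding followed by UTF-8 decoding is lossless for well-formed text.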

[Updated on: Thu, 10 January 2013 15:58]


Re: Specify character encoding for the Eclipse console [message #998752 is a reply to message #998744] Thu, 10 January 2013 16:17
Roger Howell
Messages: 4
Registered: December 2012
Junior Member
These screenshots of the debugger may be of interest.


This first one (IOTest_Juno - Input.png) shows the reader assigning the value of
userInput_asString

... immediately following this line:
String userInput_asString = input_BufferedReader.readLine();


[Screenshot: IOTest_Juno - Input.png]


This second screenshot (IOTest_Juno - Output.png) shows the value of
outputToUser_fromString_userSupplied

... after it has been output to the console:

[Screenshot: IOTest_Juno - Output.png]

[Updated on: Thu, 10 January 2013 16:26]


Re: Specify character encoding for the Eclipse console [message #999803 is a reply to message #998752] Sun, 13 January 2013 06:22
Toshihiro Izumi
Messages: 359
Registered: July 2009
Location: Japan
Senior Member
The fix for https://bugs.eclipse.org/382257 has been applied to Kepler (4.3 M4), but not to Juno (3.8.1/4.2.1/M20130109-1200)...
Re: Specify character encoding for the Eclipse console [message #1000182 is a reply to message #999844] Mon, 14 January 2013 07:52
Dani Megert
Messages: 3801
Registered: July 2009
Senior Member
On 13.01.2013 07:22, Toshihiro Izumi wrote:
> The bugfix of https://bugs.eclipse.org/382257 is being applied to
> Kepler(4.3M4), but it is not being applied to
> Juno(3.8.1/4.2.1/M20130109-1200)...
Correct. Roger, does it work if you use 4.3 M4?
http://www.eclipse.org/downloads/index-developer.php

Dani