[equinox-dev] Equinox and UTF-8

Hi,

I am struggling with the UTF-8 encoding of strings in Equinox 3.4
and am looking for some help.

The following example code encodes a string into bytes in two
different ways: first by casting each char to byte, then by calling
String.getBytes(). The code runs inside a bundle.

=== cut ===
        String data = "§";

        byte[] dataBytes = data.getBytes();
        System.out.println(data+" length() = "+data.length());

        System.out.print(data+" cast to byte = ");
        for (int i=0; i<data.length(); i++)
            System.out.print((byte)data.charAt(i)+" ");

        System.out.print("\r\n"+data+" getBytes() = ");

        for (int i=0; i<dataBytes.length; i++)
            System.out.print(dataBytes[i]+" ");

        System.out.println();
=== cut ===

Running this inside Eclipse as part of an OSGi framework prints:

=== cut ===
§ length() = 1
§ cast to byte = -89
§ getBytes() = -62 -89
=== cut ===

The result is the same when the code is started as a Java application
inside Eclipse.

When running the same code inside the Equinox framework from a command
shell with the following command line

# java -Dfile.encoding=UTF-8
-Dosgi.bundles=reference\:file\:com.example.utf8_1.0.0.jar@start -jar
org.eclipse.osgi_3.4.0.v20080605-1900.jar -console -clean

it prints

=== cut ===
+é-º length() = 2
+é-º cast to byte = -62 -89
+é-º getBytes() = -61 -126 -62 -89
=== cut ===

Running the code from a command shell as a normal Java application
then prints:

=== cut ===
-º length() = 1
-º cast to byte = -89
-º getBytes() = -62 -89
=== cut ===

In all cases I am trying to encode the § character (section sign). The
output is garbled because the command shell lacks UTF-8 support.

What causes the different results? What is the proper way to always
get the same results, regardless of whether the code runs inside or
outside of Eclipse? What must be done to get full UTF-8 support by
default when using getBytes() and similar methods?
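
For comparison, here is a minimal sketch of a variant that should not
depend on -Dfile.encoding or on the encoding the compiler assumes for
the source file. It writes § as the Unicode escape \u00A7 and passes
the charset to getBytes() explicitly (String.getBytes(Charset) needs
Java 6 or later). The class name Utf8Demo and its helper methods are
made up for this illustration. Note that length() = 2 in the Equinox
run suggests the literal was already two characters in the compiled
class, which would point at the source encoding used when building the
bundle rather than at getBytes() itself; that is an assumption, not
something verified here.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the original bundle.
class Utf8Demo {
    // UTF-8 is one of the charsets every Java platform must support.
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Encode with an explicit charset instead of the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(UTF8);
    }

    // Format bytes as signed decimal values, like the output above.
    static String format(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(b).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Unicode escape for the section sign, so the compiler's
        // source-encoding setting cannot change the literal.
        String data = "\u00A7";
        System.out.println("length() = " + data.length());
        System.out.println("getBytes(UTF-8) = " + format(utf8Bytes(data)));
    }
}
```

With this variant the byte values printed should be 1 for length() and
-62 -89 for the encoded bytes in every environment, since neither
depends on the default charset. How the § itself is displayed on the
console is a separate matter and still depends on the shell.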

Software used: Windows XP, Eclipse 3.4, Equinox 3.4

Thanks for your help,

Holger Mense

-- 
Holger Mense                                http://www.holger-mense.de
