.html files always Shift_JIS regardless of preferences or content [message #334193] Mon, 26 January 2009 02:36
Ed Wright
Eclipse 3.4 on Linux:

If I create a new *empty* .html file, it is always created with a file
encoding of Shift_JIS, "determined from content".

If I copy an existing html snippet (no headers), which has UTF-8
content, for which I have manually set the encoding to UTF-8, to another
location, the encoding is reset to Shift_JIS.

An annoying side note is that when I do reset the encoding type, Eclipse
tells me UTF-8 conflicts with the content type *even if the file is
empty or contains UTF-8 content*.

This is a major headache for me, apart from the need to manually reset
the content type on every such file, in that editing and saving a file
results in permanently corrupted Japanese text.

settings:
---------
env var LANG = ja_JP.utf8

parent folder text file encoding set to UTF-8

preferences:web:html files encoding set to UTF-8

preferences:general:content types:text:html default encoding for all
file associations set to UTF-8

Am I missing something somewhere? Is this a bug? (Ain't no feature in my
book :) )

Thanks,
Ed
Re: .html files always Shift_JIS regardless of preferences or content [message #334196 is a reply to message #334193] Mon, 26 January 2009 06:45
Ed Merks
Ed,

Comments below.

Ed Wright wrote:
> Eclipse 3.4 on Linux:
>
> If I create a new *empty* .html file, it is always created with a file
> encoding of Shift_JIS, "determined from content".
How do you create it? Are you using WTP's editor or just Eclipse's text
editor (which isn't XML aware)?
>
> If I copy an existing html snippet (no headers), which has UTF-8 content
Once you're working with text, you're working with characters (or
codepoints) where there is simply no such notion as it being UTF-8 which
is a concept that applies to the manner in which those characters are
encoded as bytes.
> , for which I have manually set the encoding to UTF-8, to another
> location, the encoding is reset to Shift_JIS.
I don't get this comment. The characters should be copied into the
editor without affecting the encoding the editor remembers. That's not
the case?
>
> An annoying side note is that when I do reset the encoding type,
How do you do that?
> Eclipse tells me UTF-8 conflicts with the content type *even if the
> file is empty or contains UTF-8 content*.
An empty file could contain a byte order marker that would be hard to
notice...
>
> This is a major headache for me, apart from the need to manually reset
> the content type on every such file, in that editing and saving a file
> results in permanently corrupted Japanese text.
>
> settings:
> ---------
> env var LANG = ja_JP.utf8
>
> parent folder text file encoding set to UTF-8
>
> preferences:web:html files encoding set to UTF-8
>
> preferences:general:content types:text:html default encoding for all
> file associations set to UTF-8
>
> Am I missing something somewhere? Is this a bug? (Ain't no feature in
> my book :) )
It might well be, but if this is a question about WTP's editor, it's
better to ask there on their newsgroup.
>
> Thanks,
> Ed
Re: .html files always Shift_JIS regardless of preferences or content [message #334198 is a reply to message #334196] Mon, 26 January 2009 08:37
Ed Wright
Hi Ed,

Also comments below...

On 2009/01/26 20:45, Ed Merks wrote:
>> Eclipse 3.4 on Linux:
>>
>> If I create a new *empty* .html file, it is always created with a file
>> encoding of Shift_JIS, "determined from content".
> How do you create it? Are you using WTP's editor or just Eclipse's text
> editor (which isn't XML aware)?

Doesn't matter. I can "touch test.html" from the Linux command line,
creating a stone empty file; "refresh" in Eclipse; and the file is
marked Shift_JIS. However, generally, I use Eclipse Navigator's New > File.

>> If I copy an existing html snippet (no headers), which has UTF-8 content
> Once you're working with text, you're working with characters (or
> codepoints) where there is simply no such notion as it being UTF-8 which
> is a concept that applies to the manner in which those characters are
> encoded as bytes.

Understood. The main point I wanted to make was that I manually set the
encoding to UTF-8 on the original file, but when I used copy/paste
within Eclipse, the encoding gets changed to Shift_JIS. I would expect a
file copy (within Navigator) to copy properties as well as contents.

>> , for which I have manually set the encoding to UTF-8, to another
>> location, the encoding is reset to Shift_JIS.
> I don't get this comment. The characters should be copied into the
> editor without affecting the encoding the editor remembers. That's not
> the case?

I'm not copy/pasting file contents from file to file, rather I am
copy/pasting the file itself from directory A to directory B.

The actual contents are correctly/accurately copied. The problem is that
Eclipse now thinks the file is Shift_JIS encoded and will open and edit
and save the file as Shift_JIS unless I manually reset the encoding.
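
To illustrate why an open/edit/save cycle under the wrong encoding is destructive, here is a minimal Python sketch. It is not Eclipse's actual code path: the function name, the sample string, and the use of the "replace" error handler are all assumptions made for the illustration. It only shows what happens in principle when UTF-8 bytes are decoded as Shift_JIS and then written back out.

```python
def simulate_wrong_charset_save(text):
    """Hypothetical helper: mimic opening a UTF-8 file as Shift_JIS and saving it."""
    # Bytes as they sit on disk when the file really is UTF-8.
    utf8_bytes = text.encode("utf-8")

    # An editor that believes the file is Shift_JIS decodes the same bytes
    # with the wrong charset; undecodable bytes become U+FFFD (assumption).
    mangled = utf8_bytes.decode("shift_jis", errors="replace")

    # Saving writes the mangled characters back as Shift_JIS bytes;
    # characters Shift_JIS cannot represent are written as "?".
    saved = mangled.encode("shift_jis", errors="replace")
    return utf8_bytes, mangled, saved


original_bytes, shown_in_editor, bytes_after_save = simulate_wrong_charset_save("日本語")
print(repr(shown_in_editor))                # not the text that was typed
print(original_bytes == bytes_after_save)   # the bytes on disk have changed
```

Once a byte sequence has been replaced during decoding, the original bytes cannot be recovered from the saved file, which matches the "permanently corrupted" behaviour described above.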

>>
>> An annoying side note is that when I do reset the encoding type,
> How do you do that?

In the "Navigator" I right click the file name, select "properties"
which displays "Resource" where the file encoding is set/settable.

>> Eclipse tells me UTF-8 conflicts with the content type *even if the
>> file is empty or contains UTF-8 content*.
> An empty file could contain a byte order marker that would be hard to
> notice...

Could, but doesn't :) - See my first note above.
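
For anyone wanting to rule out a byte order mark on their own files, a small Python sketch along these lines should do it. The function name `detect_bom` and the temporary file are purely illustrative; the BOM constants come from Python's standard `codecs` module.

```python
import codecs
import os
import tempfile

# Known byte order marks, longest first so that a UTF-32 LE mark
# (FF FE 00 00) is not mistaken for a UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(path):
    """Return the name of the BOM at the start of the file, or None if there is none."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None


# Equivalent of "touch test.html" in a scratch directory (illustrative path).
scratch_dir = tempfile.mkdtemp()
empty_html = os.path.join(scratch_dir, "test.html")
open(empty_html, "wb").close()

print(os.path.getsize(empty_html))  # size in bytes of the freshly created file
print(detect_bom(empty_html))       # BOM name, or None when no BOM is present
```

A file created with "touch" has a size of zero bytes, so there is nothing in it for a content-based encoding guess to work from.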

>> Am I missing something somewhere? Is this a bug? (Ain't no feature in
>> my book :) )
> It might well be, but if this is a question about WTP's editor, it's
> better to ask there on their newsgroup.

I wouldn't *think* it would be an editor issue as the encoding seems to
be set regardless of what editor I use. However, it *does* seem to be
specific to .html files. (If I repeat the procedure in my first note
above, only touch "test.txt" instead, the file is marked as UTF-8
encoded, as expected.)

Any additional thoughts greatly appreciated. And if you think it is
likely to be an editor issue, I'll repost over there. (Would that be
eclipse.webtools?)

Thanks again.
Ed
Re: .html files always Shift_JIS regardless of preferences or content [message #334204 is a reply to message #334198] Mon, 26 January 2009 11:02
Ed Merks
Ed,

Yes, if it's specific to the WTP editor, I'd ask on eclipse.webtools.
Certainly for XML files, I'd expect the content type to be determined by
the XML header...


Re: .html files always Shift_JIS regardless of preferences or content [message #334224 is a reply to message #334204] Tue, 27 January 2009 01:20
Ed Wright
On 2009/01/27 01:02, Ed Merks wrote:
> Ed,
>
> Yes, if it's specific to the WTP editor, I'd ask on eclipse.webtools.

Thanks. I'll post over there.

Ed