Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
I have added a paragraph to the Infocenter document in the Plug-in Developer
Guide in the HEAD (3.0) stream.
Thank you, Dan, for reporting and investigating this issue.
Konrad Kolosowski
Eclipse Help System
Dorian Birsan/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
05/08/2003 01:23 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Dan, this info is greatly appreciated and we will add it to the docs, as
suggested.
Thanks!
-Dorian
Dan Scott/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
05/08/2003 01:02 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Here are my results from testing the default configurations of various HTTP
servers:
Apache HTTPd 1.3.26: BAD -- passes the charset=iso-8859-1 HTTP header; NL
content corrupted through proxy
Apache HTTPd 1.3.27: GOOD -- does not pass a charset HTTP header; NL
content works through proxy
Apache HTTPd 2.0.45: BAD -- passes the charset=iso-8859-1 HTTP header; NL
content corrupted through proxy
IBM HTTP Server 1.3.19.2: BAD -- passes the charset=iso-8859-1 HTTP header;
NL content corrupted through proxy
IBM HTTP Server 1.3.26.1: GOOD -- passes the charset=UTF-8 HTTP header; NL
content works through proxy
I also tested some NL documentation that has been converted to UTF-8 and
which contains the correct <meta> element in the content. Unfortunately,
the iso-8859-1 statement passed by the "BAD" HTTP servers for the proxied
URL even corrupts that content.
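The behaviour above is consistent with browsers giving a charset in the HTTP Content-Type header priority over a <meta> element in the document. A minimal sketch of that precedence, assuming hypothetical helper names (header_charset, meta_charset, effective_charset) and a simplified regular expression rather than a real HTML parser:

```python
import re
from email.message import Message

# Simplified pattern: finds a charset declared inside a <meta ...> element.
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)',
                     re.IGNORECASE)


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def meta_charset(html_bytes):
    """Return the charset declared in a <meta> element, or None."""
    match = META_RE.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html_bytes):
    """The HTTP header wins; the <meta> element is only a fallback."""
    return header_charset(content_type) or meta_charset(html_bytes)


page = (b'<html><head><META HTTP-EQUIV="Content-Type" '
        b'CONTENT="text/html; charset=windows-1251"></head></html>')

# Proxied case: the header added by the proxy overrides the <meta> element.
print(effective_charset("text/html; charset=iso-8859-1", page))
# Direct case: no charset in the header, so the <meta> element is used.
print(effective_charset("text/html", page))
```

With a "BAD" server in front, the first call models what the browser ends up using, which is why even correctly tagged content is corrupted.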
Note that Apache 1.3.12 and up and Apache 2.0.x have an httpd.conf
directive called AddDefaultCharset, which, if turned off or commented out,
will prevent the proxy from adding the charset statement to the headers
(and thereby enable NL content to be served up from a proxied URL
correctly).
In Apache 1.3.12+ (this also appears to work for IBM HTTP Server), set the
configuration directive to:
AddDefaultCharset Off
In Apache 2.0.x, comment out the configuration directive:
#AddDefaultCharset ISO-8859-1
Note that I have not tested HTTP servers that are not Apache-based; results
from anyone with easy access to IIS or other typical HTTP servers would
help us nicely round out the documentation for InfoCenter mode.
Dan
Dan Scott/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
08/05/2003 08:56 AM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi Konrad:
The files I am viewing do contain the expected <meta HTTP-EQUIV blah>
element specifying the code page.
Another data point to consider: the only time I see corrupted characters is
when I'm viewing the help system through a proxied URL, rather than viewing
the help system directly through the port (e.g.
http://<hostname>:<port>/help/ works fine, but http://<hostname>/infocenter
shows corrupted characters for non-latin1 encodings).
This probably explains why a colleague of mine couldn't reproduce the
problem (and why he thought I was crazy, heh).
I'm running the Eclipse help system on a Linux machine proxied through
Apache 1.3.26. Just noticed that Apache 1.3.27 has been released with the
following bug fix:
<quote>
The following bugs were found in Apache 1.3.26 and have been fixed in
Apache 1.3.27:
mod_proxy fixes:
The cache in mod_proxy was incorrectly updating the
Content-Length value from 304 responses when doing validation.
Fix a problem in proxy where headers from other modules were
added to the response headers when this was already done in the
core already.
</quote>
I wondered whether Apache 1.3.26 was adding a charset header to the
returned document in the proxied help system, so I played with wget asking
for a Russian document (which is encoded in windows-1251). The wget output is
below; but you can clearly see that in the first case (proxied URL) the web
server is adding a "charset=iso-8859-1" header, which we don't see in the
second case (connecting directly to help system port).
I'll see if I can upgrade to Apache 1.3.27 to reproduce the test (but
hopefully see better test results!). If it turns out that Apache 1.3.27
solves the problem, this will probably be a useful warning to document in
the 'Installing the help system as an infocenter' topic.
wget output:
dan@daniels:~$ wget -S --header='Accept-Language: ru'
http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
--08:45:53--
http://daniels.hostname.com/prod/infocenter/topic/com.prod.doc/core/filename.htm
=> `filename.htm.1'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:80... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Date: Thu, 08 May 2003 12:45:53 GMT
3 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
4 Content-Type: text/html; charset=iso-8859-1
5 Cache-Control: max-age=10000
6 X-Cache: MISS from daniels.hostname.com
7 Connection: close
[ <=> ] 5,581 5.32M/s
08:45:53 (5.32 MB/s) - `filename.htm.1' saved [5581]
dan@daniels:~$ vim filename.htm.1
dan@daniels:~$ wget -S --header='Accept-Language: ru'
http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
--08:46:38--
http://daniels.hostname.com:8084/help/topic/com.prod.doc/core/filename.htm
=> `filename.htm.2'
Resolving daniels.hostname.com... done.
Connecting to daniels.hostname.com[9.26.162.217]:8084... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 200 OK
2 Content-Type: text/html
3 Date: Thu, 08 May 2003 12:46:38 GMT
4 Server: Apache Tomcat/4.0.6 (HTTP/1.1 Connector)
5 Cache-Control: max-age=10000
6 Connection: close
[ <=> ] 5,581 5.32M/s
08:46:38 (5.32 MB/s) - `filename.htm.2' saved [5581]
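The comparison of the two responses can also be scripted. This is a rough sketch, assuming header lines in the numbered form that this wget version prints with -S; the function name charset_from_wget_headers is hypothetical:

```python
import re


def charset_from_wget_headers(lines):
    """Return the charset from the Content-Type header, or None if absent."""
    for line in lines:
        # Drop the leading counter printed before each header line.
        text = re.sub(r"^\s*\d+\s+", "", line)
        name, sep, value = text.partition(":")
        if sep and name.strip().lower() == "content-type":
            match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value,
                              re.IGNORECASE)
            return match.group(1).lower() if match else None
    return None


# Header lines copied from the two wget runs above.
proxied = [
    "1 HTTP/1.1 200 OK",
    "4 Content-Type: text/html; charset=iso-8859-1",
    "6 X-Cache: MISS from daniels.hostname.com",
]
direct = [
    "1 HTTP/1.1 200 OK",
    "2 Content-Type: text/html",
]

print("proxied:", charset_from_wget_headers(proxied))
print("direct: ", charset_from_wget_headers(direct))
```

A charset reported for the proxied URL but not for the direct one points at the proxy as the source of the header.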
Dan
--
Dan Scott
Konrad Kolosowski/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
07/05/2003 05:45 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi Dan.
There is no assumption on which encoding documents come in. I think your
problem might be that some documents do not specify the encoding correctly
(a correct example is <META HTTP-EQUIV="Content-Type" CONTENT="text/html;
charset=big5">, which the Eclipse translations for Chinese have in the head),
and the browser has to resort to auto detection. The auto detection part
of a particular browser may look at the containing frameset document to
guess.
If the charset is specified as above and you still see the problem, open a
bug against help and we will investigate it.
Konrad Kolosowski
Eclipse Help System
Dan Scott/Toronto/IBM@IBMCA
Sent by: platform-help-dev-admin@eclipse.org
05/07/2003 05:13 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: [platform-help-dev] Is UTF-8 encoding assumed for all languages?
Hi:
I'm experiencing some strangeness with NL content in the help system. I
have Russian documents (navigation and help files) encoded in windows-1251
code page that sometimes display as gibberish.
It looks to me like the frameset document (index.jsp) encoding of UTF-8 is
interfering with the browser's interpretation of the encodings in the
individual frames.
This problem occurs in both Mozilla 1.3.1 and Internet Explorer 6. Is this
a known limitation of the help system or of browsers?
I suppose a workaround would be to convert all of our help content to UTF-8
before generating the doc plugins... yikes.
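For illustration, here is a small sketch of both the gibberish and the UTF-8 workaround, assuming the content really is windows-1251. The sample word and the helper name to_utf8 are made up for the example, and any <meta> charset declaration in a converted file would need updating separately:

```python
def to_utf8(data, source_encoding="windows-1251"):
    """Re-encode bytes from the source code page to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


russian = "Справка"  # sample text, not taken from the real documentation
raw = russian.encode("windows-1251")

# Interpreting windows-1251 bytes under the wrong charset produces gibberish.
garbled = raw.decode("iso-8859-1")
print(garbled)

# After conversion the same text survives a UTF-8 round trip.
converted = to_utf8(raw)
print(converted.decode("utf-8"))
```

The same function could be applied to each help file's bytes before the doc plug-ins are generated.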
Dan
--
Dan Scott
_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev