HtmlToText Pipelet and encoding [message #671208] |
Tue, 17 May 2011 06:53 |
No real name Messages: 3 Registered: May 2011 |
Junior Member |
|
|
Hi there,
Yesterday I ran into trouble with character encoding in the Html2Text pipelet, especially with German characters such as "Ä", "Ö" and "ß".
The only way I found to change the encoding was to check out the source and add the following line after the NekoHTML parser is set up (line 286 in the HtmlToTextPipelet):
parser.setProperty("http: cyberneko.org/html/properties/default-encoding", "UTF-8");
I removed the "//" after "http:" from the string above; otherwise I couldn't post, because of the forum's 5-posts limitation.
Is there any other way to do this? Maybe through configuration files that I haven't found yet?
Greets
Jan
[Updated on: Tue, 17 May 2011 06:54]
|
|
|
Re: HtmlToText Pipelet and encoding [message #671247 is a reply to message #671208] |
Tue, 17 May 2011 08:36 |
Eclipse User |
|
|
|
Hi Jan,
Thanks for this report.
On 17.05.2011, 08:53, <forums-noreply@eclipse.org> wrote:
> Hi there,
>
> yesterday i ran into troubles with character encoding on the
> Html2Text pipelet. Especially with German characters like
> "Ä", "Ö", "ß", that sort of stuff.
> The only way to change the encoding was to checkout the
> source and add the following line to the NekoHTML parser
> (line 286 in the HtmlToTextPipelet):
>
> parser.setProperty("http:
> cyberneko.org/html/properties/default-encoding", "UTF-8");
>
> Is there any other way to do this? Maybe over the
> configuration files that i haven´t found yet?
No, there are no other configuration files for the HtmlToTextPipelet.
All configuration for this pipelet is done in BPEL.
I've added a configuration option "defaultEncoding" to the pipelet, see [1].
It applies if the HTML document does not contain an encoding specification
(e.g. <meta http-equiv='Content-Type' content='text/html; charset=utf-8' />).
I've also added some test cases (see [2]). It seems that NekoHTML uses an
ISO-8859-? encoding if the document does not contain such a meta tag, so you
can set the parameter to override this.
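To illustrate why the umlauts break, here is a minimal Java sketch. It assumes the page bytes are UTF-8 and that the fallback is ISO-8859-1 (which ISO-8859 variant NekoHTML actually picks is not confirmed here). The class EncodingDemo and its decode helper are hypothetical names for illustration only; they are not part of SMILA or NekoHTML.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Decode raw document bytes with a fallback charset, roughly what a
    // parser has to do when the HTML carries no charset declaration.
    static String decode(byte[] raw, Charset fallback) {
        return new String(raw, fallback);
    }

    public static void main(String[] args) {
        // The three characters Ä, Ö, ß, written as escapes so the encoding
        // of this source file does not influence the result.
        String original = "\u00C4\u00D6\u00DF";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Matching fallback: the text round-trips unchanged.
        String right = decode(utf8, StandardCharsets.UTF_8);
        // Mismatched fallback: each two-byte UTF-8 sequence is read as
        // two separate single-byte characters.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 fallback round-trips: " + right.equals(original));
        System.out.println("ISO-8859-1 fallback length: " + wrong.length());
    }
}
```

With the mismatched fallback each German character turns into two characters, which is the kind of garbling a "defaultEncoding" setting of UTF-8 is meant to avoid for documents without a meta tag.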
Cheers,
Jürgen.
[1]
http://wiki.eclipse.org/SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets#org.eclipse.smila.processing.pipelets.HtmlToTextPipelet
[2]
https://dev.eclipse.org/svnroot/rt/org.eclipse.smila/trunk/core/org.eclipse.smila.processing.pipelets.test/code/src/org/eclipse/smila/processing/pipelets/test/TestHtmlToTextPipelet.java