Eclipse Community Forums
HtmlToText Pipelet and encoding [message #671208] Tue, 17 May 2011 06:53
Hi there,

Yesterday I ran into trouble with character encoding in the HtmlToText pipelet, especially with German characters such as "Ä", "Ö", and "ß".
The only way I found to change the encoding was to check out the source and add the following line where the NekoHTML parser is set up (line 286 in the HtmlToTextPipelet):

parser.setProperty("http: cyberneko.org/html/properties/default-encoding", "UTF-8");


I removed the "//" after "http:" from the string above; otherwise I couldn't post because of the five-post limitation.

Is there any other way to do this? Maybe through configuration files that I haven't found yet?

Greets
Jan

[Updated on: Tue, 17 May 2011 06:54]
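
To illustrate the symptom described above, here is a minimal sketch using only the Java standard library. It does not use NekoHTML or SMILA; the class and method names are made up for this example. It assumes the document bytes are UTF-8 and shows what happens when a parser decodes them with an ISO-8859-1 default instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of SMILA or NekoHTML.
public class EncodingMismatchDemo {

    /** Encodes the text as UTF-8, then decodes the bytes with the wrong charset (ISO-8859-1). */
    static String decodeAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Encodes the text as UTF-8, then decodes the bytes with the matching charset. */
    static String decodeAsUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes for "ÄÖß" keep the source file independent of its own encoding.
        String original = "\u00C4\u00D6\u00DF";

        String garbled = decodeAsLatin1(original);
        String intact = decodeAsUtf8(original);

        // Each umlaut is two bytes in UTF-8, so the wrong decoder yields two characters per umlaut.
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("round trip ok:   " + intact.equals(original));
    }
}
```

The doubled character count in the garbled string is the typical sign that UTF-8 bytes were read with a single-byte encoding.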


Re: HtmlToText Pipelet and encoding [message #671247 is a reply to message #671208] Tue, 17 May 2011 08:36

Hi Jan,

Thanks for this report.

On 17.05.2011 at 08:53, <forums-noreply@eclipse.org> wrote:
> Hi there,
>
> Yesterday I ran into trouble with character encoding in the
> HtmlToText pipelet, especially with German characters such as
> "Ä", "Ö", and "ß".
> The only way I found to change the encoding was to check out the
> source and add the following line where the NekoHTML parser is set up
> (line 286 in the HtmlToTextPipelet):
>
> parser.setProperty("http: cyberneko.org/html/properties/default-encoding", "UTF-8");
>
> Is there any other way to do this? Maybe through
> configuration files that I haven't found yet?

No, there are no other configuration files for the HtmlToTextPipelet.
All configuration for this pipelet is done in BPEL.

I've added a configuration option "defaultEncoding" to the pipelet, see [1].
It applies if the HTML document does not contain an encoding specification (e.g. <meta http-equiv='Content-Type' content='text/html; charset=utf-8' />).
I've also added some test cases (see [2]). It seems that NekoHTML uses an ISO-8859-? encoding if the document does not contain such a meta tag, so you can set the parameter to override this.
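
The fallback rule described here (use the document's own charset declaration if present, otherwise the configured default) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the pipelet's or NekoHTML's actual code: the class name, the method names, the regular expression, and the 1024-byte look-ahead are all invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of "meta tag wins, otherwise default encoding".
public class DefaultEncodingSketch {

    // Matches charset=... inside a <meta> tag; the name must start with a letter or digit.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9_.:-]*)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in a meta tag, or defaultEncoding if none is declared. */
    static Charset chooseCharset(byte[] html, Charset defaultEncoding) {
        // Peek at the start of the document as ISO-8859-1, which maps every byte to a char,
        // so ASCII markup such as the meta tag stays readable whatever the real encoding is.
        int peekLength = Math.min(html.length, 1024);
        String head = new String(html, 0, peekLength, StandardCharsets.ISO_8859_1);

        Matcher matcher = META_CHARSET.matcher(head);
        if (matcher.find() && Charset.isSupported(matcher.group(1))) {
            return Charset.forName(matcher.group(1));
        }
        return defaultEncoding;
    }

    /** Decodes the document with the declared charset, falling back to defaultEncoding. */
    static String decode(byte[] html, Charset defaultEncoding) {
        return new String(html, chooseCharset(html, defaultEncoding));
    }

    public static void main(String[] args) {
        byte[] withoutMeta = "<html><body>\u00C4\u00D6\u00DF</body></html>"
            .getBytes(StandardCharsets.UTF_8);

        // No meta tag present, so the caller-supplied default decides how the umlauts come out.
        System.out.println(chooseCharset(withoutMeta, StandardCharsets.UTF_8));
        System.out.println(chooseCharset(withoutMeta, StandardCharsets.ISO_8859_1));
    }
}
```

With a UTF-8 document that lacks a meta tag, passing UTF-8 as the default keeps the umlauts intact, while an ISO-8859-1 default reproduces the garbling reported in the first message.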

Cheers,
Jürgen.

[1]
http://wiki.eclipse.org/SMILA/Documentation/Bundle_org.eclipse.smila.processing.pipelets#org.eclipse.smila.processing.pipelets.HtmlToTextPipelet

[2]
https://dev.eclipse.org/svnroot/rt/org.eclipse.smila/trunk/core/org.eclipse.smila.processing.pipelets.test/code/src/org/eclipse/smila/processing/pipelets/test/TestHtmlToTextPipelet.java