Hi there,
Yesterday I ran into trouble with character encoding in the Html2Text pipelet, especially with German characters like "Ä", "Ö" and "ß".
The only way I found to change the encoding was to check out the source and add the following line to the NekoHTML parser (line 286 in the HtmlToTextPipelet):
parser.setProperty("http: cyberneko.org/html/properties/default-encoding", "UTF-8");
I removed the "//" from the above string, otherwise I couldn't post because of the 5-post limit.
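To show the kind of garbling I mean, here is a minimal stand-alone sketch using only the Java standard library. It is not the pipelet or NekoHTML code: the class name EncodingDemo and the use of ISO-8859-1 as the "wrong" fallback encoding are assumptions made purely for illustration. It decodes the UTF-8 bytes of "ÄÖß" once with the wrong charset and once with UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Decode raw bytes with a given charset, similar to what a parser
    // does when it falls back to a default encoding.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "ÄÖß" written as Unicode escapes so the encoding of this
        // source file does not influence the result.
        String original = "\u00C4\u00D6\u00DF";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // Assumed wrong fallback: every UTF-8 byte becomes its own character.
        String wrong = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        // Correct decoding: the original three characters come back.
        String right = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println("wrong length: " + wrong.length());
        System.out.println("right length: " + right.length());
        System.out.println("right equals original: " + right.equals(original));
    }
}
```

Each of the three characters takes two bytes in UTF-8, so the wrongly decoded string ends up with six characters instead of three, which matches the broken umlauts I see in the extracted text.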
Is there any other way to do this? Maybe through the configuration files, in a place I haven't found yet?
Greets
Jan
[Updated on: Tue, 17 May 2011 02:54]