Can Infocenter use a search engine that supports Hebrew? [message #475431] |
Wed, 25 March 2009 14:09  |
Eclipse User |
|
|
|
So far as I can tell, Infocenter's search engine is Lucene, and Lucene
doesn't support Hebrew.
Can I tell Infocenter to use a search engine that can find Hebrew text?
I'd be happy to give up Lucene's sophisticated search in order to be able
to find simple Hebrew text.
Thanks,
Eli
|
|
|
Re: Can Infocenter use a search engine that supports Hebrew? [message #475437 is a reply to message #475431] |
Mon, 30 March 2009 17:30   |
Eclipse User |
|
|
|
Currently the help system has analyzers for Brazilian Portuguese, Chinese,
Czech, German, Greek, French, Dutch, Russian and English. This means that
the search understands something about how words are constructed in those
languages and can recognize endings such as "s" for plural, "ed" for past
participle, etc.
For every other language the search is less sophisticated: it requires an
exact match in the text and recognizes only a limited number of separator
characters.
I have no idea how well or badly this works for Hebrew; since you posted
this, I'm guessing not so well. You may want to check whether Lucene has
a Hebrew analyzer. If so, it would not be difficult to hook it into the
current mechanism, which is done using extension points.
Eli Lato wrote:
> So far as I can tell, Infocenter's search engine is Lucene, and Lucene
> doesn't support Hebrew.
>
> Can I tell Infocenter to use a search engine that can find Hebrew text?
> I'd be happy to give up Lucene's sophisticated search in order to be able
> to find simple Hebrew text.
>
> Thanks,
> Eli
>
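To make the difference between analyzed and exact-match search concrete, here is a minimal sketch in Python. It is not the help system's or Lucene's code: the function names are hypothetical, and the "stemmer" is a toy that only strips "s" and "ed". It shows that exact-match search can still find simple Hebrew words, but misses a word that carries a prefix such as the definite article.

```python
import re

def tokenize(text):
    # In Python 3, \w is Unicode-aware for str patterns,
    # so Hebrew letters count as word characters.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def naive_english_stem(token):
    # Toy stand-in for a language analyzer (assumption, not Lucene's logic):
    # strips only the endings "ed" and "s" from sufficiently long tokens.
    for suffix in ("ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def exact_match(query, text):
    # What a search without a language analyzer roughly does:
    # the query must equal one of the tokens.
    return query.lower() in tokenize(text)

def stemmed_match(query, text):
    # What an analyzer adds: query and tokens are compared after stemming.
    stemmed_query = naive_english_stem(query.lower())
    return stemmed_query in {naive_english_stem(t) for t in tokenize(text)}
```

With this sketch, `exact_match("plugin", "Installed plugins are listed")` fails while `stemmed_match` succeeds. For Hebrew, `exact_match("שלום", "שלום עולם")` succeeds, but `exact_match("בית", "הבית גדול")` fails because the word appears with a prefix attached. Handling that case is the kind of thing a Hebrew analyzer would be needed for.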
|
|
|
Re: Can Infocenter use a search engine that supports Hebrew? [message #623319 is a reply to message #475469] |
Mon, 06 April 2009 18:30  |
Eclipse User |
|
|
|
You shouldn't need to do anything on the server side to handle non-ASCII
characters as long as the HTML or XHTML document specifies the charset
used. Is the problem only with non-ASCII documents or do you see it in
all languages?
Gerardo Laster wrote:
> Hi,
> We have a similar issue with extended character support. The search
> engine works on infocenters running on Windows, but does not work on
> Solaris.
> Are there any server settings that need to be modified so the search
> engine detects Japanese and extended characters?
> Thanks
> Gerardo
>
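As an illustration of why the declared charset matters, here is a small Python sketch. It is only a model of the general effect, not the help system's indexing code, and the thread does not establish that this is what happens on Solaris. The function name is hypothetical. The same bytes are decoded once with the charset they were written in and once with a different one, and a Japanese term is found only in the first case.

```python
def contains_term(raw_bytes, charset, term):
    # Decode the document bytes with the given charset, then look for the term.
    # errors="replace" keeps the sketch from raising on undecodable bytes.
    text = raw_bytes.decode(charset, errors="replace")
    return term in text

# A document fragment stored as UTF-8 bytes.
page = "<p>検索エンジン</p>".encode("utf-8")

found_with_utf8 = contains_term(page, "utf-8", "検索")         # correct charset
found_with_latin1 = contains_term(page, "iso-8859-1", "検索")  # wrong charset
```

Decoding with the wrong charset still yields text, just not the text that was written, so a term with extended characters no longer matches while plain ASCII words are unaffected.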
|
|
|
Re: Can Infocenter use a search engine that supports Hebrew? [message #623321 is a reply to message #475471] |
Tue, 07 April 2009 12:22  |
Eclipse User |
|
|
|
Hi Chris,
The way the search "works" right now is that if you search for a word with
no extended characters it shows results; however, if you search for a word
with extended characters it does not show any results. Of course with JA,
KO etc., the problem is critical as all the characters are non-ASCII.
Our content is defined as:
<html xmlns="http://www.w3.org/1999/xhtml" lang="ja-jp" xml:lang="ja-jp">
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
And this only affects the search feature.
Thanks,
I am going to check the info on the other thread.
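When preparing a test case, it can help to confirm that each content file really carries a charset declaration like the one above. The following Python sketch does that with a regular expression. The pattern and function name are assumptions made for illustration, and a regex is only a rough check, not a full HTML parser.

```python
import re

# Matches charset=... inside a <meta> tag, with or without quotes around the value.
META_CHARSET = re.compile(
    r'<meta[^>]*charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)

def declared_charset(markup):
    # Return the declared charset in lowercase, or None if no declaration is found.
    match = META_CHARSET.search(markup)
    return match.group(1).lower() if match else None

meta_line = '<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>'
html_line = '<html xmlns="http://www.w3.org/1999/xhtml" lang="ja-jp" xml:lang="ja-jp">'
```

Here `declared_charset(meta_line)` returns `"utf-8"`, and `declared_charset(html_line)` returns `None` because the `<html>` tag alone declares a language, not a charset.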
|
|
|
Re: Can Infocenter use a search engine that supports Hebrew? [message #623326 is a reply to message #475473] |
Thu, 09 April 2009 17:20  |
Eclipse User |
|
|
|
Can someone file a bug report on this with a simple test plug-in
containing documentation and the exact steps you took (i.e. did you
search from the help view or from the help browser)? I have not heard of
this problem before.
Chris
Gerardo Laster wrote:
> Hi Chris,
> The way the search "works" right now is that if you search for a word with
> no extended characters it shows results, however if you search for a
> word with extended characters it does not show any results. Of course
> with JA, KO etc, the problem is critical as all the characters are
> non-ascii.
> Our content is defined as <html xmlns="http://www.w3.org/1999/xhtml"
> lang="ja-jp" xml:lang="ja-jp">
> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
>
> And this only affects the search feature.
> Thanks,
> I am going to check the info on the other thread.
>
>