Suppressing stemming in search [message #474919]
Thu, 18 December 2008 14:16
Eclipse User
Stemming is causing some search-results problems for us, especially in
languages other than English. Is there an easy way to suppress stemming?
Here's an example of the problem: if I search for "License Management
Console" (without the double quotes) in an English Infocenter, strings
such as "licens" and "manag" are highlighted everywhere. This is not too
bad, but somewhat of an annoyance. Searching for the same English string
in a German Infocenter highlights "lic" everywhere, which interferes with
the readability of the document.
I think we may want to try suppressing this feature. Any help would be
appreciated.
Thanks,
Mike

Re: Suppressing stemming in search [message #474969 is a reply to message #474964]
Fri, 19 December 2008 16:44
Eclipse User
After I removed the analyzer entry for "de" from
org.eclipse.help.base/plugin.xml, searches of German documents now match
whole words instead of stems.
I'm going to submit the results to our linguists to see if they want the
change in this feature.
Thanks for your help with this, Chris.
Mike

Re: Suppressing stemming in search [message #622743 is a reply to message #474919]
Fri, 19 December 2008 10:21
Eclipse User
I haven't tried this but I think that the following might work.
org.eclipse.help.base has some extensions to define which analyzers are
used.
Look for this section in plugin.xml:
<!-- Text Analyzers for search -->
<extension
      id="org.eclipse.help.base.Analyzer_en"
      point="org.eclipse.help.base.luceneAnalyzer">
   <analyzer
         locale="en"
         class="org.eclipse.help.internal.search.Analyzer_en">
   </analyzer>
   <analyzer
         locale="pt"
         class="org.apache.lucene.analysis.br.BrazilianAnalyzer">
   </analyzer>
   <!-- ... entries for other locales, including "de", follow here ... -->
</extension>
If you remove the entry for "de", I believe that will turn off search
stemming for German. If you use prebuilt indexes, the entry also needs to
be removed when the index is built.
Let me know if this works.
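
For anyone who would rather script the change than edit plugin.xml by hand, here is a minimal sketch in Python, not something that was tested against a real Infocenter. It assumes the analyzer entries sit under an extension whose point is org.eclipse.help.base.luceneAnalyzer, as in the excerpt above. The SAMPLE string, the remove_analyzer helper, and the class name shown for the "de" entry are illustrative assumptions, not copied from the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for org.eclipse.help.base/plugin.xml.
# The class name on the "de" entry is an assumption for illustration only.
SAMPLE = """<plugin>
  <extension
        id="org.eclipse.help.base.Analyzer_en"
        point="org.eclipse.help.base.luceneAnalyzer">
    <analyzer locale="en" class="org.eclipse.help.internal.search.Analyzer_en"/>
    <analyzer locale="de" class="org.apache.lucene.analysis.de.GermanAnalyzer"/>
  </extension>
</plugin>"""

ANALYZER_POINT = "org.eclipse.help.base.luceneAnalyzer"


def remove_analyzer(plugin_xml: str, locale: str) -> str:
    """Return plugin_xml without the <analyzer> entries for the given locale."""
    root = ET.fromstring(plugin_xml)
    for ext in root.iter("extension"):
        # Only touch the extension that registers the search analyzers.
        if ext.get("point") != ANALYZER_POINT:
            continue
        # Copy the list first so removing children does not disturb iteration.
        for analyzer in list(ext.findall("analyzer")):
            if analyzer.get("locale") == locale:
                ext.remove(analyzer)
    return ET.tostring(root, encoding="unicode")


result = remove_analyzer(SAMPLE, "de")
remaining = [a.get("locale") for a in ET.fromstring(result).iter("analyzer")]
print(remaining)
```

Note that ElementTree drops XML comments and processing instructions when it parses, so for the real file a manual edit (with a backup copy) is probably the safer route. The sketch mainly shows which element is being removed.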
Mike Melton wrote:
> Stemming is causing some search-results problems for us, especially in
> languages other than English. Is there an easy way to suppress stemming?
>
> Here's an example of the problem: if I search for "License Management
> Console" (without the double quotes) in an English Infocenter, strings
> such as "licens" and "manag" are highlighted everywhere. This is not too
> bad, but somewhat of an annoyance. Searching for the same English string
> in a German Infocenter highlights "lic" everywhere, which interferes
> with the readability of the document.
>
> I think we may want to try suppressing this feature. Any help would be
> appreciated.
>
> Thanks,
> Mike
>