Suppressing stemming in search [message #474919] Thu, 18 December 2008 14:16
Eclipse UserFriend
Stemming is causing some search-results problems for us, especially in
languages other than English. Is there an easy way to suppress stemming?

Here's an example of the problem: if I search for "License Management
Console" (without the double quotes) in an English Infocenter, strings
such as "licens" and "manag" are highlighted everywhere. This is not too
bad, but somewhat of an annoyance. Searching for the same English string
in a German Infocenter highlights "lic" everywhere, which interferes with
the readability of the document.

I think we may want to try suppressing this feature. Any help would be
appreciated.

Thanks,
Mike
Re: Suppressing stemming in search [message #474964 is a reply to message #474919] Fri, 19 December 2008 10:21
Eclipse UserFriend
I haven't tried this, but I think the following might work.
org.eclipse.help.base has some extensions that define which analyzers
are used.

Look for this section in plugin.xml:

<!-- Text Analyzers for search -->
<extension
      id="org.eclipse.help.base.Analyzer_en"
      point="org.eclipse.help.base.luceneAnalyzer">
   <analyzer
         locale="en"
         class="org.eclipse.help.internal.search.Analyzer_en">
   </analyzer>
   <analyzer
         locale="pt"
         class="org.apache.lucene.analysis.br.BrazilianAnalyzer">
   </analyzer>
   <!-- further analyzer entries (including the one for "de") not shown -->
</extension>

If you remove the entry for "de", I believe that will turn off search
stemming in German. If you use prebuilt indexes, the entry also has to
be removed at the time the index is built.
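
For reference, here is a rough sketch of what that edit might look like.
The "de" entry is not shown in the excerpt above, so the class name used
here (Lucene's GermanAnalyzer) is an assumption; check it against your own
copy of plugin.xml. Commenting the entry out instead of deleting it makes
the change easy to revert.

```xml
<!-- Text Analyzers for search -->
<extension
      id="org.eclipse.help.base.Analyzer_en"
      point="org.eclipse.help.base.luceneAnalyzer">
   <analyzer
         locale="en"
         class="org.eclipse.help.internal.search.Analyzer_en">
   </analyzer>
   <!-- Assumed form of the German entry, disabled so that "de" no longer
        gets a language-specific analyzer.
   <analyzer
         locale="de"
         class="org.apache.lucene.analysis.de.GermanAnalyzer">
   </analyzer>
   -->
   <!-- other analyzer entries left unchanged -->
</extension>
```

Any existing search index for German content was built with the old
analyzer, so it would presumably need to be rebuilt before the change
shows up in search results.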

Let me know if this works.

Re: Suppressing stemming in search [message #474969 is a reply to message #474964] Fri, 19 December 2008 16:44
Eclipse UserFriend
After removing the analyzer entry for "de" from
org.eclipse.help.base/plugin.xml, searching German documents now matches
whole words instead of stems.

I'm going to submit the results to our linguists to see if they want the
change in this feature.

Thanks for your help with this, Chris.

Mike