Eclipse Community Forums
Suppressing stemming in search [message #474919] Thu, 18 December 2008 19:16
Mike Melton
Stemming is causing some search-results problems for us, especially in
languages other than English. Is there an easy way to suppress stemming?

Here's an example of the problem: if I search for "License Management
Console" (without the double quotes) in an English Infocenter, strings
such as "licens" and "manag" are highlighted everywhere. This is not too
bad, but somewhat of an annoyance. Searching for the same English string
in a German Infocenter highlights "lic" everywhere, which interferes with
the readability of the document.

I think we may want to try suppressing this feature. Any help would be
appreciated.

Thanks,
Mike
Re: Suppressing stemming in search [message #474964 is a reply to message #474919] Fri, 19 December 2008 15:21
Chris Goldthorpe
I haven't tried this, but I think the following might work.
org.eclipse.help.base has some extensions that define which analyzers are
used for each locale.

Look for this section in its plugin.xml:

<!-- Text Analyzers for search -->
<extension
      id="org.eclipse.help.base.Analyzer_en"
      point="org.eclipse.help.base.luceneAnalyzer">
   <analyzer
         locale="en"
         class="org.eclipse.help.internal.search.Analyzer_en">
   </analyzer>
   <analyzer
         locale="pt"
         class="org.apache.lucene.analysis.br.BrazilianAnalyzer">
   </analyzer>
   <!-- ... further analyzer entries, including the one for "de" ... -->
</extension>

If you remove the entry for "de", I believe that will turn off
search stemming in German. If you use prebuilt indexes, the entry also
needs to be removed in the installation that builds the index.
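
As a rough sketch of that change, the German entry could be commented out
rather than deleted, which makes it easy to restore later. Only the
locale value "de" comes from this thread; the class name shown is an
assumption (Lucene's GermanAnalyzer is the likely candidate), so check it
against your own copy of plugin.xml.

```xml
<extension
      id="org.eclipse.help.base.Analyzer_en"
      point="org.eclipse.help.base.luceneAnalyzer">
   <!-- all other analyzer entries stay unchanged -->

   <!-- German entry disabled to turn off stemming.
        ASSUMPTION: the class name below is a guess; keep whatever
        your plugin.xml actually lists for locale "de".
   <analyzer
         locale="de"
         class="org.apache.lucene.analysis.de.GermanAnalyzer">
   </analyzer>
   -->
</extension>
```

Any search index built while the entry was still active probably needs
to be rebuilt before the difference shows up in search results.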

Let me know if this works.

Re: Suppressing stemming in search [message #474969 is a reply to message #474964] Fri, 19 December 2008 21:44
Mike Melton
After removing the analyzer entry for "de" from
org.eclipse.help.base/plugin.xml, searching German documents now matches
whole words instead of stems.

I'm going to submit the results to our linguists to see if they want the
change in this feature.

Thanks for your help with this, Chris.

Mike