Re: Infocenter and lucene bug [message #719352 is a reply to message #719068]
Fri, 26 August 2011 19:20
dlatch (Junior Member; Messages: 7; Registered: August 2011)
Hi Chris,
Thanks for chiming in.
I've found that the analyzer isn't the Lucene StandardAnalyzer, as I had assumed, but one that the Eclipse help developers appear to have written. In my specific test case I think it is the English analyzer, org.eclipse.help.internal.search.Analyzer_en, specified in the help base plugin.
Looking at the code I found online (I'll download the source jar to make sure), Analyzer_en appears to use the LowerCaseAndDigitsTokenizer:
return new PorterStemFilter(new StopFilter(new LowerCaseAndDigitsTokenizer(reader), STOP_WORDS));
The LowerCaseAndDigitsTokenizer looks true to its name; its description reads "Tokenizer breaking words around letters or digits." As I read the code, it would break a word at an underscore.
So my current approach is to use the org.eclipse.help.base.luceneAnalyzer extension point to replace Analyzer_en with my own class. It would be built on the same libraries and code as in Helios, but with my own version of the LowerCaseAndDigitsTokenizer that allows "_" to be part of a word. The drawback is that I'll probably have to do the same for every locale we support: fr, es, jp, and zh_CN.
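To make the underscore problem concrete, here is a rough plain-Java sketch. It does not use the Lucene API; it only imitates the behavior described above: a token is a run of letters or digits, lower-cased, and any other character ends it. The underscoreIsWordChar switch is a hypothetical stand-in for the modified tokenizer I have in mind, and the real LowerCaseAndDigitsTokenizer may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

class TokenizerSketch {

    // Imitates the described LowerCaseAndDigitsTokenizer behavior: a token is a
    // run of letters or digits, lower-cased; any other character ends the token.
    // underscoreIsWordChar = true is a hypothetical variant that keeps '_' inside words.
    static List<String> tokenize(String text, boolean underscoreIsWordChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean tokenChar = Character.isLetterOrDigit(c)
                    || (underscoreIsWordChar && c == '_');
            if (tokenChar) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-token character closes the word collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Set JAVA_HOME and index_offset first.";
        System.out.println(tokenize(text, false)); // [set, java, home, and, index, offset, first]
        System.out.println(tokenize(text, true));  // [set, java_home, and, index_offset, first]
    }
}
```

In a real replacement analyzer, the equivalent change would go wherever the tokenizer decides whether a character belongs to a token; I haven't yet checked exactly how that looks in the Helios sources.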
What do you think?
[Updated on: Fri, 26 August 2011 19:22]
Re: Infocenter and lucene bug [message #720401 is a reply to message #720117]
Tue, 30 August 2011 14:08
dlatch (Junior Member; Messages: 7; Registered: August 2011)
Thanks Chris,
I'd definitely prefer to not ship my own analyzer.
The problem is that when I search for a parameter name or an environment variable, e.g. index_offset or JAVA_HOME, I get a bunch of hits near the top of the list that don't have any highlighted words in them, and the hits I am looking for are scattered fairly far down the list. If I put the search term in quotes, it works correctly. Since this is technical documentation, we have many such items that people will search for.
Actually, I just checked, and you can see this in the Eclipse Current Release help documentation. If you search for JAVA_HOME, you get three hits. The first one, "Builder Configuration", doesn't appear to contain JAVA_HOME; the other two do contain it and are correct. If you search for "JAVA_HOME" (in quotes), "Builder Configuration" is left off the list.
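My guess at why the quoted search behaves differently, as a simplified model: with the default tokenizer, JAVA_HOME is indexed and queried as the two terms "java" and "home". Assuming an unquoted query only requires those terms somewhere on the page, any page that mentions both words matches, while a quoted query is a phrase query that needs them adjacent and in order. The page texts below are made up (I haven't looked at what "Builder Configuration" actually says), and the sketch ignores stemming, stop words, and ranking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PhraseVsTermsSketch {

    // Same splitting rule as the default tokenizer described earlier:
    // runs of letters/digits, lower-cased; '_' acts as a separator.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Unquoted query (assumed model): every query term occurs somewhere on the page.
    static boolean matchesAllTerms(String page, String query) {
        return tokens(page).containsAll(tokens(query));
    }

    // Quoted query: the query terms must appear consecutively and in order.
    static boolean matchesPhrase(String page, String query) {
        return Collections.indexOfSubList(tokens(page), tokens(query)) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical page that mentions "Java" and "home" but never JAVA_HOME.
        String builderPage = "Choose a Java builder and set its home directory.";
        String envPage = "Point JAVA_HOME at the JDK.";
        String query = "JAVA_HOME";
        System.out.println(matchesAllTerms(builderPage, query)); // true
        System.out.println(matchesPhrase(builderPage, query));   // false
        System.out.println(matchesPhrase(envPage, query));       // true
    }
}
```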
Personally, I think it is a bug, considering that Eclipse help is largely used for technical documentation. But I don't know whether calling it a bug would get much attention. What do you think?
[Updated on: Tue, 30 August 2011 14:53]