Infocenter and lucene bug [message #718304] |
Tue, 23 August 2011 17:44  |
Eclipse User |
Hi,
I've deployed an Infocenter as a war and run it both under Jetty/Tomcat/etc. and embedded with Jetty. We're getting ready to publish our first release, but during usability testing we recently discovered that the Lucene StandardAnalyzer has a bug whereby tokens are split on the underscore character. I verified this with some quick internet searches and found that it has apparently been fixed in Lucene 3.1. Obviously, for technical documentation this is a pretty big issue.
I've been working with Helios, which has Lucene 1.9, and just checked Indigo and found that it is still only at 2.9.1. It looks to me like somewhere between 2.9.1 and 2.9.4 the Lucene jars became packaged differently (e.g. since 2.9.4 they use lucene.Analyzers instead of lucene.analysis).
Is there any way to patch the Infocenter so that it can use Lucene 3.1? Alternatively, since I pre-generate the indexes, can I use the newer indexer to create the indexes, and do you think it would still function properly?
regards,
Re: Infocenter and lucene bug [message #720401 is a reply to message #720117] |
Tue, 30 August 2011 10:08   |
Eclipse User |
Thanks Chris,
I'd definitely prefer to not ship my own analyzer.
The problem is that when I search for a parameter name or an environment variable, e.g. index_offset or JAVA_HOME, I get a bunch of hits near the top of the list that don't have any highlighted words in them, and the hits I am looking for are scattered fairly far down the list. If I quote the search, it works correctly. Since this is technical documentation, we have many, many such items that people will search on.
Actually, I just checked and you can see this in the Eclipse Current Release help documentation. If you search for JAVA_HOME, you get three hits. The first one, "Builder Configuration", doesn't appear to have JAVA_HOME in it. The other two do have it and are correct. If you search for "JAVA_HOME" (with the quotes), "Builder Configuration" is left off the list.
Personally I think it is a bug, considering Eclipse help is largely used for technical documentation. But I don't know if calling it a bug would get much attention. What do you think?
[Updated on: Tue, 30 August 2011 10:53] by Moderator
Re: Infocenter and lucene bug [message #720590 is a reply to message #720401] |
Tue, 30 August 2011 18:05  |
Eclipse User |
If you take the Bugzilla route, the earliest this could get fixed would be Eclipse 3.8. The change would require prebuilt indexes to be regenerated, which would be unacceptable for a point release. I'm guessing that is too far out for your immediate needs.
There is one other idea I thought of: the extension point org.eclipse.help.searchProcessor, which was introduced in Eclipse 3.7, allows you to tweak the query string before a search is performed. You could use it to change the search terms in any search that contains a term with underscores.
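As a rough sketch of the kind of query tweak meant here, assuming the simplest approach of wrapping every underscore term in quotes (since quoted searches were reported to work correctly earlier in this thread): the class name UnderscoreQueryRewriter and the method quoteUnderscoreTerms are made up for illustration and are not part of any Eclipse API, and hooking the rewritten string into org.eclipse.help.searchProcessor is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an Eclipse class: rewrites a help search query so
// that terms containing underscores are searched as quoted phrases.
public final class UnderscoreQueryRewriter {

    private UnderscoreQueryRewriter() {
    }

    // Splits the query on whitespace (keeping existing quoted phrases intact)
    // and wraps each unquoted term that contains '_' in double quotes.
    public static String quoteUnderscoreTerms(String query) {
        if (query == null || query.isEmpty()) {
            return query;
        }
        List<String> terms = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                current.append(c);
            } else if (Character.isWhitespace(c) && !inQuotes) {
                flush(current, terms);
            } else {
                current.append(c);
            }
        }
        flush(current, terms);
        return String.join(" ", terms);
    }

    // Moves the pending term (if any) into the list, quoting it when needed.
    private static void flush(StringBuilder current, List<String> terms) {
        if (current.length() == 0) {
            return;
        }
        String term = current.toString();
        current.setLength(0);
        boolean alreadyQuoted = term.indexOf('"') >= 0;
        boolean hasUnderscore = term.indexOf('_') >= 0;
        terms.add(!alreadyQuoted && hasUnderscore ? "\"" + term + "\"" : term);
    }
}
```

With this, a query such as set JAVA_HOME would be passed on with JAVA_HOME in double quotes, while terms without underscores and phrases the user already quoted are left alone. Runs of whitespace between terms are collapsed to single spaces as a side effect.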
I agree with you that underscores should not be treated as a break character. This was presumably a design decision made in the early days of the Eclipse help system.