Adjust search rankings/scoring? [message #469662] |
Fri, 12 October 2007 17:13 |
Douglas Dirks Messages: 26 Registered: July 2009 |
Junior Member |
|
|
Is there any way to tweak the ranking or scoring of XHTML documentation
content for the Search mechanism (Lucene)?
I'm having a hard time understanding how the default page ranking works,
and I've had even less success influencing it.
Here's an example of what I'd like to do. I've got numerous topics that
contain a given string (e.g. "WIDGET_BASE"), but only one file that
contains this string in the <title> element of the XHTML file. I would
very much like this document to be ranked highest by the search, but in
practice it comes out somewhere down in the middle of the list of hits.
My theory is that the default search is somehow paying more attention to
the length of the file than to the <title> element; files that are short,
and thus have a higher "density" of the search term, seem to receive
higher rankings.
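A minimal sketch of the "density" theory above, assuming the help search scores hits the way classic Lucene TF-IDF similarity does: the term-frequency factor is the square root of the hit count, and the length factor is one over the square root of the number of terms in the field. Whether the Eclipse help system uses exactly these defaults is an assumption, and the class name, method names, and topic sizes below are hypothetical. The IDF factor is left out because it is the same for every document when the query is a single term.

```java
// Toy model of classic Lucene-style scoring for a single query term.
// Not Eclipse or Lucene code; all names and numbers are illustrative.
class LengthNormSketch {

    // Term-frequency factor: grows with the square root of the hit count.
    static double tf(int termCount) {
        return Math.sqrt(termCount);
    }

    // Length normalization: shrinks as the field gets longer.
    static double lengthNorm(int fieldLength) {
        return 1.0 / Math.sqrt(fieldLength);
    }

    // Per-document score for one term (IDF omitted, see lead-in).
    static double score(int termCount, int fieldLength) {
        return tf(termCount) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Hypothetical short topic: 1 hit in 50 terms.
        double shortTopic = score(1, 50);
        // Hypothetical long reference topic: 3 hits in 2000 terms.
        double longTopic = score(3, 2000);

        System.out.printf("short topic: %.4f%n", shortTopic);
        System.out.printf("long topic:  %.4f%n", longTopic);
    }
}
```

Under these assumptions the short topic outscores the long one even though it has fewer hits, which matches the behavior described above.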
I've tried the old webmaster trick of adding <meta name=keywords ...>, but
this does not seem to affect the rankings.
Is there anything I can do in the content to affect the rankings?
Is there any other way to affect the rankings outside of adding a new
Lucene analyzer extension?
If the only way to do this is to override the default Lucene analyzer,
does anyone have any pointers on doing this, or sample code? I'm not a
Java programmer myself, but I might be able to convince one to help me if
I had something to start from...
Thanks for any help,
Doug Dirks
|
|
|
Re: Adjust search rankings/scoring? [message #469664 is a reply to message #469662] |
Fri, 12 October 2007 21:55 |
Eclipse User |
|
|
|
Originally posted by: nospam_kowalskilee.gmail.com
FWIW, there is an open RFE related to having rank boosted when content
match is in <title>; see:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=107648
If that enhancement would have value to you and your company, please log
a vote for it. (Click Vote for this bug, or go to:
https://bugs.eclipse.org/bugs/votes.cgi?action=show_user&bug_id=107648#vote_107648
And maybe have other colleagues log a vote for it as well.
Other comments on the description are welcome. Your description below is
precisely the use case for why I opened that RFE in the first place.
Now, it does not approach something like having a feature in PDE for a
developer to set values for a general customization of the algorithms
used to make the rankings for his doc plug-ins. That would be a much
larger enhancement, of course.
--Lee Anne
Doug Dirks wrote:
> Is there any way to tweak the ranking or scoring of XHTML documentation
> content for the Search mechanism (Lucene)?
>
> I'm having a hard time understanding how the default page ranking works,
> and even less success in influencing it.
>
> Here's an example of what I'd like to do. I've got numerous topics that
> contain a given string (e.g. "WIDGET_BASE"), but only one file that
> contains this string in the <title> element of the XHTML file. I would
> very much like this document to be ranked highest by the search, but in
> practice it comes out somewhere down in the middle of the list of hits.
>
> My theory is that the default search is somehow paying more attention to
> the length of the file than to the <title> element; files that are
> short, and thus have a higher "density" of the search term, seem to
> receive higher rankings.
>
> I've tried the old webmaster trick of adding <meta name=keywords ...>,
> but this does not seem to affect the rankings.
>
> Is there anything I can do in the content to affect the rankings?
>
> Is there any other way to affect the rankings outside of adding a new
> Lucene analyzer extension?
>
> If the only way to do this is to override the default Lucene analyzer,
> does anyone have any pointers on doing this, or sample code? I'm not a
> Java programmer myself, but I might be able to convince one to help me
> if I had something to start from...
>
> Thanks for any help,
>
> Doug Dirks
>
|
|
|
Maybe boosting of <title> text working in Eclipse 3.3? (was Re: Adjust search rankings/scoring [message #469883 is a reply to message #469664] |
Fri, 12 October 2007 22:01 |
Eclipse User |
|
|
|
Originally posted by: nospam_kowalskilee.gmail.com
My apologies for any unnecessary noise--I had done a quick Bugzilla
query and only checked bug 107648. It has a pointer to bug 60773, which
has this final comment:
----- Comment #14 From Chris Goldthorpe 2007-06-13 18:44:40 -0400
Using Eclipse 3.3RC4 I can find search hits when the text is present only in
the <title> of an HTML page. Closing as WORKSFORME.
------------------
Doug, what version of Eclipse are you using? Does an experiment in
Eclipse 3.3 show better results for your use case?
It would be nice to know if there are any pointers for ways to override
the default Lucene analyzer, for ways to further enhance the algorithms.
--Lee Anne
Lee Anne wrote:
> FWIW, there is an open RFE related to having rank boosted when content
> match is in <title>; see:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=107648
>
> If that enhancement would have value to you and your company, please log
> a vote for it. (Click Vote for this bug, or go to:
> https://bugs.eclipse.org/bugs/votes.cgi?action=show_user&bug_id=107648#vote_107648
>
>
> And maybe have other colleagues log a vote for it as well.
>
> Other comments on the description are welcome. Your description below is
> precisely the use case for why I opened that RFE in the first place.
>
> Now, it does not approach something like having a feature in PDE for a
> developer to set values for a general customization of the algorithms
> used to make the rankings for his doc plug-ins. That would be a much
> larger enhancement, of course.
>
> --Lee Anne
>
|
|
| |
Re: Maybe boosting of <title> text working in Eclipse 3.3? (was Re: Adjust searc [message #588598 is a reply to message #469883] |
Sat, 13 October 2007 18:23 |
Eclipse User |
|
|
|
Originally posted by: ddirks.ittvis.com
We are using Eclipse 3.3.
The value of <title> is clearly included in the search criteria,
but an exact match in the <title> is still not enough to push
the topic to the top of the results list (or even very close).
A very high relevance rank for the <title> would go a long way
toward solving the problem. It would also be very useful if there
were some other way for HTML authors to influence the ranking.
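To illustrate what a high relevance weight for the <title> could look like, here is a rough sketch that scores the title and the body as separate fields and multiplies the title score by a weight. This is not how the Eclipse help search is implemented; the class, the `titleWeight` parameter, and the topic sizes are hypothetical, and the per-field score reuses the classic Lucene-style square-root factors as an assumption.

```java
// Toy field-weighted scorer. Not Eclipse or Lucene code; names and
// numbers are illustrative only.
class TitleBoostSketch {

    // Score of one field: sqrt(hits) / sqrt(field length), or 0 if no hits.
    static double fieldScore(int hits, int fieldLength) {
        if (hits == 0 || fieldLength == 0) {
            return 0.0;
        }
        return Math.sqrt(hits) / Math.sqrt(fieldLength);
    }

    // Combined score: the title contribution is scaled by titleWeight.
    // A weight of 0 models ranking on body text alone.
    static double score(int titleHits, int titleLength,
                        int bodyHits, int bodyLength, double titleWeight) {
        return titleWeight * fieldScore(titleHits, titleLength)
                + fieldScore(bodyHits, bodyLength);
    }

    public static void main(String[] args) {
        for (double weight : new double[] {0.0, 2.0}) {
            // Hypothetical reference topic: the term is the whole title,
            // plus 5 hits in a 3000-term body.
            double reference = score(1, 1, 5, 3000, weight);
            // Hypothetical short topic: no title hit, 2 hits in 40 terms.
            double shortTopic = score(0, 4, 2, 40, weight);
            System.out.printf("weight %.1f: reference=%.4f short=%.4f%n",
                    weight, reference, shortTopic);
        }
    }
}
```

With the title ignored, the short topic ranks first; once the title match carries weight, the reference topic moves to the top, which is the outcome requested above.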
I went and voted for the bug you mentioned, Lee Anne, and added
some of my own thoughts as a content author.
Thanks,
Doug
|
|
|