Eclipse Community Forums
Adjust search rankings/scoring? [message #469662] Fri, 12 October 2007 17:13
Douglas Dirks
Is there any way to tweak the ranking or scoring of XHTML documentation
content for the Search mechanism (Lucene)?

I'm having a hard time understanding how the default page ranking works,
and I've had even less success in influencing it.

Here's an example of what I'd like to do. I've got numerous topics that
contain a given string (e.g. "WIDGET_BASE"), but only one file that
contains this string in the <title> element of the XHTML file. I would
very much like this document to be ranked highest by the search, but in
practice it comes out somewhere down in the middle of the list of hits.

My theory is that the default search is somehow paying more attention to
the length of the file than to the <title> element; files that are short,
and thus have a higher "density" of the search term, seem to receive
higher rankings.

I've tried the old webmaster trick of adding <meta name=keywords ...>, but
this does not seem to affect the rankings.

Is there anything I can do in the content to affect the rankings?

Is there any other way to affect the rankings outside of adding a new
Lucene analyzer extension?

If the only way to do this is to override the default Lucene analyzer,
does anyone have any pointers on doing this, or sample code? I'm not a
Java programmer myself, but I might be able to convince one to help me if
I had something to start from...

Thanks for any help,

Doug Dirks
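
Editor's sketch: Doug's "density" theory is consistent with how Lucene's classic TF-IDF similarity behaves, where a term's contribution grows with the square root of its frequency and shrinks with the square root of the field length. The Java below is a simplified model of that behavior only. It is not Eclipse Help code, it assumes the help system uses Lucene's default similarity unchanged (the thread does not confirm this), and the class name, method names, and topic sizes are hypothetical.

```java
// Simplified model of Lucene's classic per-term scoring factors.
// Hypothetical names; not taken from Eclipse Help or the Lucene source.
class LengthNormSketch {

    // Term-frequency factor: square root of the number of occurrences.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Document-dependent part of the score for one query term.
    // The idf factor is omitted because it is the same for every document.
    static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Assumed sizes: a 100-term stub that mentions the term once,
        // and a 2500-term reference topic that mentions it three times.
        System.out.println("short topic: " + score(1, 100));
        System.out.println("long topic:  " + score(3, 2500));
    }
}
```

Under this model the short topic outscores the long one even though the long one mentions the term more often, which matches the ranking Doug describes.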
Re: Adjust search rankings/scoring? [message #469664 is a reply to message #469662] Fri, 12 October 2007 21:55
Eclipse User
Originally posted by: nospam_kowalskilee.gmail.com

FWIW, there is an open RFE related to having rank boosted when content
match is in <title>; see:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=107648

If that enhancement would have value to you and your company, please log
a vote for it. (Click "Vote for this bug", or go to:
https://bugs.eclipse.org/bugs/votes.cgi?action=show_user&bug_id=107648#vote_107648)

And maybe have other colleagues log a vote for it as well.

Other comments on the description are welcome. Your description below is
precisely the use case for why I opened that RFE in the first place.

Note that this RFE falls well short of, say, a PDE feature that would
let a developer customize the algorithms used to rank the topics in his
doc plug-ins. That would be a much larger enhancement, of course.

--Lee Anne

Doug Dirks wrote:
> Is there any way to tweak the ranking or scoring of XHTML documentation
> content for the Search mechanism (Lucene)?
>
> I'm having a hard time understanding how the default page ranking works,
> and I've had even less success in influencing it.
>
> Here's an example of what I'd like to do. I've got numerous topics that
> contain a given string (e.g. "WIDGET_BASE"), but only one file that
> contains this string in the <title> element of the XHTML file. I would
> very much like this document to be ranked highest by the search, but in
> practice it comes out somewhere down in the middle of the list of hits.
>
> My theory is that the default search is somehow paying more attention to
> the length of the file than to the <title> element; files that are
> short, and thus have a higher "density" of the search term, seem to
> receive higher rankings.
>
> I've tried the old webmaster trick of adding <meta name=keywords ...>,
> but this does not seem to affect the rankings.
>
> Is there anything I can do in the content to affect the rankings?
>
> Is there any other way to affect the rankings outside of adding a new
> Lucene analyzer extension?
>
> If the only way to do this is to override the default Lucene analyzer,
> does anyone have any pointers on doing this, or sample code? I'm not a
> Java programmer myself, but I might be able to convince one to help me
> if I had something to start from...
>
> Thanks for any help,
>
> Doug Dirks
>
Maybe boosting of <title> text working in Eclipse 3.3? (was Re: Adjust search rankings/scoring [message #469883 is a reply to message #469664] Fri, 12 October 2007 22:01
Eclipse User
Originally posted by: nospam_kowalskilee.gmail.com

My apologies for any unnecessary noise--I had done a quick Bugzilla
query and only checked bug 107648. It has a pointer to bug 60773, whose
final comment reads:
----- Comment #14 From Chris Goldthorpe 2007-06-13 18:44:40 -0400

Using Eclipse 3.3RC4 I can find search hits when the text is present only in
the <title> of an HTML page. Closing as WORKSFORME.

------------------
Doug, what version of Eclipse are you using? Does an experiment in
Eclipse 3.3 show better results for your use case?

It would still be nice to have pointers on overriding the default
Lucene analyzer as a way to further tune the ranking algorithms.

--Lee Anne


Lee Anne wrote:
> FWIW, there is an open RFE related to having rank boosted when content
> match is in <title>; see:
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=107648
>
> If that enhancement would have value to you and your company, please log
> a vote for it. (Click "Vote for this bug", or go to:
> https://bugs.eclipse.org/bugs/votes.cgi?action=show_user&bug_id=107648#vote_107648)
>
>
> And maybe have other colleagues log a vote for it as well.
>
> Other comments on the description are welcome. Your description below is
> precisely the use case for why I opened that RFE in the first place.
>
> Note that this RFE falls well short of, say, a PDE feature that would
> let a developer customize the algorithms used to rank the topics in his
> doc plug-ins. That would be a much larger enhancement, of course.
>
> --Lee Anne
>
Re: Maybe boosting of <title> text working in Eclipse 3.3? (was Re: Adjust searc [message #469888 is a reply to message #469883] Sat, 13 October 2007 18:23
Douglas Dirks
We are using Eclipse 3.3.

The value of <title> is clearly included in the search criteria,
but an exact match in the <title> is still not enough to push
the topic to the top of the results list (or even very close).

A very high relevance rank for the <title> would go a long way
toward solving the problem. It would also be very useful if there
were some other way for HTML authors to influence the ranking.

I went and voted for the bug you mentioned, Lee Anne, and added
some of my own thoughts as a content author.

Thanks,
Doug
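
Editor's sketch: the behavior Doug asks for amounts to weighting a match in the title field more heavily than a match in the body when the per-field scores are combined. The Java below illustrates that idea with a plain weighted sum. It is only a sketch under stated assumptions: it is not how Eclipse Help or Lucene is implemented, the class, method, and topic names are hypothetical, and the per-field scores are made-up illustrative numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of field boosting; not Eclipse Help code.
class TitleBoostSketch {

    // One help topic with assumed per-field match scores for a query.
    static final class Topic {
        final String name;
        final double titleScore;
        final double bodyScore;

        Topic(String name, double titleScore, double bodyScore) {
            this.name = name;
            this.titleScore = titleScore;
            this.bodyScore = bodyScore;
        }
    }

    // Combined score: the title match is multiplied by a boost factor.
    static double combined(Topic t, double titleBoost) {
        return titleBoost * t.titleScore + t.bodyScore;
    }

    // Returns topic names ordered from highest to lowest combined score.
    static List<String> rank(List<Topic> topics, double titleBoost) {
        List<Topic> sorted = new ArrayList<>(topics);
        sorted.sort(Comparator
                .comparingDouble((Topic t) -> combined(t, titleBoost))
                .reversed());
        List<String> names = new ArrayList<>();
        for (Topic t : sorted) {
            names.add(t.name);
        }
        return names;
    }

    // Made-up scores: only the "reference" topic matches in its title.
    static List<Topic> sampleTopics() {
        List<Topic> topics = new ArrayList<>();
        topics.add(new Topic("short", 0.0, 0.10));
        topics.add(new Topic("reference", 0.5, 0.07));
        topics.add(new Topic("long", 0.0, 0.06));
        return topics;
    }

    public static void main(String[] args) {
        System.out.println("weak title boost:   " + rank(sampleTopics(), 0.05));
        System.out.println("strong title boost: " + rank(sampleTopics(), 10.0));
    }
}
```

With a weak boost the title match counts but the reference topic still lands mid-list, as Doug reports; with a strong boost it moves to the top. Whether such a boost can be configured in Eclipse Help is exactly what bug 107648 asks for.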