Olly Betts | 8 May 2003 18:52

Re: Top Terms algorithm

On Thu, May 08, 2003 at 04:30:43PM +0100, orion orion wrote:
> I would like to understand the "Top Terms algorithm".
> 
> Is there somebody who can explain me ?

The algorithm is the standard probabilistic IR query expansion
algorithm, and is described here:

http://www.xapian.org/docs/intro_ir.html

See in particular the section "Using the weights: the E set", though
you'll probably need to read the whole document to make sense of it.

The Robertson/Sparck Jones paper linked to from that page also covers
this - see page 5, "Query expansion".
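
As a rough illustration only (this is not Xapian's code, and the function
and parameter names are made up for the example), here is a small Python
sketch of the idea. It assumes the usual Robertson/Sparck Jones relevance
weight with 0.5 smoothing, and assumes candidate terms are ranked by r
times that weight, where r is the number of relevant documents containing
the term, n the number of documents in the collection containing it, R the
number of documents marked relevant and N the collection size.

```python
import math


def rsj_weight(r, n, R, N):
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: relevant documents containing the term
    n: documents in the whole collection containing the term
    R: documents marked relevant
    N: documents in the collection
    """
    return math.log(((r + 0.5) * (N - R - n + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def top_terms(relevant_docs, doc_freq, N, k=10):
    """Return the k best expansion terms (hypothetical helper).

    relevant_docs: list of sets of terms, one set per relevant document
    doc_freq: dict mapping each term to n, its collection frequency
    """
    R = len(relevant_docs)

    # Count r for every term that occurs in the relevant set.
    r_counts = {}
    for doc in relevant_docs:
        for term in set(doc):
            r_counts[term] = r_counts.get(term, 0) + 1

    # Assumed selection value: r * w(t). Ties are broken alphabetically.
    scored = [(r * rsj_weight(r, doc_freq[t], R, N), t)
              for t, r in r_counts.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]
```

A term that appears in most relevant documents but few others scores well,
while a term that is common everywhere gets a negative weight and sinks to
the bottom of the list.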

I don't believe we currently have a non-mathematical explanation.  If
anyone knows of one we can include in the docs, or at least link to, let
me know.

Cheers,
    Olly
