8 May 2003 18:52
Re: Top Terms algorithm
Olly Betts <olly <at> survex.com>
2003-05-08 16:52:36 GMT
On Thu, May 08, 2003 at 04:30:43PM +0100, orion orion wrote:
> I would like to understand the "Top Terms algorithm".
>
> Is there somebody who can explain me ?

The algorithm is the standard probabilistic IR query expansion algorithm,
and is described here:

http://www.xapian.org/docs/intro_ir.html

In particular the section "Using the weights: the E set", but you'll
probably need to read the whole document to make sense of this section.

The Robertson/Sparck Jones paper linked to from that page also covers
this - see page 5, "Query expansion".

I don't believe we currently have a non-mathematical explanation. If
anyone knows of one we can include in the docs, or at least link to,
let me know.

Cheers,
    Olly
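For readers who want something concrete next to the mathematics, here is a minimal Python sketch of the kind of ranking that probabilistic query expansion performs. It assumes the Robertson/Sparck Jones relevance weight with the usual 0.5 corrections, w_t = log(((r+0.5)(N-n-R+r+0.5)) / ((n-r+0.5)(R-r+0.5))), where N is the collection size, R the number of documents marked relevant, n the number of documents containing term t and r the number of relevant documents containing it, and it assumes candidate terms are ordered by r_t * w_t. Please check intro_ir.html and the paper for the exact form Xapian uses; this is an illustration, not Xapian's own code. The names `TermStats`, `rsj_weight` and `top_terms` are hypothetical, and the counts in the example are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    n: int  # documents in the whole collection containing the term
    r: int  # documents in the relevant set containing the term


def rsj_weight(N: int, R: int, n: int, r: int) -> float:
    """Robertson/Sparck Jones relevance weight with 0.5 corrections (assumed form)."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def top_terms(stats, N: int, R: int, k: int):
    """Score each candidate by r * w and return the k best (term, score) pairs.

    Terms that occur in no relevant document are skipped, since r * w is 0 for them.
    """
    scored = [(s.r * rsj_weight(N, R, s.n, s.r), s.term) for s in stats if s.r > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # highest score first
    return [(term, score) for score, term in scored[:k]]


# Invented example: 1000 documents, 10 of them marked relevant.
stats = [
    TermStats("retrieval", n=50, r=8),
    TermStats("the", n=990, r=10),    # in every relevant doc, but also almost everywhere else
    TermStats("xapian", n=5, r=3),
    TermStats("banana", n=20, r=0),   # never seen in a relevant doc
]
print([term for term, _ in top_terms(stats, N=1000, R=10, k=2)])  # ['retrieval', 'xapian']
```

Note how "the" ends up with a negative weight even though it appears in every relevant document: the weight rewards terms that are much more common in the relevant set than in the collection as a whole, which is the intuition behind the E set.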