Philip Neustrom | 15 Jan 13:57

Re: Spelling based on frequency and not just distance

The patch attached to this email is better than the previous one.  Hopefully
somebody can come up with something better still, as I'm not totally happy
with what I have -- it tends to over-suggest, offering things like "plant"
for "plants" and then "plan" for "plant" :)

--Philip

On Jan 15, 2008 1:24 AM, Philip Neustrom < philipn <at> gmail.com> wrote:

> Hey all,
>
> After implementing the new spelling functionality on http://wikispot.org I
> noticed that terms like "wikipeda" weren't yielding spelling suggestions.
> Taking a quick look at the code, it looks like if we find an exact match,
> even if it has a frequency less than another match within the provided
> delta, we don't suggest anything.  This is probably fine for sites with
> documents where you can be assured the data is properly spelled -- but not
> suitable for something like a wiki or the web in general.
>
> I did something simple, attached in a patch.  Maybe someone has a better
> idea of how to weigh the different options, but my quick fix seemed to give
> much better results than the "give up on exact or edit-distance-closest
> match" code that was there already.
>
> --Philip Neustrom
>
Attachment (spelling_frequency.diff): text/x-diff, 2639 bytes
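
For readers following the thread, here is a minimal Python sketch of the idea
discussed above: prefer a nearby term that is more frequent than the query
term, even when the query term itself exists in the index. This is not the
attached patch and not Xapian's API. The names `edit_distance`, `suggest`,
and `freqs`, the frequency counts, and the `freq / distance` score are all
assumptions made up for illustration.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, freqs, max_distance=2):
    """Return a more frequent term within max_distance of word, or None.

    Hypothetical scoring: frequency divided by edit distance. Unlike an
    "exact match means no suggestion" rule, an exact match only raises
    the frequency a candidate has to beat.
    """
    own_freq = freqs.get(word, 0)
    best, best_score = None, 0.0
    for cand, freq in freqs.items():
        if cand == word or freq <= own_freq:
            continue
        dist = edit_distance(word, cand)
        if dist > max_distance:
            continue
        score = freq / dist
        if score > best_score:
            best, best_score = cand, score
    return best


# Made-up term frequencies, as might be seen on a wiki with typos in it.
freqs = {"wikipedia": 500, "wikipeda": 2,
         "plants": 40, "plant": 90, "plan": 120}

print(suggest("wikipeda", freqs))  # misspelling that is itself indexed
print(suggest("plants", freqs))    # correctly spelled, but rarer
print(suggest("plant", freqs))     # correctly spelled, but rarer
```

With these made-up counts the sketch fixes the "wikipeda" case, but it also
shows the weakness mentioned above: a correctly spelled word still gets a
suggestion whenever a more frequent neighbour exists. Requiring the candidate
to be more frequent by some factor, rather than merely more frequent, is one
possible way to damp that.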
_______________________________________________
Xapian-discuss mailing list
Xapian-discuss <at> lists.xapian.org
http://lists.xapian.org/mailman/listinfo/xapian-discuss
