Robert Bihlmeyer | 2 Jun 1999 13:26

Re: babelgnus

Hi,

>>>>> On 01 Jun 1999 18:12:30 +0200
>>>>> Hans de Graaff <graaff <at> xs4all.nl> said:

 Hans> Hmm, determining the language used and putting it in a header
 Hans> would also be useful for the message keyword stuff, as it could
 Hans> then select the proper exclusion stuff.

word-adaptive-scoring also depends on a list of stopwords. True, you
can put words from all the languages you read in there, but "die" is
one example of a word that is highly significant in English yet not at
all in German.

Perhaps the lists of "frequent {English,German,French,Swahili,...}
words" can be unified, i.e. we'd have

(defvar frequent-words
  '((english "the" "a" "one" "for" "that")
    (german "der" "die" "das" "mit" "eine" "einer")
    (lisp "defun" "defvar" "cond")))

which would be used both for language detection (the language with
the most hits is probably correct) and for stopword generation.
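
A minimal sketch of how one such list could serve both purposes. None
of this is existing Gnus code: the names my-frequent-words,
my-count-hits, my-guess-language and my-stopwords are made up for
illustration, and splitting the text on non-letters is an assumed
simplification.

```elisp
;; Sketch only: every `my-' name here is hypothetical, not part of Gnus.
(require 'cl-lib)

(defvar my-frequent-words
  '((english "the" "a" "one" "for" "that")
    (german "der" "die" "das" "mit" "eine" "einer")
    (lisp "defun" "defvar" "cond"))
  "Alist mapping a language symbol to its frequent words.")

(defun my-count-hits (words text-words)
  "Return how many elements of TEXT-WORDS occur in WORDS."
  (cl-count-if (lambda (w) (member w words)) text-words))

(defun my-guess-language (text)
  "Return the language in `my-frequent-words' with the most hits in TEXT.
Return nil when no frequent word matches at all."
  ;; Assumption: lowercasing and splitting on non-letters is good enough.
  (let ((text-words (split-string (downcase text) "[^[:alpha:]]+" t))
        (best nil)
        (best-hits 0))
    (dolist (entry my-frequent-words best)
      (let ((hits (my-count-hits (cdr entry) text-words)))
        (when (> hits best-hits)
          (setq best (car entry)
                best-hits hits))))))

(defun my-stopwords (language)
  "Return the stopword list for LANGUAGE, or nil if unknown."
  (cdr (assq language my-frequent-words)))
```

With this, (my-stopwords (my-guess-language text)) would exclude "die"
only in articles detected as German, so it stays significant in
English ones.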

	Robbe


-- 
Robert Bihlmeyer	reads: Deutsch, English, MIME, Latin-1, NO SPAM!
<robbe <at> orcus.priv.at>	<http://stud2.tuwien.ac.at/~e9426626/sig.html>

