2 Jun 1999 13:26
Re: babelgnus
Robert Bihlmeyer <e9426626 <at> stud2.tuwien.ac.at>
1999-06-02 11:26:38 GMT
Hi,
>>>>> On 01 Jun 1999 18:12:30 +0200
>>>>> Hans de Graaff <graaff <at> xs4all.nl> said:
Hans> Hmm, determining the language used and putting it in a header
Hans> would also be useful for the message keyword stuff, as it could
Hans> then select the proper exclusion stuff.
word-adaptive-scoring also depends on a list of stopwords. True, you
can put words from all languages you read in there, but "die" is one
example that is highly significant in English, but not at all in
German, where it is just an article.
Perhaps the list of "frequent {English,German,French,Swahili,...}
words" can be unified. I.e. we'd have
  (defvar frequent-words
    '((english "the" "a" "one" "for" "that")
      (german "der" "die" "das" "mit" "eine" "einer")
      (lisp "defun" "defvar" "cond"))
    "Alist of (LANGUAGE WORD...) listing frequent words per language.")
which is used both for language detection (the language with the most
hits is probably correct) and for stopword generation.
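To make the idea concrete, here is a rough sketch of how both uses
might hang off that one variable. This is only an illustration, not
existing Gnus code: apart from frequent-words itself, every function
name below is made up, and the tokenizing (lowercase, split on
non-letters) is an assumption. It relies on cl-lib for counting.

```elisp
;; Sketch only: every name except `frequent-words' is hypothetical.
(require 'cl-lib)

(defvar frequent-words
  '((english "the" "a" "one" "for" "that")
    (german "der" "die" "das" "mit" "eine" "einer")
    (lisp "defun" "defvar" "cond"))
  "Alist of (LANGUAGE WORD...) listing frequent words per language.")

(defun frequent-words-count-hits (words tokens)
  "Return how many elements of TOKENS are members of WORDS."
  (cl-count-if (lambda (token) (member token words)) tokens))

(defun frequent-words-guess-language (text)
  "Return the language in `frequent-words' with the most hits in TEXT.
Return nil if none of the listed words occurs in TEXT."
  ;; Assumption: lowercase the text and split on runs of non-letters.
  (let ((tokens (split-string (downcase text) "[^[:alpha:]]+" t))
        (best nil)
        (best-hits 0))
    (dolist (entry frequent-words best)
      (let ((hits (frequent-words-count-hits (cdr entry) tokens)))
        ;; Strictly greater: on a tie the earlier language wins.
        (when (> hits best-hits)
          (setq best (car entry)
                best-hits hits))))))

(defun frequent-words-stopwords (language)
  "Return the stopword list for LANGUAGE, or nil if it is unknown."
  (cdr (assq language frequent-words)))
```

With the tiny lists above, (frequent-words-guess-language "Der Hund
und die Katze") should give german, and (frequent-words-stopwords
'german) then hands back exactly the words to exclude for that
article. Real lists would of course need to be much longer for the
hit counts to mean anything.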
Robbe
--
Robert Bihlmeyer reads: Deutsch, English, MIME, Latin-1, NO SPAM!
<robbe <at> orcus.priv.at> <http://stud2.tuwien.ac.at/~e9426626/sig.html>