1 Oct 2010 20:41
Re: unknown test data twenty-newsgroups example
Ted Dunning <ted.dunning <at> gmail.com>
2010-10-01 18:41:53 GMT
Yes. Instance = training example. Your method of duplicating lines is just
what Robin meant.

On Fri, Oct 1, 2010 at 3:55 AM, Robin Anil <robin.anil <at> gmail.com> wrote:

> > Let me list what I understood. Pl confirm if I got it correct?
> >
> > Add duplicate extra lines many times in an extra file (conforming to the
> > format required by the Bayes Classifier) in the format
> > <class-name1><tab><word1> <word2>
> > If I want to increase the weight of word1 and word2, so that text with
> > those words have higher chance of getting classified as <class-name1>
> >
> > *
> > *
> >
> No. Duplicating lines increases DF and therefore decreases (IDF == inverse
> document frequency) So weight goes down. To increase weight of the word
> repeat the word in the same line
>
> Regards
> Robin
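
To make the DF/IDF point in the quoted reply concrete, here is a minimal sketch using plain textbook TF-IDF (tf * log(N / df)). This is an assumption for illustration only: it is not Mahout's Bayes weighting code, and the toy corpus, the function names and the token lists are all made up. It compares adding duplicate training lines with repeating the word inside one line.

```python
import math
from collections import Counter


def idf(term, docs):
    """Textbook inverse document frequency: log(N / df)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)


def tfidf(term, doc, docs):
    """Term frequency in one document times the corpus-wide IDF."""
    tf = Counter(doc)[term]
    return tf * idf(term, docs)


# Hypothetical toy corpus: each entry is the token list of one training line.
base = [
    ["word1", "word2"],
    ["other", "text"],
    ["more", "text"],
    ["unrelated", "tokens"],
]

# Option A: append three duplicate copies of the first line.
duplicated = base + [["word1", "word2"]] * 3

# Option B: repeat word1 inside the same line instead.
repeated = [["word1", "word1", "word1", "word2"]] + base[1:]

idf_base = idf("word1", base)        # df = 1 of N = 4
idf_dup = idf("word1", duplicated)   # df = 4 of N = 7, so IDF drops

w_base = tfidf("word1", base[0], base)
w_dup = tfidf("word1", duplicated[0], duplicated)
w_rep = tfidf("word1", repeated[0], repeated)

print(f"idf  base={idf_base:.3f}  duplicated lines={idf_dup:.3f}")
print(f"tfidf base={w_base:.3f}  duplicated lines={w_dup:.3f}  repeated word={w_rep:.3f}")
```

Under this simplified weighting, the duplicated-lines corpus gives word1 a lower per-line weight than the base corpus, while repeating the word in one line raises its term frequency and leaves DF unchanged.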