Bernhard Pfahringer | 2 Jan 2011 00:45

Re: Some Confusion with Evaluation framework in Weka API

> Thanks, Bernhard, for the quick reply. I think I am overlooking some basic idea.
> Please explain what is wrong with my assumption: given the same confusion
> matrix (M1 and M2) in both cases, with the same data and algorithm but with
> different random seeds (8 and 9, as before), why am I getting two different
> values for AUC and mean absolute error?
>

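AUC is about ranking: it uses the predicted probabilities to sort your examples.
Accuracy (and the confusion matrix) depend on a specific threshold.
So if the probabilities order the examples differently in different runs
without moving any of them across the threshold, you can get the exact
same confusion matrix and accuracy, but different AUC values. The same
holds for mean absolute error, which is also computed from the
probabilities rather than from the thresholded predictions.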
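
A small illustration of the point above, as a plain Python sketch rather than
Weka code. The six labels and the two sets of probabilities (run_a, run_b) are
made-up values, the threshold of 0.5 is an assumption, and the mean absolute
error is the simplified two-class form |p - y|, which may differ in detail
from what Weka reports.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(scores, labels, threshold=0.5):
    """Return (TP, FN, FP, TN) when predicting positive for score >= threshold."""
    tp = fn = fp = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def mean_absolute_error(scores, labels):
    """Simplified two-class MAE: average of |probability - true label|."""
    return sum(abs(s - y) for s, y in zip(scores, labels)) / len(labels)


# Hypothetical predicted probabilities of the positive class from two runs.
labels = [1, 1, 1, 0, 0, 0]
run_a = [0.90, 0.80, 0.40, 0.60, 0.30, 0.10]
run_b = [0.90, 0.55, 0.20, 0.95, 0.30, 0.10]

for name, run in (("run_a", run_a), ("run_b", run_b)):
    print(name, confusion(run, labels),
          round(auc(run, labels), 3),
          round(mean_absolute_error(run, labels), 3))
```

Both runs put the same examples on the same side of 0.5, so the confusion
matrices are identical, but the ordering within each side differs, so AUC and
mean absolute error differ.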

> I was curious because with random seed 1 I get an AUC of 0.937 and with seed 3 I get
> 0.874. The confusion matrix is the same in both cases. Which value should I trust, and
> why?

I suppose you are using a rather "unstable" algorithm, and/or a small number
of examples, and/or have a high number of class values. What you are seeing
is that cross-validation estimates have some variance of their own. If the
variance is as high as it seems in your case, I'd repeat the cross-validation
at least ten times with a different seed each time and take the average. BTW,
this is the default for the Experimenter: 10x10-fold cross-validation, to get
more robust estimates.
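
The repeat-and-average procedure might look roughly like the following, again
as a self-contained Python sketch and not the Weka API. Everything here is an
assumption made for illustration: the synthetic one-dimensional data from
make_data, the toy nearest-class-mean scorer, and the helper names. The seed
only changes how the data is shuffled into folds, which is enough to change
the pooled AUC from run to run.

```python
import random
import statistics


def make_data(n=60, seed=0):
    """Synthetic two-class data: one feature, class means 0.0 and 1.0 (hypothetical)."""
    rng = random.Random(seed)
    return [(rng.gauss(float(i % 2), 1.0), i % 2) for i in range(n)]


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(data, folds, seed):
    """One k-fold run; predictions from all folds are pooled before computing AUC."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    scores, labels = [], []
    for k in range(folds):
        test = shuffled[k::folds]
        train = [d for i, d in enumerate(shuffled) if i % folds != k]
        mean_pos = statistics.mean(x for x, y in train if y == 1)
        mean_neg = statistics.mean(x for x, y in train if y == 0)
        for x, y in test:
            # Toy scorer: higher when x is closer to the positive class mean.
            scores.append(abs(x - mean_neg) - abs(x - mean_pos))
            labels.append(y)
    return auc(scores, labels)


data = make_data()
# Ten repetitions of 10-fold cross-validation, one seed per repetition.
aucs = [cross_validated_auc(data, folds=10, seed=s) for s in range(1, 11)]
mean_auc = statistics.mean(aucs)
print("per-seed AUC:", [round(a, 3) for a in aucs])
print("mean:", round(mean_auc, 3), "std dev:", round(statistics.stdev(aucs), 3))
```

The averaged value, together with its standard deviation, is what you would
report instead of the result from any single seed.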

hth, Bernhard

---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard                  +64 7 838 4041
