Submitted by likeamanyfacedgod t3_y9n120 in MachineLearning
PassionatePossum t1_it6ms0v wrote
Reply to comment by likeamanyfacedgod in [D] Accurate blogs on machine learning? by likeamanyfacedgod
>the TPR and FPR are both fractions, so it won't matter if one is a larger class than the other.
In most cases that is a desirable property. You don't want to have excellent results just because one class makes up 99% of your dataset and the classifier just predicts the most common class without learning anything. Precision and Recall are also fractions.
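To make the 99% case concrete, here is a minimal sketch with made-up numbers (10 positives, 990 negatives, and a hypothetical classifier that always predicts the majority class). Accuracy looks excellent, while TPR shows that nothing was learned:

```python
# Hypothetical toy data: 1% positives, 99% negatives.
labels = [1] * 10 + [0] * 990
# A "classifier" that always predicts the most common class (0).
preds = [0] * len(labels)

tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)

accuracy = (tp + tn) / len(labels)  # dominated by the majority class
tpr = tp / (tp + fn)                # fraction of positives that were found
fpr = fp / (fp + tn)                # fraction of negatives wrongly flagged

print(f"accuracy={accuracy:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Because TPR and FPR are each normalized within one class, the 99:1 ratio cannot inflate them the way it inflates accuracy.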
The difference between ROC and Precision/Recall is that ROC needs the concept of a "negative class". That can be problematic for multi-class problems. Even if your data is perfectly balanced across all of your classes, the negative class (i.e. all classes that aren't the class you are examining) is bound to be overrepresented.
Since a precision/recall plot never uses the true negatives (it only needs the positive examples and whatever was predicted positive), you don't have that problem.
So, I don't have a problem with the statement that ROC is appropriate for a balanced dataset (provided that we have a binary classification problem or the number of different classes is at least low).
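A rough sketch of the one-vs-rest effect, using invented confusion counts: the same per-class behaviour (TPR 0.9, FPR 0.1) is evaluated once on a balanced binary problem and once as one class out of ten balanced classes, where the "negative class" is nine times larger. The helper name `summarize` is made up for this illustration:

```python
def summarize(tp, fn, fp, tn):
    """Return (TPR, FPR, precision) from raw confusion counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision

# Balanced binary problem: 100 positives, 100 negatives.
binary = summarize(tp=90, fn=10, fp=10, tn=90)

# One-vs-rest on 10 balanced classes: 100 positives, 900 "negatives".
one_vs_rest = summarize(tp=90, fn=10, fp=90, tn=810)

print("binary      TPR=%.2f FPR=%.2f precision=%.2f" % binary)
print("one-vs-rest TPR=%.2f FPR=%.2f precision=%.2f" % one_vs_rest)
```

The ROC coordinates are identical in both settings, while precision drops from 0.9 to 0.5 once the pooled negative class is nine times larger. The true negatives (`tn`) appear only in the FPR denominator, which is why the precision/recall view does not depend on them.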
madrury83 t1_it7hw31 wrote
I think the more rigorous way to get at the OP's point is to observe that the AUC is the probability that a randomly selected positive example is scored higher (by your fixed model) than a randomly chosen negative example. Being a probability defined over the two populations, it is independent (at a population level) of the number of samples you have from your positive and negative populations (of course, smaller samples get you more sampling variance). I believe this is the OP's point with "they are fractions".
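Here is a small sketch of that interpretation, under assumptions that are purely illustrative: scores for positives are drawn from N(1, 1) and scores for negatives from N(0, 1), standing in for the output of some fixed model. The AUC is estimated directly as the fraction of (positive, negative) pairs where the positive scores higher, counting ties as one half:

```python
import random

def pairwise_auc(pos_scores, neg_scores):
    """Estimate P(score of random positive > score of random negative)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_scores(n_pos, n_neg):
    # Assumed score distributions, not taken from any real model.
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return pos, neg

auc_balanced = pairwise_auc(*sample_scores(1000, 1000))   # 1:1 ratio
auc_imbalanced = pairwise_auc(*sample_scores(200, 4000))  # 1:20 ratio

print(f"balanced:   {auc_balanced:.3f}")
print(f"imbalanced: {auc_imbalanced:.3f}")
```

Both estimates target the same population quantity (about 0.76 for these two Gaussians), so they should agree up to sampling noise; the imbalanced estimate is simply noisier because it has fewer positives.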
In any case, can we at least all agree that blogs/articles throwing around this kind of advice without justification are less than helpful?