Abrazolica

ROC Curves

posted on: Monday, August 6th, 2012, 9:42

A ROC (Receiver Operating Characteristic) curve is a way to display the performance of a binary classifier. Recall that a binary classifier outputs a positive (yes) or negative (no) response given some input data. A common example is medical diagnosis: given a set of symptoms, measurements, and test results, does a person have a disease or not? Another example comes from financial markets: given that some security \(A\) (bond, stock, ETF, etc.) has gone up in price, will security \(B\) also go up in price? The applications are almost limitless.

The most useful classifiers are ones that give not just a positive or negative response but a probability \(p\) that can range from 0 to 1. Whether the probability indicates a positive or negative response is then a matter of interpretation. Interpreting \(p \ge 0.5\) as a positive response is the obvious thing to do but it may not always be optimal. In cases where a false positive can be costly it may be better to use a more conservative threshold such as \(p \ge 0.7\) to signal a positive response. A ROC curve can help you choose the best threshold.

To see how this works, note first of all that there are four possible results of a binary classification. A true positive (\(TP\)) is where the classifier correctly indicates a positive response. In a false positive (\(FP\)) the classifier indicates a positive response where the actual is negative. A true negative (\(TN\)) is where the classifier correctly indicates a negative response. In a false negative (\(FN\)) the classifier indicates a negative response where the actual is positive. The four results can be organized into what is called a confusion matrix as shown below.

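Confusion matrix for binary classifier

\[\begin{array}{l|cc} & \text{Actual positive} & \text{Actual negative} \\ \hline \text{Indicated positive} & TP & FP \\ \text{Indicated negative} & FN & TN \end{array}\]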
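To make the four outcomes concrete, here is a minimal Python sketch that tallies them from paired lists of actual and indicated responses. The names `Confusion` and `confusion_counts` and the sample data are made up for illustration.

```python
from typing import NamedTuple, Sequence


class Confusion(NamedTuple):
    """The four outcome counts of a binary classification."""
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Confusion:
    """Count TP, FP, TN and FN for paired actual/predicted responses."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # indicated positive, actual positive
        elif p and not a:
            fp += 1      # indicated positive, actual negative
        elif not p and not a:
            tn += 1      # indicated negative, actual negative
        else:
            fn += 1      # indicated negative, actual positive
    return Confusion(tp, fp, tn, fn)


# Hypothetical data: True means positive, False means negative.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

counts = confusion_counts(actual, predicted)
print(counts)  # Confusion(tp=3, fp=1, tn=3, fn=1)
```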

In a perfect classifier \(FP\) and \(FN\) will be zero. A classifier can also be perfect in a negative sense where the opposite of what it says is always correct. In this case \(TP\) and \(TN\) will be zero. The fraction of correct classifications is called the accuracy of the classifier. It is defined as

\[Accuracy = \frac{TP + TN}{TP + TN + FP + FN}\]

The accuracy is not the only way to measure the performance of a classifier. We can also define a true positive and a false positive rate as follows:

\[TPR = \frac{TP}{TP + FN}\]

\[FPR = \frac{FP}{FP + TN}\]

\(TPR\) is the probability that the classifier correctly indicates positive when the actual is positive. \(FPR\) is the probability that the classifier incorrectly indicates positive when the actual is negative. You can also define a true negative rate and a false negative rate. Each is the complement of one of the rates above.

\[TNR = \frac{TN}{TN + FP} = 1 - FPR\]

\[FNR = \frac{FN}{FN + TP} = 1 - TPR\]
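The accuracy, \(TPR\) and \(FPR\) formulas translate directly into code. The following Python sketch uses made-up counts. The function names are illustrative, and there is no guarding against a zero denominator (for example when there are no actual positives).

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def true_positive_rate(tp: int, fn: int) -> float:
    """Fraction of actual positives that are indicated positive."""
    return tp / (tp + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of actual negatives that are indicated positive."""
    return fp / (fp + tn)


# Hypothetical counts: 60 actual positives and 100 actual negatives.
tp, fp, tn, fn = 40, 10, 90, 20

acc = accuracy(tp, fp, tn, fn)
tpr = true_positive_rate(tp, fn)
fpr = false_positive_rate(fp, tn)

print(f"accuracy = {acc:.4f}")  # accuracy = 0.8125
print(f"TPR      = {tpr:.4f}")  # TPR      = 0.6667
print(f"FPR      = {fpr:.4f}")  # FPR      = 0.1000
```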

The nice thing about \(TPR\) and \(FPR\) is that they are not sensitive to the proportion of actual positives and negatives, since each rate is computed within a single actual class. They give a more stable indication of the classifier's performance, and in some applications they are more informative than the overall accuracy. For example, in medical testing, if a person has the disease then you want to maximize the probability of detecting it, i.e. you want to maximize \(TPR\). At the same time, if the person does not have the disease you want to minimize the probability of falsely saying that they do, i.e. you want to minimize \(FPR\).
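A small numerical illustration may help here. The Python sketch below assumes a hypothetical classifier with \(TPR = 0.9\) and \(FPR = 0.3\) and applies it to three populations of 1000 cases with different proportions of actual positives. The rates are held fixed by construction, so all the sketch shows is how much the accuracy moves when only the class mix changes.

```python
from typing import Tuple


def expected_counts(n_pos: int, n_neg: int, tpr: float, fpr: float) -> Tuple[float, float, float, float]:
    """Expected TP, FP, TN, FN for a classifier with the given rates."""
    tp = tpr * n_pos
    fn = n_pos - tp
    fp = fpr * n_neg
    tn = n_neg - fp
    return tp, fp, tn, fn


def accuracy(tp: float, fp: float, tn: float, fn: float) -> float:
    """Fraction of all classifications that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Assumed rates for a hypothetical classifier.
TPR, FPR = 0.9, 0.3

results = {}
for n_pos, n_neg in [(100, 900), (500, 500), (900, 100)]:
    tp, fp, tn, fn = expected_counts(n_pos, n_neg, TPR, FPR)
    results[(n_pos, n_neg)] = accuracy(tp, fp, tn, fn)
    print(f"{n_pos} positives, {n_neg} negatives: "
          f"accuracy = {results[(n_pos, n_neg)]:.2f}")

# Prints accuracies of 0.72, 0.80 and 0.88 for the three populations.
```

The same classifier looks better or worse by accuracy depending only on how many actual positives happen to be in the data.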

A ROC curve is just a plot of \(TPR\) versus \(FPR\), with \(FPR\) on the horizontal axis and \(TPR\) on the vertical axis. If the classifier only outputs a positive or negative response then \(TPR\) and \(FPR\) are fixed values and the plot consists of just a single point. If the classifier outputs the probability of a positive response then the values of \(TPR\) and \(FPR\) will depend on where we set the threshold of what is considered a positive or negative response. As the threshold is lowered from just above 1 down to 0, the values of \(TPR\) and \(FPR\) both start at 0 and end at 1. The figure below shows an example.

ROC curve for the ETF SPY being classified by the ETF QQQ. Labeled points are for predicted probabilities of 0.1, 0.2, 0.3,...,0.9, 1.0. The blue line (\(TPR=FPR\)) is the 'no better than chance' boundary. Everything above is better.

When the threshold is greater than 1 the classifier will have no positive responses so both \(TPR\) and \(FPR\) will be zero. As the threshold drops below 1, positive responses start to appear and if the classifier is any good they should mostly be correct. If the classifier is perfect then \(TPR\) will continue to climb until it reaches 1 with \(FPR\) staying at 0. As the threshold is lowered to zero the classifier outputs nothing but positive responses so that \(FN\) and \(TN\) are both zero and \(TPR\) and \(FPR\) are both equal to 1. The ROC curve always ends at the point (1,1).
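The threshold sweep described above can be sketched in a few lines of Python. The function name `roc_points`, the sample probabilities, and the choice of thresholds are all made up for illustration. A probability \(p\) counts as a positive response when \(p\) is at least the threshold.

```python
from typing import List, Sequence, Tuple


def roc_points(actual: Sequence[bool], prob: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Return one (FPR, TPR) point for each threshold, in the order given."""
    n_pos = sum(1 for a in actual if a)
    n_neg = len(actual) - n_pos
    points = []
    for t in thresholds:
        # A case is indicated positive when its probability is >= t.
        tp = sum(1 for a, p in zip(actual, prob) if a and p >= t)
        fp = sum(1 for a, p in zip(actual, prob) if not a and p >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical data: actual classes and predicted probabilities.
actual = [True, True, False, True, False, True, False, False]
prob = [0.95, 0.85, 0.70, 0.65, 0.45, 0.40, 0.25, 0.10]

# Sweep from above 1 (no positive responses) down to 0 (all positive).
thresholds = [1.1, 0.9, 0.6, 0.3, 0.0]
points = roc_points(actual, prob, thresholds)

for t, (fpr, tpr) in zip(thresholds, points):
    print(f"threshold {t:.1f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
# threshold 1.1: FPR = 0.00, TPR = 0.00
# threshold 0.9: FPR = 0.00, TPR = 0.25
# threshold 0.6: FPR = 0.25, TPR = 0.75
# threshold 0.3: FPR = 0.50, TPR = 1.00
# threshold 0.0: FPR = 1.00, TPR = 1.00
```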

If a classifier has little or no predictive power then the ROC curve will be near the \(TPR=FPR\) line. To see this, look at the expected number of each of the four possible outcomes when the classifier's output is independent of the actual class: if it indicates positive with the same probability \(q\) whether the actual is positive or negative, then \(TPR\) and \(FPR\) are both expected to equal \(q\). The closer the ROC curve gets to the point \((FPR, TPR) = (0,1)\) and the larger the area under the ROC curve, the better the classifier overall.
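One simple way to estimate the area under a ROC curve from a handful of points is the trapezoid rule. The Python sketch below assumes the points are given as (FPR, TPR) pairs sorted by increasing FPR. The function name and the sample points are made up, and with only a few thresholds the result is an approximation.

```python
from typing import List, Tuple


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoid-rule area under (FPR, TPR) points sorted by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Width of the strip times the average of its two heights.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points for some classifier.
sample = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
# The 'no better than chance' line TPR = FPR.
chance = [(0.0, 0.0), (1.0, 1.0)]
# A perfect classifier: TPR reaches 1 while FPR is still 0.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

auc_sample = area_under_curve(sample)
auc_chance = area_under_curve(chance)
auc_perfect = area_under_curve(perfect)

print(auc_sample)   # 0.84375
print(auc_chance)   # 0.5
print(auc_perfect)  # 1.0
```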

So the moral of the story is that there is more to evaluating a classifier than just looking at the accuracy. You may sometimes need to consider the trade-off between the benefits and costs of true positives and false positives, or for that matter true negatives and false negatives. A ROC curve can help you do that.
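One way to make the trade-off concrete is to put a cost on each kind of error and pick the threshold with the lowest total cost. The Python sketch below does this for a few candidate thresholds. The cost values, function names, and data are all assumptions for illustration, and this is only one of several reasonable ways to choose a threshold. Since \(FP\) and \(FN\) are determined by \(FPR\), \(TPR\) and the class counts, the cost at each threshold depends only on the corresponding point on the ROC curve.

```python
from typing import Sequence


def total_cost(actual: Sequence[bool], prob: Sequence[float],
               threshold: float, cost_fp: float, cost_fn: float) -> float:
    """Total cost of the false positives and false negatives at a threshold."""
    fp = sum(1 for a, p in zip(actual, prob) if not a and p >= threshold)
    fn = sum(1 for a, p in zip(actual, prob) if a and p < threshold)
    return cost_fp * fp + cost_fn * fn


def best_threshold(actual: Sequence[bool], prob: Sequence[float],
                   candidates: Sequence[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Candidate threshold with the lowest total cost (first one on ties)."""
    return min(candidates,
               key=lambda t: total_cost(actual, prob, t, cost_fp, cost_fn))


# Hypothetical data: actual classes and predicted probabilities.
actual = [True, True, False, True, False, True, False, False]
prob = [0.95, 0.85, 0.70, 0.65, 0.45, 0.40, 0.25, 0.10]
candidates = [0.3, 0.5, 0.7, 0.9]

# Costly false positives push the threshold up.
conservative = best_threshold(actual, prob, candidates, cost_fp=5.0, cost_fn=1.0)
# Costly false negatives push the threshold down.
aggressive = best_threshold(actual, prob, candidates, cost_fp=1.0, cost_fn=5.0)

print(conservative)  # 0.9
print(aggressive)    # 0.3
```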

Our upcoming report on Bayesian analysis for stocks and ETFs includes ROC curves, and shows how to make them using gnuplot. If you want to know when it is released, sign up for our newsletter. For software to implement classifiers see the Exstrom Labs web page on data mining and machine learning.