How to Plot ROC Curve in R

A Receiver Operating Characteristic (ROC) curve is a graphical representation used to assess the performance of a binary classification model, such as logistic regression, support vector machines, or decision trees. It illustrates the trade-off between a model’s ability to correctly classify true positive instances and its tendency to incorrectly classify false positive instances as you vary the model’s classification threshold.

The ROC curve is created by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) at various threshold settings. Here are some key terms related to ROC curves:

  1. True Positive Rate (TPR), Sensitivity, or Recall: This is the proportion of actual positive cases that the model correctly predicts as positive. It is calculated as:TPR = True Positives / (True Positives + False Negatives)In the context of a medical test, TPR is the percentage of people with the disease who test positive.
  2. False Positive Rate (FPR): This is the proportion of actual negative cases that the model incorrectly predicts as positive. It is calculated as:FPR = False Positives / (False Positives + True Negatives). FPR is the percentage of healthy individuals who test positive in a medical test.
  3. Threshold: A threshold is a value that the model uses to determine whether a given instance should be classified as positive or negative. Adjusting this threshold allows you to control the balance between TPR and FPR.

Step 1: Install and Load Required Packages

If you haven’t already installed the ROCR package, you can do so using the install.packages function:

install.packages("ROCR")

After installation, load the package:

library(ROCR)

Step 2: Prepare the Data

I will use a hypothetical dataset and a simple logistic regression model for this example. You can replace this with your own data and model.

# Load sample data
data(ROCR.simple)

# Extract true class labels and predicted probabilities
predictions <- ROCR.simple$predictions
labels <- ROCR.simple$labels

Here, predictions are the predicted probabilities (scores) for the positive class, and labels are the true binary labels (0 or 1).

Step 3: Create a Prediction Object

To create a prediction object that ROCR can work with, use the prediction function:

pred <- prediction(predictions, labels)

Step 4: Calculate ROC Curve Data

Use the performance function to calculate the ROC curve data:

roc <- performance(pred, "tpr", "fpr")

Step 5: Plot the ROC Curve

You can now create and customize the ROC curve plot using the plot function:

plot(roc, main="ROC Curve", colorize=TRUE, print.auc=TRUE)
abline(a=0, b=1, lty=2, col="gray")
  • main sets the title of the plot.
  • colorize adds color to the plot.
  • print.auc displays the AUC (Area Under the Curve) on the plot.
  • abline adds a dashed reference line representing a random classifier.
How to Plot ROC Curve in R

Step 6: Save the Plot (Optional)

If you want to save the plot to a file, you can use the pdf, png, or other plotting functions:

pdf("ROC_curve.pdf")
plot(roc, main="ROC Curve", colorize=TRUE, print.auc=TRUE)
abline(a=0, b=1, lty=2, col="gray")
dev.off()

This code saves the ROC curve to a PDF file.

Your ROC curve plot should now be displayed on the screen (or saved as a file if you opted to do so). As you change the classification threshold, the curve will show the trade-off between true positive rate (sensitivity) and false positive rate (1 – specificity).

By Benard Mbithi

A statistics graduate with a knack for crafting data-powered business solutions. I assist businesses in overcoming challenges and achieving their goals through strategic data analysis and problem-solving expertise.