In the previous part of this series, we built our model. Now we want to check how well it performs. We will look at different metrics to judge how good or bad our model is.
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_true, y_pred))
[[ 4608  3233]
 [ 1670 23050]]
If you are wondering what these numbers are, this reference image from Tarek Atwan’s blog can help.
The diagonal from the top left to the bottom right shows what the model got right, while the other diagonal shows where it went wrong. Let us look at the terms TP, TN, FP and FN in the context of our dataset.
True Positive: We predicted them to earn less than 50k, and they actually earn less than 50k
True Negative: We predicted them to earn more than 50k, and they actually earn more than 50k
False Positive: We predicted them to earn less than 50k, but they were actually earning more than 50k
False Negative: We predicted them to earn more than 50k, but they were actually earning less than 50k
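The four counts can be unpacked from the matrix with ravel(). A minimal sketch using the numbers above, assuming "less than 50k" is encoded as the positive label 1 (sklearn then orders the cells TN, FP, FN, TP):

```python
import numpy as np

# Confusion matrix from above: rows are true labels, columns are predictions
cm = np.array([[4608, 3233],
               [1670, 23050]])

# With sorted binary labels [0, 1], ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 4608 3233 1670 23050
```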
Accuracy is defined as the ratio of correctly predicted instances to the total number of instances.
from sklearn.metrics import accuracy_score

print('Accuracy Score of Model: ', accuracy_score(y_true, y_pred) * 100)
Accuracy Score of Model: 84.94210865759652
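As a sanity check, the same number falls straight out of the confusion-matrix counts: correct predictions (the main diagonal) divided by all predictions.

```python
# Counts from the confusion matrix shown earlier
tn, fp, fn, tp = 4608, 3233, 1670, 23050

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct / total
print(accuracy * 100)  # ≈ 84.94, matching accuracy_score
```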
Although an accuracy of 84.94 seems good, accuracy alone should not decide whether our model is doing well. It is good to also compute precision and recall. The image below from Wikipedia illustrates precision and recall.
Precision = (TP) / (TP + FP)
from sklearn.metrics import precision_score

print('Precision Score of Model: ', precision_score(y_true, y_pred) * 100)
Precision Score of Model: 87.69927329452498
Recall = (TP) / (TP + FN)
from sklearn.metrics import recall_score

print('Recall Score of Model: ', recall_score(y_true, y_pred) * 100)
Recall Score of Model: 93.24433656957929
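Both formulas can be checked by hand against the confusion-matrix counts. A quick sketch using the numbers from above:

```python
# Counts from the confusion matrix shown earlier
tn, fp, fn, tp = 4608, 3233, 1670, 23050

precision = tp / (tp + fp)  # of all predicted positives, how many were right
recall = tp / (tp + fn)     # of all actual positives, how many we caught
print(precision * 100)  # ≈ 87.70, matching precision_score
print(recall * 100)     # ≈ 93.24, matching recall_score
```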
The F1 score is a metric that combines precision and recall by taking the harmonic mean of the two.
F1 = (2 * Precision * Recall) / (Precision + Recall)
from sklearn.metrics import f1_score

print('F1 Score of Model: ', f1_score(y_true, y_pred) * 100)
F1 Score of Model: 90.38683998980452
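Plugging the precision and recall from above into the harmonic-mean formula reproduces this score; equivalently, F1 = 2·TP / (2·TP + FP + FN) works directly on the counts.

```python
# Counts from the confusion matrix shown earlier
tn, fp, fn, tp = 4608, 3233, 1670, 23050

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(f1 * 100)  # ≈ 90.39, matching f1_score
```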
This computes the area under the Receiver Operating Characteristic curve (ROC AUC) from prediction scores. The larger the area under the curve, the better.
from sklearn.metrics import roc_auc_score

print('ROC AUC Score of Model: ', roc_auc_score(y_true, y_pred) * 100)
ROC AUC Score of Model: 76.00617542673581
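One caveat worth knowing: because y_pred here contains hard 0/1 labels rather than probability scores, the ROC "curve" has a single operating point, and the score reduces to the average of sensitivity and specificity; passing the scores from predict_proba would typically give a more informative AUC. A quick check with the counts from above:

```python
# Counts from the confusion matrix shown earlier
tn, fp, fn, tp = 4608, 3233, 1670, 23050

tpr = tp / (tp + fn)  # sensitivity (same as recall)
tnr = tn / (tn + fp)  # specificity
auc = (tpr + tnr) / 2  # AUC for hard 0/1 predictions
print(auc * 100)  # ≈ 76.01, matching roc_auc_score on hard labels
```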
The above metrics show that our model is doing fairly well. Congratulations on building your first model!