Confusion matrix in cybercrime cases

The confusion matrix plays a major role in classification because it helps us evaluate how good our classification model is. Let us start by understanding what a confusion matrix is.

A confusion matrix is an N x N matrix, where N is the number of target classes. Each cell counts how often an actual class was predicted as a given class. The confusion matrix can be visualized using the scikitplot module.
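To make the N x N shape concrete, here is a minimal sketch in plain Python that counts (actual, predicted) pairs into a matrix. The function name `build_confusion_matrix`, the sample labels, and the layout (rows for actual classes, columns for predicted classes) are assumptions chosen for illustration.

```python
def build_confusion_matrix(actual, predicted, labels):
    """Return an N x N list of lists: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical labels for three kinds of cybercrime reports (made-up data).
labels = ["phishing", "identity_theft", "malware"]
actual = ["phishing", "malware", "identity_theft", "phishing", "malware"]
predicted = ["phishing", "malware", "phishing", "phishing", "identity_theft"]

matrix = build_confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>15}: {row}")
```

With three target classes the result is a 3 x 3 matrix; the diagonal holds the correct predictions and every other cell is a particular kind of mistake.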

Confusion matrix for binary classification.

In binary classification, we have 4 values: TP, FP, FN, and TN. What do they tell us?

True Positive (TP):

The predicted value matches the actual value and both are positive values.

True Negative (TN):

The predicted value matches the actual value and both are negative values.

False Positive (FP): (Type-1 error)

The predicted value doesn’t match the actual value.

The predicted value is positive while the actual value is negative.

False Negative (FN): (Type-2 error)

The predicted value doesn’t match the actual value.

The predicted value is negative and the actual value is positive.
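The four definitions above can be sketched as a small function that names the cell for a single prediction. The function name `outcome` and the sample data are hypothetical; `True` stands for the positive class.

```python
from collections import Counter


def outcome(actual, predicted):
    """Name the confusion-matrix cell for one binary prediction (True = positive)."""
    if predicted and actual:
        return "TP"
    if not predicted and not actual:
        return "TN"
    if predicted and not actual:
        return "FP"  # Type-1 error: predicted positive, actually negative
    return "FN"      # Type-2 error: predicted negative, actually positive


# Made-up labels, only to exercise all four cases.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]

counts = Counter(outcome(a, p) for a, p in zip(actual, predicted))
print(dict(counts))
```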

Consider this confusion matrix for identity theft.

Out of 165 cases:

True Positive (TP): 100 positive data points were correctly classified by the model.

True Negative (TN): 50 negative data points were correctly classified by the model.

False Positive (FP): 10 negative data points were incorrectly predicted as positive by the model.

False Negative (FN): 5 positive data points were incorrectly predicted as negative by the model.
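As a quick sanity check, the four counts can be laid out as a 2 x 2 matrix and summed. The layout used here (rows for actual classes, columns for predicted classes, positive class first) is an assumption; other tools may order the cells differently.

```python
# Counts taken from the identity theft example above.
tp, tn, fp, fn = 100, 50, 10, 5

# Assumed layout: rows = actual class, columns = predicted class, positive first.
matrix = [
    [tp, fn],  # actual positive: predicted positive, predicted negative
    [fp, tn],  # actual negative: predicted positive, predicted negative
]

total = sum(sum(row) for row in matrix)
actual_positives = sum(matrix[0])
actual_negatives = sum(matrix[1])
print(total, actual_positives, actual_negatives)  # prints: 165 105 60
```

So the 165 cases split into 105 actual positives and 60 actual negatives.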

The problems are the False Positives (FP) and the False Negatives (FN). FN is the more dangerous of the two, because the model tells us nothing is wrong while a real case goes undetected. It creates an environment where we believe there are no issues.

Accuracy:

Accuracy tells us how often our classifier is right.

It is the ratio of the number of correct predictions (TP + TN) to the total number of predictions.

Let's find accuracy for the above model.

Accuracy = (100+50)/(100+50+10+5) = 150/165 = 0.9090

Accuracy = 90.9%
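The same calculation as a minimal Python sketch. The function name `accuracy` and the choice to raise an error on empty input are illustrative assumptions.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined when there are no predictions")
    return (tp + tn) / total


acc = accuracy(tp=100, tn=50, fp=10, fn=5)
print(f"Accuracy = {acc:.1%}")  # prints: Accuracy = 90.9%
```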

Precision:

Precision tells us how many of the cases the model predicted as positive are actually positive.

It is the ratio of True Positives to the sum of True Positives and False Positives.

The precision value lies between 0 and 1. For a perfect classifier, the precision value is 1. As the number of False Positives increases, the precision value drops.

Precision = 100/(100+10) = 100/110 = 0.9090

Precision = 0.909
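To see how False Positives pull precision down, here is a small sketch that keeps TP fixed at 100 and varies FP. The function name `precision` and the extra FP values (0, 50, 100) are made up for illustration; only FP = 10 comes from the example above.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    if tp + fp == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tp / (tp + fp)


# TP stays at 100 while the number of false alarms grows.
for fp in (0, 10, 50, 100):
    print(f"FP = {fp:>3} -> precision = {precision(100, fp):.3f}")
```

With no False Positives the precision is 1, and it falls to 0.5 once the false alarms equal the true hits.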

Recall:

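Recall tells us how many of the actual positive cases the model manages to find. The term comes from information retrieval, where it is the number of relevant documents retrieved by a search divided by the total number of existing relevant documents.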

It is the ratio of True Positive to the sum of True Positive and False Negative.

Recall = 100/(100+5) = 100/105 = 0.9523

Recall = 0.9523
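Recall connects directly to the False Negatives: whatever recall does not capture is the share of real positive cases the model missed. A minimal sketch, with `recall` and `miss_rate` as illustrative names:

```python
def recall(tp, fn):
    """Fraction of actual positives that the model identified."""
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return tp / (tp + fn)


tp, fn = 100, 5
r = recall(tp, fn)
miss_rate = fn / (tp + fn)  # false negative rate, equal to 1 - recall
print(f"Recall = {r:.3f}, missed positives = {miss_rate:.3f}")
```

In the example, roughly 1 in 21 real positive cases slips through unnoticed.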

F1 score:

The F1 score is the harmonic mean of Precision and Recall.

F1 score is considered perfect when its value is 1 and considered to be a failure when it is 0.

F1 Score = 2 * (0.909 * 0.9523)/(0.909 + 0.9523) = 1.7313/1.8613 = 0.9301.

F1 score = 0.9301.
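A minimal sketch of the harmonic mean, computed from unrounded precision and recall. The function name `f1_score` is illustrative, and returning 0.0 when both inputs are 0 is an assumed convention. The second formula, 2TP/(2TP + FP + FN), is the same quantity written directly in terms of the counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention: no true positives at all
    return 2 * precision * recall / (precision + recall)


tp, fp, fn = 100, 10, 5
p = tp / (tp + fp)
r = tp / (tp + fn)
f1 = f1_score(p, r)

# Equivalent form expressed with the raw counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"F1 = {f1:.3f}")  # prints: F1 = 0.930
```

Because it is a harmonic mean, the F1 score stays close to the smaller of the two inputs, so a model cannot score well by being strong on precision alone or recall alone.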

Thanks for reading!

Keep learning! Keep sharing!

ARTH-School of technology, BCA graduate