Enhancing a model's performance can be challenging. Many of you will have found yourselves stuck in this situation more than once: you try every strategy and algorithm you have learnt, yet the accuracy of your model refuses to improve. You feel helpless and stuck, and this is the point at which many data scientists give up.
But this is where the real story begins, and it is what separates an average data scientist from a master data scientist. While researching the right ML solution for your needs, chances are you have come across many complex technical terms such as recall, sensitivity, hit rate, true negative rate, false discovery rate, F1 score and many more.
What does all this mean to you as a business professional? Does it resonate with your decision instincts and with the risk-to-reward ratio of the consequences of your decisions?
From what we have gathered, experts are looking for ML solutions that deliver output suitable for supporting their decisions, but they need to be convinced of the model's accuracy. The challenge is to define what level of accuracy a model needs, based on the context of the decision.
Technically speaking, you are plunged headlong into a marsh of true negatives and false positives, along with plenty of other terms for mathematical combinations of these values.
It is not actually that complicated: once you recognize that the value of an ML model goes beyond these technical terms, selecting the best configuration becomes simple.
In the rest of this article, we will try to bridge the age-old gap between the business perspective and the technical user's perspective on any enabling technology. Enterprise customers typically make selections based entirely on the risks and rewards of the outcomes, that is, the consequences of the actions taken.
A typical ML practitioner evaluates a model in terms of precision and recall, that is, how well the model's decisions hold up against the true and false outcomes observed in the data.
A business user, by contrast, looks at decisions in terms of the outcome of the action taken on them, broadly categorized into the risks and rewards associated with each decision. Suppose a business needs to detect fraud cases in a particular scenario. From past experience, you have a data set of 1000 records (historical factual data) containing 60 fraud cases and 940 records that are not fraud. If the ML model predicts that all 1000 records are not fraud, its accuracy is 940 / 1000 = 94%, made up entirely of true negatives.
Judged by this number alone, one could conclude that the model is excellent, but in reality it fails to flag any of the 60 fraud cases, which is exactly what the model was supposed to do. The model is therefore useless: it did not identify a single fraud case of interest to the business.
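The arithmetic behind this trap fits in a few lines. The sketch below assumes the 1000-record data set described above (60 fraud, 940 not fraud) and a naive model that labels every record "not fraud"; the helper name `naive_model_counts` is hypothetical. Sensitivity here means the share of actual fraud cases the model catches.

```python
# Hypothetical naive "model" that predicts "not fraud" for every record,
# applied to the example data set of 60 fraud and 940 non-fraud records.
ACTUAL_FRAUD = 60
ACTUAL_NOT_FRAUD = 940

def naive_model_counts(actual_fraud: int, actual_not_fraud: int) -> dict:
    """Confusion-matrix counts when every record is predicted 'not fraud'."""
    return {
        "TP": 0,                  # no record is ever flagged as fraud
        "FP": 0,
        "TN": actual_not_fraud,   # every genuine record is correctly passed
        "FN": actual_fraud,       # every fraud case is missed
    }

counts = naive_model_counts(ACTUAL_FRAUD, ACTUAL_NOT_FRAUD)
total = sum(counts.values())
accuracy = (counts["TP"] + counts["TN"]) / total
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])

print(f"accuracy    = {accuracy:.2%}")     # accuracy    = 94.00%
print(f"sensitivity = {sensitivity:.2%}")  # sensitivity = 0.00%
```

A 94% accurate model that catches 0% of the fraud is exactly the gap between the technical score and the business goal.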
Measuring plain accuracy with mathematical formulas, without the context of the business scenario, is not sufficient. The model should support decisions and goals with the right balance between the risks and rewards of their consequences. This leads to alternative ways of measuring model performance, such as sensitivity and specificity, chosen for the business scenario at hand. Sensitivity gives the percentage of actual fraud cases that the model correctly identifies, while specificity gives the percentage of actual non-fraud cases that it correctly identifies as not fraud.
Mathematically, Sensitivity = TP / (TP + FN) and Specificity = TN / (TN + FP), where:
True positive (TP): actually fraud, and the model predicted fraud.
False positive (FP): actually not fraud, but the model predicted fraud.
True negative (TN): actually not fraud, and the model predicted not fraud.
False negative (FN): actually fraud, but the model predicted not fraud.
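These four definitions translate directly into code. The following is a minimal sketch, assuming actual and predicted labels are given as paired booleans where `True` means "fraud"; the function names and the eight-record example are hypothetical illustrations.

```python
from typing import Sequence

def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> dict:
    """Count TP, FP, TN and FN, treating True as 'fraud'."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for is_fraud, flagged in zip(actual, predicted):
        if is_fraud and flagged:
            counts["TP"] += 1   # fraud, predicted fraud
        elif not is_fraud and flagged:
            counts["FP"] += 1   # not fraud, predicted fraud
        elif not is_fraud and not flagged:
            counts["TN"] += 1   # not fraud, predicted not fraud
        else:
            counts["FN"] += 1   # fraud, predicted not fraud
    return counts

def sensitivity(c: dict) -> float:
    """TP / (TP + FN): share of actual fraud cases the model catches."""
    return c["TP"] / (c["TP"] + c["FN"])

def specificity(c: dict) -> float:
    """TN / (TN + FP): share of actual non-fraud cases the model passes."""
    return c["TN"] / (c["TN"] + c["FP"])

# Tiny made-up example: 3 fraud cases and 5 genuine records.
actual    = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]
c = confusion_counts(actual, predicted)
print(c)               # {'TP': 2, 'FP': 1, 'TN': 4, 'FN': 1}
print(sensitivity(c))  # 0.6666666666666666
print(specificity(c))  # 0.8
```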
Striking the right balance between sensitivity and specificity is what makes a model accurate in a way that matters to the business.
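The link between the three measures can be made explicit with a short derivation. Writing P = TP + FN for the number of actual fraud cases, N = TN + FP for the number of non-fraud cases and n = P + N for the total, accuracy turns out to be a prevalence-weighted average of sensitivity and specificity:

```latex
\text{Accuracy} = \frac{TP + TN}{n}
= \frac{P}{n}\cdot\frac{TP}{TP + FN} + \frac{N}{n}\cdot\frac{TN}{TN + FP}
= \frac{P}{n}\,\text{Sensitivity} + \frac{N}{n}\,\text{Specificity}
```

With 60 fraud cases in 1000 records, P/n = 0.06 and N/n = 0.94, so Accuracy = 0.06 × Sensitivity + 0.94 × Specificity. The all-"not fraud" model has sensitivity 0 and specificity 1, which gives exactly the 94% accuracy computed earlier: on imbalanced data, accuracy is dominated by specificity.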
To elaborate further, let us take 8 different test cases and calculate the sensitivity, specificity and accuracy for the 1000 records, of which 60 are fraud and 940 are not fraud.
Table 1: Example – Predicted outcome value and derived measures
| Outcome (predicted fraud) | Outcome (predicted not fraud) | TP | FP | TN | FN | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|---|
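Once the class totals are fixed, every column of Table 1 follows from just two numbers, the true and false positive counts. The sketch below assumes the 60 fraud / 940 non-fraud split from the example; `table1_row` is a hypothetical helper, and the inputs in the usage lines are made-up illustrations, not the article's test cases.

```python
ACTUAL_FRAUD = 60
ACTUAL_NOT_FRAUD = 940

def table1_row(tp: int, fp: int,
               actual_fraud: int = ACTUAL_FRAUD,
               actual_not_fraud: int = ACTUAL_NOT_FRAUD) -> dict:
    """Derive every Table 1 column from the true and false positive counts."""
    if not (0 <= tp <= actual_fraud and 0 <= fp <= actual_not_fraud):
        raise ValueError("counts exceed the class totals")
    fn = actual_fraud - tp          # fraud cases the model missed
    tn = actual_not_fraud - fp      # genuine records correctly passed
    total = actual_fraud + actual_not_fraud
    return {
        "predicted_fraud": tp + fp,
        "predicted_not_fraud": tn + fn,
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / actual_fraud,
        "specificity": tn / actual_not_fraud,
    }

# Illustrative inputs only: a model that flags 100 records, 50 of them real fraud.
row = table1_row(tp=50, fp=50)
for column, value in row.items():
    print(column, round(value, 3))
```

For these made-up inputs the accuracy is 0.94, the same as the all-"not fraud" model, yet sensitivity rises from 0 to about 0.833 while specificity is about 0.947.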
Now let's look at the same results in terms of the potential risks and rewards of the decisions. If you fail to identify fraud cases accurately, you are exposed to high risk. If you incorrectly flag legitimate cases as fraud, false positives increase, which directly hits your bottom line and so reduces the rewards.
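One way to put numbers on this trade-off is to price each kind of error. The following is an illustration rather than a method from this article: the per-error costs (`COST_MISSED_FRAUD`, `COST_FALSE_ALARM`) and the two candidate models' counts are hypothetical assumptions.

```python
# Hypothetical per-error costs: a missed fraud (FN) is the "risk",
# a false alarm (FP) erodes the "reward" (e.g. review effort, annoyed customers).
COST_MISSED_FRAUD = 500.0
COST_FALSE_ALARM = 20.0

def expected_cost(fn: int, fp: int,
                  cost_fn: float = COST_MISSED_FRAUD,
                  cost_fp: float = COST_FALSE_ALARM) -> float:
    """Total cost of a model's errors under the assumed per-error costs."""
    return fn * cost_fn + fp * cost_fp

# Two made-up models on the same 60 fraud / 940 non-fraud data set;
# both happen to be 94% accurate.
naive = expected_cost(fn=60, fp=0)      # flags nothing
cautious = expected_cost(fn=10, fp=50)  # flags 100 records, catches 50 frauds
print(naive, cautious)  # 30000.0 6000.0
```

Under these assumed costs the second model is five times cheaper, even though plain accuracy cannot tell the two apart. Different costs would shift the balance, which is the point: the business weighting decides which error matters more.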
Of the many terms available, the most useful are those that help break results down into true positives and false positives, as in Fig. 1 above.
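Most of the terms listed at the start of this article are simple ratios of the same four counts. The sketch below uses the standard textbook definitions; the function name `metric_glossary` is hypothetical, and the counts in the usage line are made up.

```python
def metric_glossary(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Common evaluation terms expressed through TP, FP, TN and FN."""
    recall = tp / (tp + fn)       # also called sensitivity, hit rate, true positive rate
    tnr = tn / (tn + fp)          # true negative rate, also called specificity
    precision = tp / (tp + fp)    # share of flagged records that really are fraud
    fdr = fp / (tp + fp)          # false discovery rate = 1 - precision
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return {
        "recall / sensitivity / hit rate": recall,
        "specificity / true negative rate": tnr,
        "precision": precision,
        "false discovery rate": fdr,
        "F1 score": f1,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Made-up counts: 100 records flagged, 50 of them real fraud, out of 60 fraud / 940 genuine.
for name, value in metric_glossary(tp=50, fp=50, tn=890, fn=10).items():
    print(f"{name}: {value:.3f}")
```

Precision, false discovery rate and F1 are undefined for a model that never predicts fraud (TP + FP = 0), such as the all-"not fraud" model discussed earlier; this sketch would raise `ZeroDivisionError` in that case.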
I have prepared a small auxiliary table of a few business decision scenarios in the payments space, showing the evaluation term each one calls for and what that means in terms of risks and rewards.
| # | Business case description | Risk (1-5, 5 = highest) | Reward (1-5, 5 = highest) | Evaluation term |
|---|---|---|---|---|
| 1 | Anti-money laundering: identifying and investigating potentially suspicious money-transfer transactions as required by the BSA | Violation of the BSA, subject to civil money penalties and/or criminal sanctions; impact on other regulations such as terrorism funding; incorrect identification can cost customer business. High risk (4) | Regulatory compliance accuracy, brand loyalty and trust. Low reward (2) | Very high risk to the business vs. a moderate or low reward leads to using sensitivity as the model measure. |
| 2 | Deciding whether to respond to and contest chargeback claims raised by the bank (merchant) | Loss of revenue if the chargeback is accepted; account termination; a growing chargeback record. Low to moderate risk (2) | Possibility of settling transactions with customers; a win regains the profit. Moderate reward (3) | Low risk and moderate reward lead to accuracy as the model measure. |
| 3 | Deciding whether to respond to and contest chargeback claims raised by the bank(s) (acquirer) | Loss of merchant(s) if they are not protected from illegitimate chargebacks; payback to the card brand if a merchant defaults. Moderate to high risk (3) | Merchant satisfaction; avoided chargeback investigation fees. Moderate to high reward (3) | Moderate risk and moderate reward lead to accuracy, along with sensitivity, as the model measure. |
| 4 | Underwriting for merchant on-boarding | Illegitimate merchants create financial liabilities. Moderate risk (2.5) | Quick and accurate underwriting brings more profit and business. Moderate to high reward (3) | Specificity can be the better measure for this scenario, as rewards are moderate to high and risk is moderate. |
Table 2: Business scenarios with risks, rewards and the required evaluation term
For any business decision scenario, keeping a list of risk/reward ratios at hand can help you choose the right ML model and strike the balance between accuracy, sensitivity and specificity shown in Table 1.
In our next article, we will cover how to choose an ML algorithm based on expected decision support and various ways to optimize it.