If you want to turn your predictive models into a goldmine, you need to know how to interpret the model quality assessment provided by DSS. It can get complicated because the error measure to take into account may depend on the business problem you want to solve. Indeed, to interpret your analysis and draw the most profit from it, you have to understand the basic concepts behind error assessment. And even more importantly, you have to think about the actions you will take in response to your analysis.
To tackle this difficult problem, we are writing a series of three posts to help you use DSS models to take action. The first one will focus on how to interpret regression model errors, the second on classification model errors, and the third on assessing variable importance in models.
In this post we will go through the metrics DSS gives you when you use a regression model, and how to interpret them in terms of business applications.
As an illustration we will use the Boston housing dataset. It is a classic machine learning textbook dataset, and it is also a business application. The target we want to predict is the house price as a function of features such as the number of teachers per child or the distance to employment hubs. Finding the right price is essential to sell in a reasonable time and at the best price.
Upload the Boston Housing dataset to DSS, open an analysis on the dataset and create a first prediction model with the price as the target feature. DSS will automatically train two models on the dataset and display them in a list.
Actual vs predicted values scatterplot
When you ask DSS to train a model, it first splits your dataset into a train set and a test set. Click on a model in the list and select the training information tab.
DSS optimizes the model parameters on the train set and keeps the test set just for assessing the model (check our posts on the most frequent errors in machine learning to understand why this is absolutely necessary). Then go to the scatter plot tab. For each point in the test set, the graph displays the actual value of the target on the x-axis and the value predicted by the model on the y-axis. If the model were perfect, all points would be on the diagonal, meaning that predicted values are exactly equal to actual values. The error on a data point is the vertical distance between the point and the diagonal. Points below the diagonal are underestimated by the model and points above it are overestimated.
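To make the reading of the scatter plot concrete, here is a minimal sketch in plain Python. The prices are made-up values for illustration (they are not taken from the Boston dataset or from DSS), and `signed_errors` is a hypothetical helper name. It computes the signed vertical distance to the diagonal for each point and counts how many points fall above, below, or on it.

```python
# Hypothetical actual and predicted prices (in thousands of dollars)
# for a tiny test set. These numbers are invented for illustration.
actual = [24.0, 21.6, 34.7, 33.4, 36.2]
predicted = [25.1, 20.0, 34.7, 30.9, 38.0]


def signed_errors(actual, predicted):
    """Return predicted minus actual for each point.

    This is the signed vertical distance between a point of the
    scatter plot and the diagonal.
    """
    return [p - a for a, p in zip(actual, predicted)]


errors = signed_errors(actual, predicted)

overestimated = sum(1 for e in errors if e > 0)   # points above the diagonal
underestimated = sum(1 for e in errors if e < 0)  # points below the diagonal
exact = sum(1 for e in errors if e == 0)          # points on the diagonal

print(f"over: {overestimated}, under: {underestimated}, exact: {exact}")
```

Keeping the sign of the error, rather than only its size, is what lets you tell overestimation from underestimation later on.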
Logically, you should try to minimize the distance from the points to the diagonal (hence the error). However, for some applications, overestimating is far more problematic than underestimating. An example would be predictive maintenance. Imagine you own a trucking company and you want to know when trucks are going to fail, so you can repair them while they are at the garage and prevent failures from happening on the road.
In this case, you should underestimate the time left before a failure happens, to be sure to act on it properly. Overestimating means you won't repair trucks in time, so the model would be useless. This illustrates well that the implications of errors might not be symmetric.
On the model list page, there is a field with an indicator that is a global measure of how good your model is. DSS provides you with all classical statistical scores.
Explained Variance Score, R2 score and Pearson correlation coefficient should be compared to 1 (the closer to 1 the better the model is). The other ones (Mean Absolute Error, Mean Absolute Percentage Error, Mean Squared Error, Root Mean Squared Error, Root Mean Squared Logarithmic Error) are error measures and should be as close as possible to zero.
If you click on a model and open the detailed metrics tab you will see a list of scores. Each score is briefly defined. You can refer to the Wikipedia entries for the theoretical definitions of these scores.
Here we will go through the scores, give an interpretation of each, and, when possible, spell out what it implies in business terms.
Pearson coefficient: Correlation is computed between two variables and assesses to what extent the variables vary together. If the variables increase at the same time they are said to correlate. Conversely, if one variable increases while the other decreases, they anticorrelate. In our case, the correlation coefficient is computed between the predicted and the actual values. Hence, the closer the correlation is to 1, the better the model.
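R2 (coefficient of determination): It is one minus the ratio between the sum of squared errors of the model and the sum of squared deviations of the target from its mean. For an ordinary least squares linear regression evaluated on its own training data, it equals the square of the Pearson coefficient. In general, for instance on a test set, the two can differ, and R2 can even be negative when the model does worse than always predicting the mean.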
Explained Variance Score: It gives you the fraction of the target's variance that your model explains. If you like, it is the fraction of what the model knows compared to all there is to know about your target.
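To see how these three scores relate, here is a small sketch in plain Python using their textbook definitions. The function names and the data are assumptions made for illustration, and this is not necessarily how DSS computes the scores internally. The hypothetical model is always exactly 5 units too high.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pearson(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    ma, mp = mean(actual), mean(predicted)
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ma = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - ma) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def explained_variance(actual, predicted):
    """1 - Var(actual - predicted) / Var(actual)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mr, ma = mean(residuals), mean(actual)
    var_res = sum((r - mr) ** 2 for r in residuals)
    var_act = sum((a - ma) ** 2 for a in actual)
    return 1 - var_res / var_act


# Invented data: a model that is systematically 5 units too high.
actual = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [a + 5.0 for a in actual]

print("Pearson:", pearson(actual, predicted))
print("R2:", r2(actual, predicted))
print("Explained variance:", explained_variance(actual, predicted))
```

With a constant bias, the correlation and the explained variance remain at 1 because the predictions still move perfectly with the target, while R2 drops below 1 because it also penalizes the offset. Looking at the three scores together can therefore hint at a systematic bias.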
Mean Squared Error and Root Mean Squared Error: The average of the squared errors, and its square root. When the errors average out to zero, these are the variance and the standard deviation of the error. If the model error is normally distributed (check the error distribution to find out), multiplying the standard deviation by three gives you a very safe bound on how big your error can be. The problem with this error measure is that it is very sensitive to outliers. This means that if you have a good model that makes a big error on one data point (which might be a measurement error), this score will be high.
Mean Absolute Error (MAE): An alternative to the outlier-sensitive MSE is the MAE. Because it takes the absolute value of the error instead of its square, the effect of one outlier won't be as drastic as on the Root Mean Squared Error.
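The difference in outlier sensitivity can be checked with a short sketch in plain Python. The data are invented for illustration: two sets of predictions for the same targets, identical except for one badly wrong point.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


actual = [20.0, 22.0, 25.0, 30.0, 35.0]
good = [21.0, 21.0, 26.0, 29.0, 36.0]          # every error has size 1
with_outlier = [21.0, 21.0, 26.0, 29.0, 55.0]  # last error has size 20

for name, predicted in [("good", good), ("with outlier", with_outlier)]:
    print(f"{name}: RMSE={rmse(actual, predicted):.2f}, "
          f"MAE={mae(actual, predicted):.2f}")
```

On the clean predictions both scores are equal. With the single outlier, the RMSE grows noticeably more than the MAE, because the large error is squared before being averaged.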
Mean Absolute Percentage Error (MAPE) is an interesting metric. Instead of measuring the raw discrepancy between predicted values and actual values, it gives the error as a percentage of the actual value. This means that an error of $10,000 on a $200,000 flat is considered equivalent to an error of $100,000 on a $2,000,000 mansion. This seems particularly appropriate for a price target.
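The flat and mansion example can be worked out in a few lines of plain Python. This is a sketch using the usual definition of the metric. The function names are illustrative, and the direction of each error (one price too high, one too low) is an assumption, since only the size of the error matters here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, otherwise the ratio is undefined.
    """
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def mae(actual, predicted):
    """Mean absolute error, for comparison."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# A flat and a mansion, each mispriced by 5% of its actual value.
actual = [200_000.0, 2_000_000.0]
predicted = [210_000.0, 1_900_000.0]

print(f"MAPE: {mape(actual, predicted):.1f}%")
print(f"MAE:  ${mae(actual, predicted):,.0f}")
```

Both properties contribute the same 5% to the MAPE, whereas the MAE is dominated by the mansion's $100,000 error.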
Construct your custom metric: The metrics above make sense in most cases, but depending on what you want to do, you may need to create a custom metric. Going back to our predictive maintenance example with trucks, you could implement a score that is the total cost of repairs over the next year. Each time your model predicts the failure before it happens, it costs you the price of a simple repair. If the predicted failure date comes after the actual failure, it costs you the price of the repair plus the price of retrieving the truck! Of course, you don't want to send your trucks in for repair every two days, so your score should also take the frequency of repairs into account.
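As a rough sketch of what such a custom score could look like, here is a plain Python version. All the cost figures, the constant names and the `maintenance_cost` function are hypothetical, and this is not a DSS API. The penalty per day of early repair is one simple way to discourage repairing too often, chosen here as an assumption.

```python
# Hypothetical costs, in dollars. Adjust to your own business figures.
REPAIR_COST = 500.0        # a planned repair at the garage
RETRIEVAL_COST = 2000.0    # towing a truck that failed on the road
EARLY_COST_PER_DAY = 10.0  # usable truck life wasted by repairing too early


def maintenance_cost(actual_days, predicted_days):
    """Total cost given actual and predicted days until failure."""
    total = 0.0
    for actual, predicted in zip(actual_days, predicted_days):
        total += REPAIR_COST
        if predicted > actual:
            # The failure happened before the planned repair.
            total += RETRIEVAL_COST
        else:
            # Repaired in time, but possibly earlier than needed.
            total += EARLY_COST_PER_DAY * (actual - predicted)
    return total


actual_days = [100, 80, 120]
cautious = [90, 75, 110]     # always a bit early
optimistic = [105, 85, 125]  # always a bit late

print("cautious model cost:  ", maintenance_cost(actual_days, cautious))
print("optimistic model cost:", maintenance_cost(actual_days, optimistic))
```

In this made-up example the optimistic model has the smaller absolute errors (5 days each, against 5 to 10 days for the cautious one), yet it is far more expensive because every failure happens on the road. A symmetric metric such as MAE would rank the two models the other way round.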
Beyond machine learning is Business
Scores and assessment charts are essential to evaluate your model. But beware that they only become really interesting when you think about what you are going to do with the predictions. Optimizing a score without a business perspective could lead you to perfect nonsense. That said, isn't it nice to let DSS do the math for you while you keep the stimulating tasks like interpretation and critical thinking for yourself?