The aim of this paper is to calibrate and compare different Machine Learning (ML) techniques for one of the most complex and relevant behaviors observable in the Insurance Market: the Conversion Rate. The selected perimeter is Motor Third Party Liability (MTPL) insurance for cars.
With the Conversion Rate defined as the ratio of policies to quote requests, a good prediction of this KPI produces at least two main advantages for an Insurer:
The Generalized Linear Model (GLM), regarded as a standard for pricing and predictive modelling in the Insurance Market, is used as a frame of reference.
Using an Automated Bayesian Approach¹, we optimize the hyperparameters of each model under study, reducing the risk of selecting poor hyperparameters. We calibrate the Random Forest, XGBoost, LightGBM and CatBoost models using 5-fold cross validation; in particular, for each model we compute the mean of its five fold-level predictions on the test set. One benefit of this approach is that the 5-fold cross validation errors tend to be of the same magnitude as the test error, which increases our confidence in the models' predictions.
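The fold-averaging step described above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the synthetic data, the helper name `cv_average_predictions`, the shuffled `KFold` split and the small Random Forest are all assumptions introduced for the example, and the Bayesian hyperparameter search is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split

# Synthetic, imbalanced stand-in for quote data (assumption: not the paper's dataset).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def cv_average_predictions(make_model, X_train, y_train, X_test,
                           n_splits=5, seed=0):
    """Fit one model per fold and average the test-set probabilities."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    test_preds = []
    for fit_idx, _ in folds.split(X_train):
        model = make_model()  # fresh, unfitted model for each fold
        model.fit(X_train[fit_idx], y_train[fit_idx])
        # Probability of the positive class (quote converted into a policy).
        test_preds.append(model.predict_proba(X_test)[:, 1])
    # Mean of the five fold-level predictions, one value per test row.
    return np.mean(test_preds, axis=0)


avg_pred = cv_average_predictions(
    lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    X_train, y_train, X_test)
```

The same helper could be called once per model family, so that every model contributes one averaged prediction vector on the test set.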
Among the single models, LightGBM has the highest F Score (0.195), followed by XGBoost (0.189). The GLM (0.151) and CatBoost (0.161) are the weakest single models.
Different Ensemble Models have been studied. Using simulations, we find that a weighted average of the five single models attains an F Score of 0.200, showing that every single model is suboptimal on its own.
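One way such a simulation could look is a random search over weight vectors, sketched below. The paper does not specify its simulation scheme, so everything here is an assumption: the Dirichlet sampling of weights, the use of the F1 score with a 0.5 threshold, the helper names `f_score` and `search_weights`, and the toy predictions.

```python
import numpy as np


def f_score(y_true, y_pred):
    """F1 score for binary labels (assumed here to be the F Score)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def search_weights(preds, y_true, n_trials=2000, threshold=0.5, seed=0):
    """Random search for blending weights.

    preds has shape (n_models, n_samples) and holds predicted probabilities.
    """
    rng = np.random.default_rng(seed)
    n_models = preds.shape[0]
    # Candidates: equal weights, each single model alone, then random draws.
    candidates = [np.full(n_models, 1.0 / n_models)]
    candidates += list(np.eye(n_models))
    candidates += [rng.dirichlet(np.ones(n_models)) for _ in range(n_trials)]
    best_w, best_f = None, -1.0
    for w in candidates:
        blended = w @ preds  # weighted average, since the weights sum to 1
        f = f_score(y_true, (blended >= threshold).astype(int))
        if f > best_f:
            best_w, best_f = w, f
    return best_w, best_f


# Toy predictions from five noisy "models" (assumption: illustrative only).
rng = np.random.default_rng(1)
y_true = rng.binomial(1, 0.2, size=500)
preds = np.clip(0.3 * y_true + rng.normal(0.3, 0.2, size=(5, 500)), 0.0, 1.0)

best_w, best_f = search_weights(preds, y_true)
single_f = [f_score(y_true, (p >= 0.5).astype(int)) for p in preds]
```

Because each single model is included as a candidate, the selected blend scores at least as well as the best single model on the data used for the search. In practice the weights would be chosen on validation data rather than on the test set.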
Starting from the most significant features detected using SHAP values² and the five model predictions as new features, a second-layer LightGBM model is trained. This two-layer Ensemble Model is the most powerful predictive model found (F Score of 0.202).
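The two-layer construction can be sketched as below, under several stated assumptions. scikit-learn's `GradientBoostingClassifier` stands in for the second-layer LightGBM, only two first-layer models are used instead of five, the indices in `top_features` are hypothetical rather than SHAP-selected, and the first-layer training predictions are produced out of fold to limit leakage (the paper does not say how they were obtained).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-in data (assumption: not the paper's dataset).
X, y = make_classification(n_samples=1500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# First layer: two illustrative models instead of the paper's five.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
top_features = [0, 1, 2]  # hypothetical indices, not SHAP-selected

# Out-of-fold probabilities on the training set, one column per base model.
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])
# Test-set probabilities from base models refitted on the full training set.
level1_test = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base_models
])

# Second-layer inputs: selected raw features plus first-layer predictions.
meta_train = np.hstack([X_tr[:, top_features], oof])
meta_test = np.hstack([X_te[:, top_features], level1_test])

# Second layer (stand-in for LightGBM).
meta_model = GradientBoostingClassifier(random_state=0).fit(meta_train, y_tr)
stacked_pred = meta_model.predict_proba(meta_test)[:, 1]
```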