A Standardized Machine Learning based approach to Conversion Rate Estimation

The aim of this paper is to calibrate and compare different Machine Learning (ML) techniques for one of the most complex and relevant behaviors observable in the Insurance Market: the Conversion Rate. The selected perimeter is Motor Third Party Liability (MTPL) for cars.

Defining the Conversion Rate as the ratio between policies issued and quote requests received, an accurate prediction of this KPI yields at least two main advantages for an Insurer:

  • Increased Competitiveness: this is especially important when the underwriting cycle is in a softening period;
  • Effective price changes: a Company can identify rate changes or dedicated discounts consistent with the conversion rate and profitability estimated for each potential client asking for a quote, both of which are needed to develop a pricing optimization tool.

The Generalized Linear Model (GLM), regarded as the standard for pricing and/or predictive modelling purposes in the Insurance Market, is used as the frame of reference.
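The paper's portfolio data are not available, so the GLM frame of reference can only be illustrated. A minimal sketch, on synthetic quote-level data with hypothetical rating factors, of a binomial GLM (logistic regression) estimating the conversion probability per quote:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic quote-level data; the two rating factors are placeholders
# (e.g. a standardized premium gap and driver age), not the paper's features.
n = 2000
X = rng.normal(size=(n, 2))
logit = 0.8 * X[:, 0] - 0.5 * X[:, 1] - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # 1 = quote converted to policy

# A logistic GLM (binomial family, logit link) as the reference model.
glm = LogisticRegression().fit(X, y)
conv_rate_hat = glm.predict_proba(X)[:, 1]  # estimated conversion probability
```

Because a logistic GLM with an intercept balances predicted and observed frequencies, the mean of `conv_rate_hat` tracks the portfolio-level conversion rate, which makes it a natural baseline for the ML challengers.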

Using an Automated Bayesian Approach¹, we optimize the hyperparameters of each model under study, reducing the risk of selecting poor hyperparameters. We calibrate the Random Forest, XGBoost, LightGBM and CatBoost models using 5-fold cross-validation; in particular, we compute the mean of the five per-fold predictions of each model on the test set. One benefit of this approach is that the 5-fold cross-validation errors tend to be of the same magnitude as the test error, increasing our confidence in the models' predictions.
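The fold-averaging step can be sketched as follows. This is not the paper's code: it uses synthetic data, `GradientBoostingClassifier` as a stand-in for the tuned boosters, and omits the Bayesian hyperparameter search; it only shows training one model per fold and averaging the five test-set predictions:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 4))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Train one model on each fold's training part, then average
# the five resulting predictions on the held-out test set.
test_preds = []
for tr_idx, _ in KFold(n_splits=5, shuffle=True, random_state=1).split(X_tr):
    model = GradientBoostingClassifier(n_estimators=50, random_state=1)
    model.fit(X_tr[tr_idx], y_tr[tr_idx])
    test_preds.append(model.predict_proba(X_te)[:, 1])

mean_pred = np.mean(test_preds, axis=0)  # final conversion-rate estimate
```

Comparing each fold's validation error with the test-set error of `mean_pred` is what gives the sanity check mentioned above: the two should be of the same magnitude.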

Among the individual models, the LightGBM has the highest F Score (0.195), followed by the XGBoost (0.189). The GLM (0.151) and the CatBoost (0.161) are the weakest individual models.

Different Ensemble Models have been studied. Using simulations, a weighted average of the five individual models attains an F Score of 0.200, showing that each individual model is suboptimal.
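One way to read "using simulations" is a random search over blending weights. A minimal sketch, with simulated stand-ins for the five models' predictions rather than the paper's actual outputs, that samples weight vectors and keeps the blend with the best F Score:

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(2)
n = 5000
y = rng.binomial(1, 0.1, size=n)  # simulated conversions (~10% base rate)
# Simulated predictions of five models: noisy versions of the target.
preds = np.clip(0.5 * y[None, :] + rng.normal(0.1, 0.2, size=(5, n)), 0, 1)

# Start from the equal-weight blend, then randomly search the simplex.
best_w = np.full(5, 0.2)
best_f1 = f1_score(y, (best_w @ preds) > 0.5)
for _ in range(2000):
    w = rng.dirichlet(np.ones(5))        # weights are positive and sum to one
    f1 = f1_score(y, (w @ preds) > 0.5)  # F Score of the blended classifier
    if f1 > best_f1:
        best_f1, best_w = f1, w
```

If the best weight vector found is not a corner of the simplex (all weight on one model), the blend beats every individual model, which is the sense in which each of them is suboptimal.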

Starting from the most significant features detected using SHAP values² and the five model predictions used as new features, a second-layer LightGBM model is trained. This two-layer Ensemble Model is the most powerful predictive model found (F Score 0.202).
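A two-layer (stacking) ensemble of this kind can be sketched as below. Assumptions are hedged throughout: synthetic data, two simple first-layer learners instead of the paper's five tuned models, `GradientBoostingClassifier` standing in for the second-layer LightGBM, and the "top SHAP features" simply taken as the first two columns rather than computed:

```python
import numpy as np
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(3000, 5))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] - 1))))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=3)

# First layer: out-of-fold predictions, so the second layer never sees
# predictions made by a model that was trained on the same rows.
layer1 = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(n_estimators=50, random_state=3)]
oof = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in layer1])
te = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in layer1])

# Second layer: top features (here assumed to be the first two columns)
# plus the first-layer predictions as new features.
top_tr, top_te = X_tr[:, :2], X_te[:, :2]
meta = GradientBoostingClassifier(n_estimators=50, random_state=3)
meta.fit(np.hstack([top_tr, oof]), y_tr)
final = meta.predict_proba(np.hstack([top_te, te]))[:, 1]
```

Using out-of-fold first-layer predictions is the standard guard against target leakage in stacking; without it, the second layer would overfit to the first layer's in-sample accuracy.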


