Speaker: Friedrich Loser
Several claim prediction competitions on kaggle.com, "Your Home of Data Science", were dominated and ultimately won by non-parametric machine learning methods such as random forests and gradient tree boosting rather than by parametric actuarial methods. Is this a threat to actuaries? Should non-life actuaries switch to machine learning techniques?
In this presentation, the insurance data, the machine learning methods and the winning solutions for three claim prediction competitions on kaggle.com are briefly described:
To complete the picture, the most popular tools (Python, R and fast algorithms such as XGBoost) and some "Kagglers" are sketched.
The winning teams used complex stacked model ensembles to avoid overfitting and to minimize variance. Nevertheless, we focus on the best single machine learning models and compare them with parametric actuarial methods. Using post-competition submissions, the performance of the best new and traditional models is evaluated and compared with the complex winning solutions, which are often impractical to apply.
Finally, the results and leaderboard positions of machine learning and actuarial methods in claim prediction are presented, and the advantages of both approaches are assessed.