Predictive Machine Learning for Underwriting Life and Health Insurance
The dominant underwriting approach combines rule-based engines with traditional manual underwriting. Applications are first assessed by automated rule-based engines, which typically can process only simple applications; the remaining applications are reviewed by underwriters or referred to reinsurers. This research aims to construct predictive machine learning models for the complicated applications that rule-based engines cannot process. Natural language processing and clustering analysis are used to process free-text data such as descriptions of impairments and occupations. Feature selection methods such as mutual information and recursive feature elimination are used to improve prediction accuracy. Machine learning algorithms including XGBoost (XGB) and Random Forest are used to predict underwriting decisions. XGB performs best, with 99.5% accuracy on the training set and 80% accuracy on the test set. Tools such as word clouds and feature ranking functions are used to provide underwriting insights. The paper concludes with a discussion of data limitations and directions for further research.
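To make the mutual-information feature selection step concrete, here is a minimal sketch that ranks discrete (categorical) features by their mutual information with the underwriting decision, I(X;Y) = Σ p(x,y) log2(p(x,y) / (p(x)p(y))). It is not the paper's implementation, which the abstract does not describe. It assumes every feature is already categorical, for example an occupation cluster label produced by the clustering step, and the feature names (`smoker`, `occupation_cluster`, `app_month`) and toy rows are hypothetical illustrations rather than the study's data.

```python
from __future__ import annotations

from collections import Counter
from math import log2
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y) in bits between two discrete variables."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows: Sequence[dict], labels: Sequence[Hashable]) -> list[tuple[str, float]]:
    """Score each feature column by its mutual information with the label, highest first."""
    names = list(rows[0].keys())
    scores = [(name, mutual_information([r[name] for r in rows], labels)) for name in names]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy applications (NOT from the paper): categorical features only.
smoker = ["yes"] * 4 + ["no"] * 4
occupation_cluster = [0, 0, 0, 1, 1, 1, 1, 1]
app_month = ["jan", "feb"] * 4
labels = ["rated"] * 4 + ["standard"] * 4  # hypothetical underwriting decisions

rows = [
    {"smoker": s, "occupation_cluster": o, "app_month": m}
    for s, o, m in zip(smoker, occupation_cluster, app_month)
]

ranking = rank_features(rows, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

On these toy rows `smoker` fully determines the decision (1 bit), `occupation_cluster` is partially informative, and `app_month` carries no information, so a selector keeping the top-k features would drop it. Continuous features would need discretization or a nearest-neighbour estimator before this kind of ranking applies.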