Extension of Classical Data Science Tools by Text Mining for Claims Classification and Reserving
Applications of machine learning techniques such as random forests or deep neural networks continue to attract increasing attention across all sectors of the insurance industry. Especially in view of the rapidly growing amounts of available data, these methodologies are becoming more and more important for a wide range of practical use cases. However, with the multitude of new approaches, frameworks and additional parameters, implementation is becoming increasingly complex. Which steps are necessary for a given problem, which information can be extracted from the data by feature engineering, and how are all these points finally combined into a working pipeline? Such questions, beyond the choice of the model and the algorithm itself, need to be carefully considered. We therefore provide a toolkit describing which steps and methodologies are necessary to implement, and subsequently use, state-of-the-art data science and machine learning pipelines. We consider the whole process, with particular focus on the different challenges involved, and illustrate specific procedures for data preprocessing. In particular, we examine text mining methods, originally rooted in entropy and information theory, in order to analyze the case summaries provided for different claims. All these considerations are applied within a concrete use case in the P&C area concerning claims classification as well as claims reserving.
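As a minimal illustration of the information-theoretic idea behind such text mining methods (the toy claim summaries and the specific weighting below are our own assumptions, not the paper's exact procedure), one can score each term in the claim texts by the Shannon entropy of its distribution over claim classes: terms that concentrate in a single class have low entropy and are therefore informative features for classification.

```python
import math
from collections import Counter, defaultdict

# Hypothetical claim summaries with class labels, for illustration only.
claims = [
    ("rear bumper damage from parking collision", "motor"),
    ("windshield cracked by falling branch", "motor"),
    ("storm damage to roof tiles", "property"),
    ("water pipe burst flooded basement", "property"),
]

def term_class_entropy(docs):
    """Shannon entropy of each term's distribution over claim classes.

    A term occurring in only one class gets entropy 0 (maximally
    informative); a term spread evenly over k classes gets log2(k).
    """
    counts = defaultdict(Counter)  # term -> class label -> document count
    for text, label in docs:
        for term in set(text.split()):
            counts[term][label] += 1
    entropies = {}
    for term, by_class in counts.items():
        total = sum(by_class.values())
        entropies[term] = -sum(
            (n / total) * math.log2(n / total) for n in by_class.values()
        )
    return entropies

weights = term_class_entropy(claims)
# "burst" occurs only in property claims -> entropy 0.0;
# "damage" occurs in both classes equally -> entropy 1.0.
```

In a full pipeline, such entropy scores could serve for feature selection or term weighting before a classifier such as a random forest is trained on the resulting text features.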