Unstructured data such as text remain largely untapped in the (re)insurance industry. The most basic reason is probably a lack of awareness of how to handle such texts properly. The field of natural language processing (NLP) offers various techniques to address text analysis tasks such as extraction or classification. However, the insurance industry suffers from inconvenient domain specificities that often limit the deployment of such methodologies. Through a case study on contractual analysis and data extraction, this presentation aims to present some of these issues while introducing and applying deep learning techniques to illustrate how to overcome them.
Data collection will first be discussed to highlight the insurance industry's lack of data compared with typical NLP projects. Data augmentation techniques such as synonym replacement, random insertion, etc. will be presented and applied. Data preparation will then be introduced to illustrate the specificities of insurance vocabulary. Pretrained word embeddings (such as Word2Vec, GloVe or ConceptNet) will be compared to custom word embedding approaches. Data quality aspects and their impact on data extraction techniques such as regular expressions will be described. Named Entity Recognition (NER) will be explained (character embeddings and Bi-LSTM CRF) and applied to demonstrate the effectiveness of deep learning systems. Annotation will also be discussed, with details on its cost. Active learning models (CNN least confidence, DO-BALD and BB-BALD) will be tested. Finally, the contextualization of results will be presented through different modelling scenarios, including sequence representation techniques such as RNN, CNN and HAN, which will be benchmarked to show the complexity of domain customization. Methodological perspectives, results regarding contractual analysis and usability aspects will be discussed to conclude.
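As an illustration of the data augmentation step mentioned above, the following is a minimal Python sketch of synonym replacement and random insertion. It is not taken from the presentation: the `SYNONYMS` table, the function names and the example sentence are all hypothetical, and a real project would presumably rely on a domain-specific thesaurus rather than a hand-written dictionary.

```python
import random

# Hypothetical toy synonym table (assumption); a real project would use
# a domain thesaurus or embedding-based neighbours instead.
SYNONYMS = {
    "insurer": ["underwriter", "carrier"],
    "contract": ["policy", "treaty"],
    "loss": ["claim", "damage"],
}


def synonym_replacement(tokens, n, rng):
    """Return a copy of tokens with up to n known tokens swapped for a synonym."""
    out = list(tokens)
    candidates = [i for i, tok in enumerate(out) if tok in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(tokens, n, rng):
    """Return a copy of tokens with n synonyms inserted at random positions."""
    out = list(tokens)
    known = [tok for tok in tokens if tok in SYNONYMS]
    if not known:
        return out  # nothing to draw synonyms from
    for _ in range(n):
        word = rng.choice(SYNONYMS[rng.choice(known)])
        # randint is inclusive, so the word may also be appended at the end
        out.insert(rng.randint(0, len(out)), word)
    return out


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
sentence = "the insurer shall cover any loss under this contract".split()
augmented_sr = synonym_replacement(sentence, 2, rng)
augmented_ri = random_insertion(sentence, 2, rng)
print(" ".join(augmented_sr))
print(" ".join(augmented_ri))
```

Each call produces a new variant of the input sentence, so a small annotated corpus can be expanded into several noisy copies; whether such copies preserve contractual meaning would need to be checked case by case.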
Aurélien Couloumy is Head of Digital Transformation at CCR Group. He started his career in Paris in 2012 as an actuarial consultant at Optimind before moving to Brussels in 2015 to work for Addactis as Head of Models, and then for Reacfin in 2017 as Head of Data Science. Aurélien is a qualified member of the French Institute of Actuaries. He is also a lecturer at ISFA and a member of the SAF Laboratory, where he works on various teaching and research topics including machine learning techniques, natural language processing and image processing applied to insurance.