Advanced Concepts of Clustering in Insurance
Cluster analysis is the task of grouping a set of objects (e.g., observations, policies, claims) so that objects in the same group (called a cluster) are more similar to each other than to those in other groups. In contrast to simple segmentation (e.g., by geographical location only), clustering uses several features to differentiate among those groups. Potential applications are manifold and centre on questions such as:
- In which customer segments do we mainly generate new business?
- Which typical customer should we have in mind while designing new insurance products?
- How can we make use of granular information, such as diagnosis or treatment codes, while dealing with a limited number of observations or claims?
- How can we identify outliers in our underwriting or claims process?
The course shows how different algorithms can be used to obtain a segmentation of insurance data. The methods covered range from centroid-based (k-means, k-prototypes) to probabilistic (Gaussian Mixture Models) and density-based (DBSCAN) approaches. We demonstrate how the clustering results can be visualized and evaluated, and how they can be used to identify outliers in the data set. We also cover dimensionality reduction techniques, so that the segments can be computed either on aggregated information or using only a subset of the available information. The course emphasizes practical application and therefore showcases all concepts on an insurance data set.
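To give a flavour of the centroid-based approach and of outlier identification, here is a minimal sketch of k-means in plain Python. It is not the course's implementation: the function names (`standardize`, `kmeans`, `distances_to_centroid`) and the toy policy data (age, annual premium) are hypothetical and chosen only for illustration. In practice one would more likely rely on an established library.

```python
import math
import random


def standardize(rows):
    """Scale each feature to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def assign(points, centroids):
    """Assign each point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means (Lloyd's algorithm) with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


def distances_to_centroid(points, labels, centroids):
    """Distance of each point to its own cluster centre; large values hint at outliers."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]


# Hypothetical toy data: (policyholder age, annual premium).
policies = [
    (25, 400), (27, 420), (30, 450), (24, 390), (28, 410),
    (55, 900), (58, 950), (60, 920), (57, 940), (62, 980),
]
scaled = standardize(policies)
labels, centroids = kmeans(scaled, k=2)
distances = distances_to_centroid(scaled, labels, centroids)
for policy, label, dist in zip(policies, labels, distances):
    print(policy, "cluster", label, "distance", round(dist, 2))
```

Features are standardized first because k-means relies on Euclidean distance, so a feature on a large scale (here the premium) would otherwise dominate the result. Policies with an unusually large distance to their cluster centre are candidates for closer inspection as outliers.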