Application of machine learning in health insurance to reduce claims leakage and improve underwritin
Speaker(s): Satraajeet Mukherjee, Anitya Ajmani
Claims leakage is defined as the difference between the actual claim payment made and the amount that should have been paid if all industry-leading practices had been applied. Leakage is caused by deviations from established industry or company standards and leading practices. With rising competitive pressures putting the emphasis on efficiency, health insurers across the globe are relying on extensive health claims process reviews to identify and quantify leakage, investigate its root causes, and implement corrective action plans.
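The definition above can be written out as a small calculation. This is a minimal sketch only: the `Claim` record, its `benchmark` field (the amount payable under leading practices, assumed to be known from a claims review), and the three sample claims are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    paid: float       # amount actually paid on the claim
    benchmark: float  # hypothetical: amount payable under leading practices


def leakage(claim: Claim) -> float:
    """Leakage on one claim: actual payment minus the benchmark payment."""
    return claim.paid - claim.benchmark


def leakage_rate(claims: list[Claim]) -> float:
    """Total leakage as a share of total payments (0.0 if nothing was paid)."""
    total_paid = sum(c.paid for c in claims)
    total_leakage = sum(leakage(c) for c in claims)
    return total_leakage / total_paid if total_paid else 0.0


# Made-up illustrative claims, not real data.
claims = [
    Claim("C1", paid=1200.0, benchmark=1000.0),
    Claim("C2", paid=500.0, benchmark=500.0),
    Claim("C3", paid=800.0, benchmark=700.0),
]

for c in claims:
    print(c.claim_id, leakage(c))
print("portfolio leakage rate:", leakage_rate(claims))
```

In practice the benchmark amount is the hard part: it is what the manual review, or a model standing in for it, has to estimate.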
It is well known that claims leakage is driven disproportionately by specific types of claims, especially those for which the drivers of development are not well understood early in the claims process. Health claims process reviews often involve extensive manual investigations to understand these drivers, including inefficient treatment procedures, anomalous claim amounts, incorrect coverage identification, fraudulent claims, and more. While this in-depth analysis (known as analysis of likely claim progressions) can help capture part of the leakage, manual leakage identification is like finding a needle in a haystack, given the large number and complexity of claims.
This is the primary motivation behind this research study, which illustrates the use of machine learning techniques, especially random forests and other ensemble tree-based models, for health claims reviews and fraud/leakage identification. The paper also illustrates why tree-based modelling methods are best suited to this problem, where the interaction of multiple factors plays a key role in the model outcomes. Finally, it illustrates how ensemble methods closely replicate the thought process of a human brain, enabling the model to capture non-linear effects (such as co-morbidity) within the data, and discusses the suitability and shortcomings of each machine learning algorithm depending on the business problem at hand.
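The point about interactions can be illustrated with a toy example. The sketch below is not the study's model and is not a random forest: it compares a purely additive fit with a single depth-two tree on made-up average claim costs for two hypothetical condition flags (diabetes, hypertension), where cost jumps only when both are present. All names and figures are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical average claim cost keyed by (diabetes, hypertension) flags.
# The (1, 1) cell is far above what either condition adds on its own.
observed = {(0, 0): 100.0, (0, 1): 120.0, (1, 0): 130.0, (1, 1): 400.0}


def additive_fit(cells):
    """Main-effects-only fit: row mean + column mean - grand mean.

    For a balanced 2x2 layout this is the least-squares additive model.
    """
    grand = mean(cells.values())
    row = {a: mean(v for (x, _), v in cells.items() if x == a) for a in (0, 1)}
    col = {b: mean(v for (_, y), v in cells.items() if y == b) for b in (0, 1)}
    return {(a, b): row[a] + col[b] - grand for (a, b) in cells}


def tree_fit(cells):
    """Depth-two tree: split on the first flag, then on the second.

    Each leaf holds one cell and predicts that cell's mean cost.
    """
    leaves = {a: {b: cells[(a, b)] for b in (0, 1)} for a in (0, 1)}
    return {(a, b): leaves[a][b] for (a, b) in cells}


def max_abs_error(pred, cells):
    """Largest absolute gap between predicted and observed cell costs."""
    return max(abs(pred[k] - cells[k]) for k in cells)


additive_error = max_abs_error(additive_fit(observed), observed)
tree_error = max_abs_error(tree_fit(observed), observed)
print("additive model max error:", additive_error)
print("depth-two tree max error:", tree_error)
```

The additive model misses every cell by the same amount because it has no term for the joint effect, while the two successive splits isolate the co-morbid group directly. An ensemble such as a random forest averages many such trees grown on resampled data, which is one way to keep this flexibility while reducing overfitting.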