We study dynamic pricing in insurance from the perspective of an insurance company. Specifically, we consider online revenue management for an insurer that wishes to sell a new product, and we do not model competition or demand constraints in the market. The insurer observes only realised demand and incurred claims; it does not know the underlying demand and claims functions for the product. This setting is particularly relevant to the release of new insurance products. We develop two pricing models, one parametric and one non-parametric, that balance exploration (learning demand and claims) against exploitation (pricing to earn revenue). The aim is to learn the relationship between price and both demand and total claims while simultaneously maximising revenue. The performance of a pricing policy is measured by its cumulative regret: the expected revenue loss caused by not using the optimal price. In the parametric model, we estimate the unknown parameters by maximum quasi-likelihood estimation (MQLE). The MQLE parameter estimates eventually exist and converge to the correct values, which implies that the sequence of chosen prices also converges to the optimal price. In the non-parametric model, we sample demand and total claims from Gaussian processes (GPs) and then analyse the Gaussian process upper confidence bound (GP-UCB) algorithm applied to insurance pricing. Although similar results exist in other domains, this work is among the first to consider dynamic pricing problems in insurance.
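To make the non-parametric approach concrete, the following is a minimal sketch of GP-UCB pricing over a finite price grid, together with the cumulative regret it incurs. It is an illustration under stated assumptions, not the model analysed in this work: the revenue curve `toy_revenue`, the squared-exponential kernel, its length scale, the noise level and the price grid are all hypothetical choices. For simplicity it models revenue directly as one GP rather than demand and total claims separately. The confidence parameter follows the finite-set choice of Srinivas et al., beta_t = 2 log(|D| t^2 pi^2 / (6 delta)).

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed choice)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise_var):
    """Posterior mean and standard deviation of a zero-mean GP on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_grid)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is 1.0 for this kernel; clip guards against round-off.
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def toy_revenue(p):
    """Hypothetical expected revenue: demand (1 - p) times margin (p - 0.1)."""
    return (p - 0.1) * (1.0 - p)

def gp_ucb(revenue_fn, price_grid, horizon=40, noise_sd=0.1, delta=0.1, seed=0):
    """Run GP-UCB on a finite price grid; return chosen prices and cumulative regret."""
    rng = np.random.default_rng(seed)
    prices, rewards = [], []
    for t in range(1, horizon + 1):
        if not prices:
            # No data yet: pick a random starting price.
            idx = int(rng.integers(len(price_grid)))
        else:
            mean, sd = gp_posterior(np.array(prices), np.array(rewards),
                                    price_grid, noise_sd ** 2)
            # Confidence parameter for a finite decision set (Srinivas et al.).
            beta = 2.0 * np.log(len(price_grid) * t ** 2 * np.pi ** 2 / (6.0 * delta))
            idx = int(np.argmax(mean + np.sqrt(beta) * sd))
        p = price_grid[idx]
        # Only a noisy revenue realisation is observed, never revenue_fn itself.
        rewards.append(revenue_fn(p) + noise_sd * rng.normal())
        prices.append(p)
    chosen = np.array(prices)
    best = revenue_fn(price_grid).max()
    # Cumulative regret: summed expected revenue loss versus the best grid price.
    regret = np.cumsum(best - revenue_fn(chosen))
    return chosen, regret

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50)
    chosen, regret = gp_ucb(toy_revenue, grid)
    print("last five prices:", np.round(chosen[-5:], 3))
    print("cumulative regret:", round(float(regret[-1]), 3))
```

The theoretical value of beta is conservative, so over a short horizon this sketch still spends most rounds exploring; the regret guarantee concerns growth over long horizons rather than fast early convergence.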
 Srinivas, N., Krause, A., Kakade, S. M. and Seeger, M. (2010), “Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting.” IEEE Transactions on Information Theory, vol. 58(5), pp. 3250-3265.
 den Boer, A. V. (2013), “Dynamic Pricing and Learning.”