NLP and Machine Learning in Social Media Opinion Mining
Twitter is a gold mine of social media big data with around 500 million tweets posted daily. We applied NLP and machine learning to Twitter data related to COVID-19, to gauge sentiment and mine opinions on topics pertinent to public interest, insurers and the insurance industry.
The steps include sourcing and hydrating unstructured text data using the Twitter API, country classification, pre-processing and encoding (bag of words, TF-IDF and neural embeddings). We then show the development and performance of machine learning models, ranging from Naïve Bayes to deep learning, and how these compare with open-source tools such as SentiWordNet and TextBlob.
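To make the pre-processing and encoding steps concrete, the following is a minimal sketch of bag-of-words and TF-IDF encoding in plain Python. It is an illustration under stated assumptions, not the pipeline used in this work: the function names (tokenize, bag_of_words, tf_idf), the cleaning rules (lowercasing, dropping URLs and @mentions) and the smoothed IDF formula are all choices made here for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, keep alphabetic tokens.

    These cleaning rules are assumptions for illustration only.
    """
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def bag_of_words(docs):
    """Return one term-count mapping (bag of words) per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Return one {term: weight} mapping per document.

    Weight = term frequency * smoothed inverse document frequency,
    with idf = ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    """
    counts = bag_of_words(docs)
    n_docs = len(counts)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append({
            term: (n / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, n in c.items()
        })
    return vectors
```

In this sketch a term that occurs in fewer documents receives a higher weight than an equally frequent term that occurs in many, which is the property that distinguishes TF-IDF from raw bag-of-words counts.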
The best-in-class model was then used to score millions of tweets, in order to identify sentiment trends over time and sentiment relating to specific topics. Sentiment towards the government appeared low during the first wave but returned to neutral levels for the remainder of 2020. Sentiment towards insurance was generally positive. However, many insurers were perceived negatively due to claims and losses, in contrast to some favourably perceived insurers that stood out for their customer service and wellness advice, an example that others could emulate for the good of society.
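As a rough illustration of how scored tweets can be turned into a trend over time, the sketch below averages sentiment scores by calendar month. The function name monthly_mean_sentiment, the (date, score) input shape and the score range of -1 to 1 are assumptions for the example, not details taken from this work.

```python
from collections import defaultdict


def monthly_mean_sentiment(scored_tweets):
    """Average sentiment per (year, month).

    scored_tweets: iterable of (datetime.date, score) pairs, where score
    is assumed to lie in [-1, 1] (negative to positive sentiment).
    Returns a dict keyed by (year, month) in chronological order.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of scores, count]
    for day, score in scored_tweets:
        key = (day.year, day.month)
        totals[key][0] += score
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in sorted(totals.items())}
```

The same grouping could be applied per topic (for example, tweets mentioning the government or a given insurer) by filtering the input before aggregating.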
NLP and social media analytics have promising applications across the insurance value chain, including reputation management, product development, sales, marketing, social profiling and, ultimately, providing better services to customers.