Text mining: sentiment analysis
Definition
Sentiment Analysis or Opinion Mining is the interpretation and classification of emotions (positive, negative and neutral) in text data using text analysis techniques. Sentiment Analysis enables companies to identify customers’ opinions of products, brands or services in online conversations and comments.
Context
Sentiment analysis focuses on polarity (positive, negative, neutral), feelings and emotions (anger, joy, sadness, etc.), and even intentions (e.g., interested vs. uninterested). This means that this analysis method can be used in many different ways.
Example of polarity analysis:
The rating system of a film review website (Allociné, IMDB, Rotten Tomatoes…)
Example of an analysis of emotions:
Customer feedback on website support
Example of aspect-based analysis (by feature):
A bike manufacturer may want to know which components of the bike are being reviewed positively or negatively.
Use cases
1. Social media monitoring
As a company, it’s essential to know how outsiders perceive you. The results of Sentiment Analysis can lead you to communicate differently or change your course of action. These analyses are just as useful when run without any particular trigger, to gauge opinion “in normal times”, as when the company is making an announcement.
2. Customer feedback analysis
Let’s take the example of a ready-to-wear company that sells its clothes online for home delivery. When a customer places an order on the website, it’s a good idea to collect their feedback, either through a questionnaire or a free-text comment. Sentiment Analysis of the responses will reveal which items are liked or disliked. Further analysis may even reveal the reasons why.
Methodologies
Sentiment analysis uses several NLP (Natural Language Processing) algorithms and methods.
We can group these methods into three categories:
- A system based on a set of manually developed rules.
- A system based on Machine Learning techniques that learn from data.
- A hybrid system based on the two previous systems.
1. Rules-based system
Here we use a set of hand-written rules to help identify an opinion’s subjectivity, polarity, or subject.
These rules draw on techniques developed in text mining, such as:
- Stemming, part-of-speech tagging, and syntactic parsing
- Lexicons (i.e. lists of words and expressions)
Here’s an example of how this three-step system works:
- Define two lists of polarized words (for example, negative words like “ugly”, “bad”, “the worst”, and positive words like “beautiful”, “good”, “the best”, etc.).
- Count the number of positive and negative words in a given text.
- If there are more positive than negative words, the text is considered positive and vice versa. In the event of a tie, the text is considered neutral.
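The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production lexicon: the word lists are hypothetical and deliberately tiny, multi-word expressions such as “the worst” are reduced to single words, and the function name classify is an assumption made for this example.

```python
import re

# Step 1: two hypothetical, deliberately tiny lists of polarized words.
POSITIVE = {"beautiful", "good", "best"}
NEGATIVE = {"ugly", "bad", "worst"}


def classify(text: str) -> str:
    """Return 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    # Step 2: count positive and negative words.
    positive_hits = sum(word in POSITIVE for word in words)
    negative_hits = sum(word in NEGATIVE for word in words)
    # Step 3: compare the counts; a tie is treated as neutral.
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "neutral"
```

For instance, classify("The design is beautiful and the price is good") counts two positive words and no negative ones, so the text is labelled positive, while a sentence containing none of the listed words is labelled neutral.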
The main drawback of this system is that it takes words one by one and ignores word order, which makes it unreliable: “not good” contains a positive word and would count as positive. Getting a satisfactory result requires adding many more rules, which makes the system increasingly complex.
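To make this limitation concrete, here is a small sketch that scores a text with and without one extra rule for negation. The word lists, the NEGATIONS set and the score function are assumptions made for this illustration; the rule only looks at the word immediately before a polarized word.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "beautiful"}
NEGATIVE = {"bad", "ugly"}
NEGATIONS = {"not", "never", "no"}


def score(text: str, handle_negation: bool) -> int:
    """Sum word polarities (+1 / -1); optionally flip a word preceded by a negation."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for index, word in enumerate(words):
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        # Extra rule: a negation directly before the word reverses its polarity.
        if handle_negation and index > 0 and words[index - 1] in NEGATIONS:
            polarity = -polarity
        total += polarity
    return total
```

Without the rule, “This is not good” scores +1; with it, the score becomes -1. The rule is still fragile: in “This is not very good” the negation is two words away, so the score stays at +1 and yet another rule would be needed.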
2. System based on machine learning techniques
In this case, we no longer rely on elaborate rules, but on Machine Learning techniques.
Sentiment analysis can be modelled as a classification problem where the model is given a text and returns a category (e.g., positive, negative, neutral).
First, we train the model to associate an input (a text) with an output (a category such as positive, negative or neutral). During training, each text is transformed into a feature vector, for example a vector of word counts. These vectors are paired with their known categories and fed to the learning algorithm, which produces the model.
The trained model can then be applied to practical cases. At prediction time, a text the model has never seen is transformed into a feature vector in the same way, and the model returns a predicted category.
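The training and prediction steps can be illustrated with a small, dependency-free sketch. The article does not name a specific algorithm, so the choice here is an assumption: a multinomial Naive Bayes classifier with add-one smoothing over bag-of-words vectors. The class name, the vectorize helper and the four training sentences are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def vectorize(text: str) -> Counter:
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word counts
        self.label_counts = Counter(labels)      # label -> number of texts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(vectorize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        vector = vectorize(text)
        n_texts = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / n_texts)  # log prior
            for word, freq in vector.items():
                if word in self.vocab:  # skip words never seen in training
                    score += freq * math.log((counts[word] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data: texts paired with their known categories.
train_texts = [
    "I love this jacket",
    "great quality and fast delivery",
    "terrible fit, very disappointed",
    "the fabric is awful",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling model.predict("I love the quality") runs the same vectorization on an unseen text and returns the category with the highest score. With only four training sentences the result is purely illustrative; a real model needs far more labelled data.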
3. Hybrid system
Hybrid systems combine elements of rule-based methods and machine learning techniques in a single system. One of their main advantages is that they are often more accurate than either approach used alone.
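There are many ways to combine the two approaches, and the article does not prescribe one. As one possible sketch, the code below trusts a model when it is confident and falls back to a word-list rule otherwise. The threshold, the word lists and toy_model (a stand-in for a real trained classifier) are all hypothetical.

```python
import re
from typing import Callable, Tuple

# Hypothetical mini-lexicons used by the rule-based fallback.
POSITIVE = {"good", "beautiful", "best"}
NEGATIVE = {"bad", "ugly", "worst"}


def rule_label(text: str) -> str:
    """Rule-based label from the balance of positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "neutral"


def hybrid_label(
    text: str,
    model: Callable[[str], Tuple[str, float]],
    threshold: float = 0.7,
) -> str:
    """Use the model when it is confident, otherwise defer to the rules."""
    label, confidence = model(text)
    if confidence >= threshold:
        return label
    fallback = rule_label(text)
    # If the rules have no opinion either, keep the model's best guess.
    return fallback if fallback != "neutral" else label


def toy_model(text: str) -> Tuple[str, float]:
    """Stand-in for a trained classifier: returns (label, confidence)."""
    if "love" in text.lower():
        return ("positive", 0.9)
    return ("neutral", 0.4)
```

With this setup, hybrid_label("The worst purchase", toy_model) returns "negative": the stand-in model is unsure, so the word-list rule decides.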
The challenges of sentiment analysis
Although sentiment analysis systems are increasingly sophisticated, they remain limited by the complexity of language and of human thought. Researchers are working on ever more accurate sentiment classifiers to overcome these limitations.
1. Subjectivity and Tone
It’s just as important to analyse the subjectivity or objectivity of a text as the tone used. Take these two examples:
Ex 1: The wallpaper is beautiful.
Ex 2: The wallpaper is white.
Sentiment can be considered positive for the first sentence and neutral for the second. Not all predicates (adjectives, verbs, nouns…) need to be treated in the same way when analysing the sentiment of a sentence. Here, “beautiful” is much more subjective than “white”.
2. Context and Polarity
Every statement is made in a certain context, so analysing the sentiment of a sentence without that context is difficult. Machines cannot take context into account unless it is explicitly provided. Take these two sentences:
Ex 1: Everything!
Ex 2: Nothing!
If the question is “What did you like?”, the first answer will be positive and the second negative. But if the question is “What didn’t you like?”, the meaning of both answers changes completely. Therefore, pre- or post-processing will be important to ensure that the machine understands the context that may have caused certain responses. Nevertheless, this remains a difficult task.
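As a toy illustration of such pre-processing, the sketch below passes the question along with the answer and reverses the polarity when the question is framed negatively. The marker list, the answer table and the function name are assumptions that cover only the two example answers above; real questions and answers are far more varied.

```python
# Hypothetical lookup tables covering only the example above.
ANSWER_POLARITY = {"everything": 1, "nothing": -1}
NEGATIVE_QUESTION_MARKERS = ("didn't", "did not", "dislike")
LABELS = {1: "positive", -1: "negative", 0: "neutral"}


def answer_sentiment(question: str, answer: str) -> str:
    """Label a short answer, taking the framing of the question into account."""
    # Normalize typographic apostrophes so "didn’t" matches "didn't".
    normalized_question = question.lower().replace("’", "'")
    polarity = ANSWER_POLARITY.get(answer.strip(" !.?").lower(), 0)
    if any(marker in normalized_question for marker in NEGATIVE_QUESTION_MARKERS):
        polarity = -polarity  # a negatively framed question reverses the meaning
    return LABELS[polarity]
```

Here answer_sentiment("What did you like?", "Everything!") is positive, while the same answer to "What didn't you like?" is negative.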
3. Irony and Sarcasm
In the case of irony and sarcasm, people express their negative feelings using positive words, which can be difficult for machines to detect without a thorough understanding of the context in which a feeling has been expressed.
For example, if we take the answer to the question:
“Did you enjoy your experience on our site?”
“Yes, of course! There’s no bug!”
Here, at first glance, it would seem that the answer is yes. However, one could very well see irony and understand the opposite. The problem is that there are no textual clues to help the machine learn or, at the very least, question the true sentiment behind this sentence.
Conclusion
Machine Learning, a new tool for using data, has yet to be deployed to its full potential. Nevertheless, progress in this field is opening avenues for research and new business opportunities. As adoption grows, sentiment analysis will enable us to understand our customers better and give our teams new perspectives for better, more productive work.
Today, a company’s image is very important. The repercussions of a poor image can be felt very quickly, especially on social networks. HeadMind Partners data consultants specialize in these new technologies using artificial intelligence and data science. Using sentiment analysis can help monitor and control the image of a customer or entity.