
Sentiment Analysis on Live Social Media Data Using ‘R’

Thanks to the social media revolution, we are facing an overload of information. As individuals, we can get the latest updates on happenings around the world, read product reviews, get useful feedback from other users on products and services, and much more.

For businesses, though, social media can be a double-edged sword. Social channels let them interact and engage with customers on a personal level, but they also need to watch closely how people are talking about them. Many consumer-facing enterprises are therefore investing in Sentiment Analysis to capture insights into what people (current or potential customers) are saying on social media about them and their products or services.

So, as a learning experiment, I developed a tool that performs Sentiment Analysis on Facebook posts and comments, with the aim of supporting research into how sentiment is conveyed in Facebook content. The system, implemented in the R programming language, classifies public posts (from both users and pages) into six emotions and into three polarities: positive, negative and neutral.

Why R?

Many tools and software packages can perform this kind of analysis, but the following factors make R a strong choice:

  1. More than a statistical package: R is a full programming language that lets you create your own objects, functions and packages.
  2. Platform independence: R runs on Windows, macOS and Linux.
  3. Absolutely free: Any organization can use it without purchasing a license. (Commercially supported distributions are available for those who want vendor support.)
  4. Open source: Anyone can examine the source code to see exactly what it is doing. Anyone can also fix bugs and add features, rather than waiting for a vendor to fix the bug or add the feature, at its own discretion, in a future release.
  5. Integration: R interfaces with other languages (C/C++, Java, Python) and can read data from many sources, including ODBC-compliant databases (for example Access, or Excel through its ODBC driver) and the file formats of other statistical packages (SAS, Stata, SPSS, Minitab). This makes R a glue language for piecing together different data sets, tools and software packages.
  6. Performance: R has become faster over successive releases, and performance-critical code can be delegated to C or C++.

Together, these factors make R a strong choice for reproducible, high-quality Sentiment Analysis: it has the flexibility and power we need when dealing with data.

When an enterprise analyses customer posts and comments on a social network as large as Facebook, there is simply not enough time to read every comment and understand what each customer said. This is where R fits every stage of the analysis: data import and cleansing, exploration and visualization, and statistical analysis. R not only makes data processing easier but also provides graphics tools that present the results in an understandable format. We absorb visuals far more quickly and effectively than bland numerical data, so a graph is preferable to going through hundreds of values one by one.
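As a rough illustration of the cleansing stage, here is a base-R sketch that normalises raw comment text before classification. The cleaning rules and the helper names clean_text() and tokenize() are hypothetical choices for illustration, not the tool's actual code.

```r
# Hypothetical cleansing helper (illustrative rules, not the tool's own code).
clean_text <- function(x) {
  x <- tolower(x)                                  # normalise case
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # drop URLs
  x <- gsub("@\\w+", " ", x, perl = TRUE)          # drop @mentions
  x <- gsub("[[:punct:]]", " ", x)                 # drop punctuation
  x <- gsub("[[:digit:]]", " ", x)                 # drop numbers
  x <- gsub("\\s+", " ", x)                        # collapse whitespace
  trimws(x)
}

# Split cleaned text into one character vector of words per comment.
tokenize <- function(x) strsplit(clean_text(x), " ", fixed = TRUE)

comments <- c("LOVE the new app!!! https://example.com",
              "Worst update ever @support, 3 crashes today :(")
clean_text(comments)
tokenize(comments)
```

The order matters: URLs and mentions are removed before punctuation, otherwise stripping "/" and "@" first would leave fragments such as "example com" behind as words.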

Figure 1: Sentiment Analysis of XYZ organization Facebook page for 50 posts

Solution:
Sentiment Analysis can be used to automatically detect emotions, speculations, evaluations and opinions in the content that people write. The tool extracts the comments on a post, cleanses the data and processes it into a graph that classifies every comment by polarity and emotion: three polarities (positive, negative and neutral) and six emotions (anger, disgust, fear, joy, sadness and surprise).

Most sentiment analysis algorithms rely on a classifier trained on a collection of annotated text. Before training, the data is pre-processed to extract the key features. Several classification methods have been proposed, including Naive Bayes, Support Vector Machines and K-Nearest Neighbors. Naive Bayes (NB) suits our classification strategy because it is simple and intuitive, and it combines efficiency (training and classification are fast) with reasonable accuracy.

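NB applies Bayes' theorem to each class $C_k$, given the words $x = (x_1, \ldots, x_n)$ of a piece of text, and naively assumes that the words are independent of each other within a class:

$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)}, \qquad p(x \mid C_k) = \prod_{i=1}^{n} p(x_i \mid C_k)$$

Where,
p(C_k) = p (occurrence of class) [prior]
p(x | C_k) = p (words, given the class) [likelihood]
p(x) = p (instance of words) [evidence]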
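To make the Naive Bayes idea concrete, the sketch below implements multinomial Naive Bayes with Laplace (add-one) smoothing in base R, assuming simple bag-of-words features. The training sentences, labels and the helper names tokens_of(), train_nb() and posterior_nb() are hypothetical, and the tool described here may estimate its probabilities differently. It scores each class as log p(C_k) + Σ log p(x_i | C_k) and then normalises across classes, which is the same as dividing by p(x).

```r
# Multinomial Naive Bayes sketch in base R (hypothetical helpers, not the tool's code).

# Lowercase bag-of-words tokenizer; drops empty strings.
tokens_of <- function(text) {
  w <- strsplit(tolower(text), "[^a-z]+")
  lapply(w, function(x) x[nzchar(x)])
}

# Estimate the prior p(C_k) and the smoothed likelihoods p(x_i | C_k).
train_nb <- function(docs, labels, alpha = 1) {
  words   <- tokens_of(docs)
  classes <- sort(unique(labels))
  vocab   <- sort(unique(unlist(words)))
  prior <- as.numeric(table(factor(labels, levels = classes))) / length(labels)
  names(prior) <- classes
  # One column per class: (count + alpha) / (total words + alpha * |vocab|)
  lik <- sapply(classes, function(k) {
    counts <- as.numeric(table(factor(unlist(words[labels == k]), levels = vocab)))
    (counts + alpha) / (sum(counts) + alpha * length(vocab))
  })
  rownames(lik) <- vocab
  list(prior = prior, lik = lik, vocab = vocab)
}

# Posterior p(C_k | x) for a single document.
posterior_nb <- function(model, doc) {
  w <- tokens_of(doc)[[1]]
  w <- w[w %in% model$vocab]  # words never seen in training are ignored
  # log p(C_k) + sum_i log p(x_i | C_k), in log space to avoid underflow
  log_joint <- log(model$prior) + colSums(log(model$lik[w, , drop = FALSE]))
  # Normalising over classes is equivalent to dividing by p(x)
  p <- exp(log_joint - max(log_joint))
  p / sum(p)
}

# Hypothetical annotated training data
docs   <- c("love this product", "great service, love it",
            "terrible support", "hate the new update", "awful and terrible")
labels <- c("positive", "positive", "negative", "negative", "negative")

model <- train_nb(docs, labels)
post  <- posterior_nb(model, "Love the great update")
post
names(which.max(post))
```

The smoothing constant alpha keeps a word that never appeared with a class from forcing that class's probability to zero, and dropping p(x) until the final normalisation is safe because it is the same for every class.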

Despite this simplifying assumption, NB's classifications are often surprisingly accurate. The emotion classifier returns an object of class data.frame with seven columns (anger, disgust, fear, joy, sadness, surprise and best_fit). The best_fit column holds the most likely of the six emotions for a given content item. Similarly, we classify the polarity of the text and combine the emotions of all the comments. Put simply, if a piece of content has more positive keywords than negative keywords, it is positive; if it has more negative keywords than positive keywords, it is negative.
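The keyword-counting rule for polarity can be sketched in a few lines of base R. The two word lists are tiny hypothetical lexicons, the function name keyword_polarity() is illustrative, and labelling ties (including comments with no keywords at all) as neutral is an assumption rather than something stated above.

```r
# Tiny hypothetical lexicons; a real tool would use a much larger word list.
positive_words <- c("good", "great", "love", "awesome", "happy")
negative_words <- c("bad", "awful", "hate", "terrible", "sad")

# Count positive and negative keywords per text and compare the counts.
keyword_polarity <- function(texts) {
  tokens <- strsplit(tolower(texts), "[^a-z]+")
  vapply(tokens, function(w) {
    pos <- sum(w %in% positive_words)
    neg <- sum(w %in% negative_words)
    # Assumption: equal counts (including zero) are treated as neutral
    if (pos > neg) "positive" else if (neg > pos) "negative" else "neutral"
  }, character(1))
}

keyword_polarity(c("I love this, great job",
                   "awful app, I hate it",
                   "new version released today"))
```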

After classification, we fetch the best_fit category for analysis. Once all the data is cleansed and processed, we enter the next phase: graphical representation of the data. In this phase the processed data is passed to ggplot() from the ggplot2 package, which plots the distribution of emotions (anger, disgust, fear, joy, sadness, surprise). Similarly, we can plot the distribution of polarity (positive, negative and neutral).
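A minimal ggplot2 sketch of this plotting phase follows, assuming the best_fit labels have already been collected into a data frame. The sample labels are made up for illustration and the styling choices are assumptions; a polarity chart works the same way with the three polarity levels.

```r
library(ggplot2)

# Hypothetical best_fit labels, one per comment (made-up sample data)
emotions <- data.frame(
  best_fit = c("joy", "joy", "anger", "sadness", "joy",
               "surprise", "fear", "joy", "disgust", "anger")
)

# Fix the order of the six emotion categories on the x axis
emotions$best_fit <- factor(
  emotions$best_fit,
  levels = c("anger", "disgust", "fear", "joy", "sadness", "surprise")
)

p <- ggplot(emotions, aes(x = best_fit, fill = best_fit)) +
  geom_bar() +                        # bar height = number of comments
  scale_x_discrete(drop = FALSE) +    # keep emotions with zero comments
  labs(x = "Emotion category", y = "Number of comments",
       title = "Distribution of emotions in comments") +
  theme(legend.position = "none")

print(p)
```

geom_bar() counts the rows per category itself, so no separate aggregation step is needed before plotting.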

Figure 2: Polarity Categorization of XYZ organization Facebook page for 50 posts

Future work:
Analysis has been successfully carried out on Facebook and Twitter. I am now adapting the tool to support YouTube, so that vloggers with millions of subscribers can easily understand their daily feedback (comments, shares, likes and dislikes) in the form of graphs.

Outcome:
Sentiment analysis enables enterprises to understand consumer sentiments in relation to specific products/services. These insights could be used to improve their products and services by gauging consumers’ comments and feedback. In the long run, sentiment analysis, if implemented properly, can help enterprises improve the overall consumer experience, enhance brand image and propel business growth.

Ness enables enterprise clients to integrate and leverage useful data to gain actionable insights to accelerate business growth, drive revenues and improve customer satisfaction.