What is Sentiment Analysis? - By Jamie Maguire

Introduction

People have always been interested in what specific individuals or groups think.  Since the inception of the internet, increasing numbers of people have been using websites and on-line services to express their opinions.  With social media sites such as Facebook, LinkedIn and Twitter, it is becoming feasible to automatically gauge public opinion on a given topic, news story, product or brand.

Opinions mined from such services can be valuable, and the datasets gathered can be analysed and presented in such a way that it becomes easy to identify whether the on-line mood is positive or negative.  This allows individuals or businesses to be proactive rather than reactive when a negative conversational thread is emerging.  Alternatively, positive sentiment can be identified, making it possible to spot product advocates or to see which parts of a business strategy are working.

The amount of opinion data available online is vast compared to traditional opinion-gathering methods such as paper-based questionnaires and surveys.  Just look at the following statistics (taken from here).

Every minute:

  • Facebook users share nearly 2.5 million pieces of content.
  • Twitter users tweet nearly 300,000 times.
  • Instagram users post nearly 220,000 new photos.
  • YouTube users upload 72 hours of new video content.
  • Apple users download nearly 50,000 apps.
  • Email users send over 200 million messages.
  • Amazon generates over $80,000 in online sales.

Making “sense of the noise” can be difficult; however, computational methods can be applied to automatically extract, analyse and classify this opinion data.  This technique is known as Sentiment Analysis, an application of natural language processing that often relies on Machine Learning.

Challenges of Sentiment Analysis

Sentiment analysis is not without its challenges.  On-line opinion data, for example, is usually written in natural language, which is unstructured and therefore hard to categorise.  This is the problem most often encountered with sentiment analysis.  Interpretation of mood is also subjective and may vary from one person to another, a problem made even harder by the format in which the text is analysed.  These challenges, coupled with certain nuances of the English language, can make some texts hard to process.

A paper written by Bing Liu discusses how sentiment analysis is a “multi-faceted problem” and goes into detail on some of the current challenges.  You can find more information on-line by googling around, but enough of the challenges; on to some of the techniques that can be used to implement sentiment analysis.

Sentiment Analysis Techniques

There are many ways to implement Sentiment Analysis.  Ultimately, it is a text classification problem, and the approaches can be broken down into two main areas (Carstens, 2011):

  • Supervised Learning
  • Unsupervised Learning

Supervised Learning

This technique involves the construction of a “Classifier”, a model trained on texts that have already been labelled, and the problem has been studied intensively.  The Classifier is responsible for categorising texts as having positive, negative or neutral polarity.

The three main classification techniques are:

  • Naïve Bayes
  • Maximum Entropy
  • Support Vector Machines (SVM)

Of the above, SVM is reported to provide the best accuracy (Bing Liu / Pang et al, 2012).

Unsupervised Learning

The Unsupervised Learning approach described here has three steps.  The first is to apply POS (Part of Speech) tagging and extract pairs of consecutive words whose tags conform to given patterns.  The second step is to estimate the sentiment orientation (SO) of each extracted phrase.  Finally, the third step is to compute the average SO of all the extracted phrases and use it to decide whether the text is positive or negative.
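To make the three steps a little more concrete, here is a minimal Python sketch.  It rests on several assumptions: the input is already POS-tagged, the tag patterns in PATTERNS are illustrative rather than a definitive list, and the scores in PHRASE_SO are made-up numbers standing in for an SO estimate that a real system would derive from data.  All of the names are hypothetical.

```python
# Illustrative (first tag, second tag) patterns for two-word phrases.
# These are assumptions for the sketch, not a definitive list.
PATTERNS = {("JJ", "NN"), ("RB", "JJ"), ("RB", "VB")}

# Made-up sentiment-orientation (SO) scores. A real system would
# estimate these from data instead of hard-coding them.
PHRASE_SO = {
    ("great", "camera"): 2.0,
    ("very", "slow"): -1.5,
    ("poor", "battery"): -2.0,
}


def extract_phrases(tagged):
    """Step 1: keep consecutive word pairs whose tags match a pattern."""
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in PATTERNS:
            phrases.append((w1.lower(), w2.lower()))
    return phrases


def average_so(tagged):
    """Steps 2 and 3: look up each phrase's SO, then average the scores."""
    phrases = extract_phrases(tagged)
    # Phrases missing from the lookup table count as 0.0.
    scores = [PHRASE_SO.get(phrase, 0.0) for phrase in phrases]
    return sum(scores) / len(scores) if scores else 0.0


def classify(tagged):
    """Positive if the average SO is above zero, negative if below."""
    avg = average_so(tagged)
    if avg > 0:
        return "positive"
    if avg < 0:
        return "negative"
    return "neutral"  # assumption: an average of exactly zero is undecided


review = [("Great", "JJ"), ("camera", "NN"), ("but", "CC"),
          ("very", "RB"), ("slow", "JJ"), ("autofocus", "NN")]
print(extract_phrases(review))
print(classify(review))
```

The hard part in practice is step two.  The lookup table above only stands in for it, so treat the sketch as a picture of the pipeline's shape rather than a working sentiment engine.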

Naïve Bayes

Naïve Bayes is the technique that I’m going to focus on for the purpose of this series of blog posts.  It is built on Bayes’ theorem, which is used as a means of arriving at predictions in light of relevant evidence.  The theorem is expressed in terms of conditional probability and is also known as inverse probability.  It was discovered by an English Presbyterian minister and mathematician called Thomas Bayes and published posthumously in 1763 (Routledge).  It’s easy enough to get your head around and there are quite a few implementations on the web.  I suggest checking them out.

The algorithm is used in social media listening platforms such as Social Opinion to help classify text and in email spam filters.
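As a rough preview, here is a minimal sketch of a Naïve Bayes text classifier in plain Python.  It is not the implementation used by any particular platform.  The training sentences are invented, words are split on whitespace, and add-one (Laplace) smoothing is assumed so that a word never seen with a given label does not zero out the whole score.  The function names train and predict are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        # Start from the prior: log P(label).
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add log P(word | label) with add-one (Laplace) smoothing.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data, for illustration only.
examples = [
    ("I love this phone", "positive"),
    ("great battery and great screen", "positive"),
    ("I hate the slow delivery", "negative"),
    ("terrible support and poor quality", "negative"),
]
model = train(examples)
print(predict(model, "love the great screen"))
print(predict(model, "poor quality and slow support"))
```

Working with log-probabilities avoids multiplying many small numbers together, which would otherwise underflow on longer texts.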

I think that’s enough for just now.  In my next post I’ll detail the underlying theory and implement an example.

Are you using sentiment analysis or machine learning in any of your projects?

As always, if you have any questions, comments or suggestions then drop me a message.

Email: admin@jamiemaguire.net

Twitter: @jamie_maguire1

Thanks to Jamie Maguire for being our guest writer this week.
