Twitter Sentiment Analysis

Sentiment analysis (or opinion mining) aims to understand the attitude or opinion expressed in a piece of text. It involves natural language processing and sometimes machine learning. In this tutorial we will apply sentiment analysis to get a sense of consumers' attitudes towards a few car brands. We will learn how to:

Create the Twitter stream

This part is about creating a new Twitter dataset and listening for the keywords you are interested in. Because you have to wait for the dataset to be populated once the Twitter connection is set up, we provide a populated one in the project.

Create a sentiment predictive model

This part is about scoring the sentiment of each tweet. Averaging these scores over a brand or a particular keyword then gives you a proxy for the average sentiment of the associated tweets. To generate this score, we propose the following method:

- learn a model of the global sentiment of a tweet on a dataset containing the tweet text and a sentiment: -1 for negative and +1 for positive. This is a classification problem.
- predict the probability that each of our brand tweets belongs to each of the two classes (sentiments).
- keep the expected value as the sentiment score, i.e. score = P(sentiment = 1) - P(sentiment = -1).

In the end, the score is close to 1 if the tweet is very positive, close to -1 if it is very negative, and close to 0 if it is rather neutral.
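The score formula above can be sketched in a few lines of Python. This is only an illustration, assuming a two-class model so that P(sentiment = -1) = 1 - P(sentiment = 1); the function name `sentiment_score` is hypothetical and not part of the tool.

```python
def sentiment_score(p_positive: float) -> float:
    """Expected sentiment for classes -1 and +1: P(+1) - P(-1).

    Assumes a binary classifier, so P(-1) = 1 - P(+1).
    """
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be a probability in [0, 1]")
    p_negative = 1.0 - p_positive
    return p_positive - p_negative


# A confident positive, an undecided and a confident negative prediction.
for p in (1.0, 0.5, 0.0):
    print(p, sentiment_score(p))
```

With two classes the score simplifies to 2 * P(sentiment = 1) - 1, which is why a tweet the model cannot decide on (probability 0.5) lands at 0.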

Ok so let's get started! First get the data here. This dataset was created from a small Kaggle dataset and a 1.6-million-line dataset from Stanford's Sentiment140. The final dataset is composed of two columns: the text and the sentiment (-1 or 1).

Now, let's create a model. Go to the model page and choose "prediction".


Now that the model has been trained, it's time to score the incoming tweets. Create a scoring recipe and choose the model you just built. Set the partition dependencies to "All available" and run the newly created recipe. When the job is done, you can explore the scored dataset. The probability estimates of belonging to class 1 or -1 are appended to the existing columns.
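Once the probability columns are available, turning them into a per-brand sentiment score is a small aggregation. Here is a minimal sketch outside the tool, assuming the scored dataset was exported to CSV; the column names `brand`, `proba_1` and `proba_-1` and the sample rows are assumptions made up for illustration, so adapt them to your own scored dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export of the scored dataset (illustrative rows only).
SCORED_CSV = """brand,text,proba_1,proba_-1
brand_a,love my new car,0.9,0.1
brand_a,worst dealership ever,0.2,0.8
brand_b,test drive was fine,0.5,0.5
"""


def average_score_by_brand(csv_text):
    """Return {brand: mean of P(sentiment = 1) - P(sentiment = -1)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["proba_1"]) - float(row["proba_-1"])
        totals[row["brand"]] += score
        counts[row["brand"]] += 1
    return {brand: totals[brand] / counts[brand] for brand in totals}


averages = average_score_by_brand(SCORED_CSV)
for brand, mean_score in averages.items():
    print(brand, round(mean_score, 3))
```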

Create Dashboard results

In this part we'll show you how to generate a few dashboards for the end users who will review your pinboard. We used a PostgreSQL connection to get faster results, but any other type of storage, such as files, would work.

We will first focus on building a daily report of the total number of tweets and of the global sentiment, before breaking it down by brand or keyword.
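The aggregation behind such a daily report can be sketched as follows. This is an illustration in plain Python rather than the tool's own charts; the field names `created_at`, `brand` and `score`, the function `daily_report` and the sample rows are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative scored tweets; field names and values are assumptions.
tweets = [
    {"created_at": "2015-03-01T09:15:00", "brand": "brand_a", "score": 0.5},
    {"created_at": "2015-03-01T18:40:00", "brand": "brand_b", "score": -0.5},
    {"created_at": "2015-03-02T11:05:00", "brand": "brand_a", "score": 1.0},
]


def daily_report(rows, by_brand=False):
    """Count tweets and average the score per day, optionally per brand."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [tweet count, score sum]
    for row in rows:
        day = datetime.fromisoformat(row["created_at"]).date().isoformat()
        key = (day, row["brand"]) if by_brand else day
        totals[key][0] += 1
        totals[key][1] += row["score"]
    return {
        key: {"tweets": count, "mean_score": score_sum / count}
        for key, (count, score_sum) in sorted(totals.items())
    }


report = daily_report(tweets)
report_by_brand = daily_report(tweets, by_brand=True)
```

Grouping on the day alone gives the global view; adding the brand to the grouping key gives the breakdown, which is the same operation a SQL `GROUP BY` would perform on a database connection.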

TO DO IN V2 FOR charts and so on.

Next steps

To refine your analysis, you could explore several ideas that lie outside the scope of this tutorial: