matix.io

Finding Polarized Sentiment in Tweets using Python (Part 2)

February 15, 2019

In the first part of this series we covered the context of this exploration of data, then showed how to extract the replies of a tweet from Twitter.

This post will cover how to extract sentiment from the tweet text using Python. After extracting the sentiment, we'll have a brief look at it before proposing next steps.

Extracting sentiment from text using Python

TextBlob is a great library for natural language processing (NLP) in Python. We'll use it to extract the sentiment from the text.

First, install it: pip install textblob

Recall from the first part of this series that we have a list, stored_tweets, in which each element is the text of a tweet.

We can extract the polarity of the tweets as follows:


from textblob import TextBlob

# One polarity score per tweet, in the same order as stored_tweets
polarities = [TextBlob(tweet).sentiment.polarity for tweet in stored_tweets]

The polarity will be a float between -1.0 and 1.0, where -1.0 is extremely negative and 1.0 is extremely positive.
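For a brief look at individual results, it can help to line each tweet up with its score and sort. The sketch below assumes the tweets and polarities are parallel lists, as above. The helper name rank_by_polarity and the sample values are made up for illustration and are not real TextBlob output.

```python
def rank_by_polarity(tweets, polarities):
    """Pair each tweet with its polarity, most negative first."""
    return sorted(zip(polarities, tweets))


# Made-up texts and scores for illustration only (not real TextBlob output).
sample_tweets = ["placeholder tweet A", "placeholder tweet B", "placeholder tweet C"]
sample_polarities = [0.6, -0.8, 0.0]

ranked = rank_by_polarity(sample_tweets, sample_polarities)
for polarity, tweet in ranked:
    print("{:+.2f}  {}".format(polarity, tweet))
```

With the real data, rank_by_polarity(stored_tweets, polarities) would put the most negative tweets at the start of the list and the most positive at the end.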

Visualizing the polarity

We have polarity metrics for individual tweets, but how polarized is the whole set of tweets?

To visualize our data, let's create discrete buckets to place it in, then plot a histogram.

Our buckets will all be the same size.

For example, if we were using 2 buckets, we would place all polarities between -1.0 and 0.0 in the first bucket, and all polarities between 0.0 and 1.0 in the second bucket.

If we were using 4 buckets, our buckets would be [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)].
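The bucket ranges described above can be computed for any number of buckets. This is a minimal sketch; bucket_ranges is a hypothetical helper name, not something from numpy or TextBlob.

```python
def bucket_ranges(num_buckets, min_val=-1.0, max_val=1.0):
    """Return (low, high) pairs for equal-width buckets over [min_val, max_val]."""
    step = (max_val - min_val) / float(num_buckets)
    return [(min_val + step * i, min_val + step * (i + 1))
            for i in range(num_buckets)]


print(bucket_ranges(2))  # [(-1.0, 0.0), (0.0, 1.0)]
print(bucket_ranges(4))  # [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)]
```

For bucket counts whose step is not exactly representable as a float (20 buckets gives a step of 0.1), some edges may print with small rounding errors.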

For this tutorial we'll use 20 buckets, an arbitrary choice.

We'll use numpy's digitize function to assign each polarity to a bucket. If you don't have numpy installed, install it with pip install numpy.


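import numpy as np

num_bins = 20
min_val = -1
max_val = 1
step = (max_val - min_val) / float(num_bins)
# Left edge of each bucket: roughly -1.0, -0.9, ..., 0.9
bins = [min_val + step * i for i in range(num_bins)]
# For each polarity, digitize returns the 1-based number (1 to num_bins) of its bucket
polarity_buckets = np.digitize(polarities, bins)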
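To see what digitize hands back, here is a small check on made-up polarities (sample_polarities is a hypothetical list, not real tweet scores). With left edges as the bins, each result is a bucket number from 1 to num_bins, and a polarity of exactly 1.0 lands in the last bucket.

```python
import numpy as np

num_bins = 20
min_val = -1
max_val = 1
step = (max_val - min_val) / float(num_bins)
bins = [min_val + step * i for i in range(num_bins)]  # left edges of the buckets

# Made-up polarities, chosen to sit at the extremes and around neutral.
sample_polarities = [-1.0, -0.95, 0.0, 0.05, 0.95, 1.0]

sample_buckets = np.digitize(sample_polarities, bins)
print(sample_buckets.tolist())  # [1, 1, 11, 11, 20, 20]

# Count tweets per bucket. Index 0 of bincount is unused because
# bucket numbers start at 1, so it is sliced off.
counts = np.bincount(sample_buckets, minlength=num_bins + 1)[1:]
print(counts.tolist())
```

Note that a polarity of exactly 0.0 falls in bucket 11, since each bucket includes its left edge.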

Next, we can display the bucket counts on a histogram.

We'll use pandas, matplotlib, and Jupyter notebooks to do this. You can install them by running pip install pandas matplotlib jupyter. Run jupyter notebook to launch the notebook server, and run all of your code in that notebook.


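import matplotlib.pyplot as plt
import pandas as pd

# Note: hist() groups values into 10 bins by default;
# pass bins=num_bins to get one bar per bucket
pd.DataFrame(polarity_buckets).hist()
plt.show()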

And here's the output:

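Keep in mind that the x-axis shows bucket numbers (1 to 20), not polarity values. The 10th and 11th buckets sit on either side of 0.0 and hold our "neutral" tweets, so the x-axis labels should be read, or relabeled, accordingly.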
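One way to make the x-axis read in polarity units is to let numpy compute the counts together with the bucket edges, then draw the bars at those edges. This is a sketch of an alternative, not the code that produced the plot above; polarity_counts and plot_polarity_counts are hypothetical helper names, and the sample values are made up.

```python
import numpy as np


def polarity_counts(polarities, num_bins=20, min_val=-1.0, max_val=1.0):
    """Count polarities in equal-width buckets; returns (counts, edges)."""
    return np.histogram(polarities, bins=num_bins, range=(min_val, max_val))


def plot_polarity_counts(counts, edges):
    """Draw one bar per bucket, positioned at the bucket's polarity range."""
    # Imported here so the counting helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    widths = np.diff(edges)
    plt.bar(edges[:-1], counts, width=widths, align="edge", edgecolor="black")
    plt.xlabel("polarity")
    plt.ylabel("number of tweets")
    plt.show()


# Made-up polarities for illustration; substitute the real `polarities` list.
sample_polarities = [-1.0, -0.95, 0.0, 0.05, 0.95, 1.0]
counts, edges = polarity_counts(sample_polarities)
print(counts.tolist())
```

Calling plot_polarity_counts(counts, edges) in the notebook then draws the chart with 0.0 in the middle of the x-axis, so the neutral buckets are easy to spot.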

Next Steps

In our visualization, we see what is probably a rough bell curve.

It's likely that most sets of tweets, when visualized like this, will begin to form a bell curve, with the tails representing the extremely polarized tweets and the peak representing neutral opinions.

In the next part of this series, we'll explore kurtosis and what it means for a set of tweets.