
Finding polarized sentiment in tweets using Python

February 14, 2019

This tutorial is a continuation of our access your Twitter feed using Python article. If you haven't set up API access yet, check that out first.

This tweet from @BenLesh (https://twitter.com/BenLesh/status/1095487146251546624) has gotten a lot of action over the past few days.

Regardless of your position on the subject, the interesting part is that the community is extremely polarized in its opinions.

People either love it or hate it.

This is reflected in their responses. Some are overwhelmingly positive and others are overwhelmingly negative.

Let's have a look at the overall sentiment of responses.

Pulling the tweet data

Unfortunately, Twitter doesn't offer an easy way to get replies to a particular tweet.

It seems that we're able to find all tweets sent to @BenLesh by using the handle as a search term.

All the tweets have an id, and the search seems to return items in reverse chronological order.

Replies also have a property in_reply_to_status_id.
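
For example, a single page of search results can be inspected like this. A minimal sketch, assuming the authenticated python-twitter api object from the full script further down:

results = api.GetSearch(term='@BenLesh', count=100)

for status in results:
    # Each twitter.Status carries its own id and, when it's a reply,
    # the id of the tweet it responds to.
    print(status.id, status.in_reply_to_status_id)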

The best we can do is as follows:

  1. Visit the tweet in your browser and grab the URL (for this tweet, it's https://twitter.com/BenLesh/status/1095487146251546624)
  2. The URL format is /<twitter_handle>/status/<status_id>. Grab the status_id (see the sketch after this list)
  3. Search all tweets sent to that user. You can get a maximum of 200 tweets per search query, so you'll get the 200 most recent tweets sent to that user.
  4. Go through the returned tweets. Compare each in_reply_to_status_id to the status_id from step 2; if it's a match, you've found a reply! Also, take note of the minimum id value, as we'll tell Twitter during the next search that we only want ids less than that.
  5. Repeat steps 3 & 4
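
Step 2 is just string surgery. A minimal sketch; the helper name status_id_from_url is ours, not part of any library:

from urllib.parse import urlparse

def status_id_from_url(tweet_url):
    # The path looks like /<twitter_handle>/status/<status_id>,
    # so the id is the last path segment.
    return int(urlparse(tweet_url).path.rstrip('/').split('/')[-1])

print(status_id_from_url('https://twitter.com/BenLesh/status/1095487146251546624'))
# 1095487146251546624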

Unfortunately, there's no natural end condition. The best option is to check the tweet's reply count and stop once you've collected roughly that many replies.
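
Here's a minimal sketch of that check; EXPECTED_REPLIES is a hypothetical value you'd read off the tweet's reply counter in the web UI:

EXPECTED_REPLIES = 182  # hypothetical: read off the tweet's reply counter

def have_enough(stored_tweets):
    # Replies from deleted or protected accounts never surface in search,
    # so treat the counter as an upper bound, not an exact target.
    return len(stored_tweets) >= EXPECTED_REPLIES

You'd call this at the bottom of the crawl loop below and break once it returns True.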

Depending on how old the tweet is, this might take a long time. You may also hit the API's rate limits at some point.
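
If you'd rather wait out the limits than handle errors, python-twitter's Api constructor accepts a sleep_on_rate_limit flag. A minimal sketch, with credentials elided as in the full script below:

import twitter

# sleep_on_rate_limit=True makes calls block until the rate-limit window
# resets instead of raising a TwitterError mid-crawl.
api = twitter.Api(
    consumer_key='',
    consumer_secret='',
    access_token_key='',
    access_token_secret='',
    sleep_on_rate_limit=True,
)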

Basically, this is just a "do the best you can" thing.

Here's the code used to pull the data. There's no end condition, so you'll just have to Ctrl-C when you think you have enough data.

At the time of running, the tweet in question was only two days old, and 180 of its 182 responses were collected in about 100 requests.


import twitter
import json

CONSUMER_KEY = ''
CONSUMER_SECRET = ''
ACCESS_TOKEN = ''
ACCESS_TOKEN_SECRET = ''

connection_details = {
    'consumer_key': CONSUMER_KEY,
    'consumer_secret': CONSUMER_SECRET,
    'access_token_key': ACCESS_TOKEN,
    'access_token_secret': ACCESS_TOKEN_SECRET
}

api = twitter.Api(**connection_details)

max_id = None  # pagination cursor; None means "start from the newest tweets"
stored_tweets = []
status_id = 1095487146251546624  # from step 2 above
search_term = '@BenLesh'

while True:
    tweets = api.GetSearch(term=search_term, max_id=max_id, count=200)

    for tweet in tweets:
        # Keep only direct replies to the status we care about.
        if tweet.in_reply_to_status_id == status_id:
            stored_tweets.append(tweet.text)

        # Track the smallest id seen so far. max_id is inclusive, so ask
        # for everything strictly older than it on the next request.
        if max_id is None or tweet.id - 1 < max_id:
            max_id = tweet.id - 1

    print('found {} tweets'.format(len(stored_tweets)))

    # Re-write the cache on every pass so a Ctrl-C never loses progress.
    with open('./cache.json', 'w') as f:
        f.write(json.dumps(stored_tweets))
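
Once you've interrupted the crawl, the reply texts live in cache.json. Reading them back later is straightforward (same file the script writes above):

import json

# Reload the cached reply texts written by the crawler above.
with open('./cache.json') as f:
    replies = json.load(f)

print('{} replies cached'.format(len(replies)))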

Next steps

Now that we have the data, we'll analyze it in a subsequent post.