Hack This: Programming with the Twitter Firehose
How to automate your Twitter in less than 10 lines of code.
Photo illustration by the author
At some point I'll write one of these things about making a full-on Twitterbot, but you'll probably be able to get pretty far on your own with the Twitter API docs in hand. It helps that Twitter is simple, relatively speaking. Its fundamental offering is a datastream dominated by text: Filtering the datastream is its fundamental interaction; contributing to the stream is its fundamental manipulation. It's still pretty "raw."
I'm going to show you two things below: how to look at the Twitter tweetstream in its most raw form on the Python command line, and how to run a script that automatically favorites tweets matching some keywords. Compared to most Hack This entries, this is going to be really quick.
Prerequisites: Assuming you've already downloaded and installed Python, you should do two things. One: spend 10 minutes doing this "Hello, World" Python for non-programmers tutorial. Two: spend another five minutes doing this tutorial on using Python modules.
0.0) Get a Twitter API key
To use the Twitter API either directly or via a third-party library, you need an API key. So, head over to apps.twitter.com and take care of that. It will ask you for some information about your "app" and have you agree to some TOS. It's painless.
Copy down your new API key and API secret and then make an access token (you should see a button). Copy down the token and also the token secret.
1.0) Install Twython
The two main Twitter libraries for Python are Tweepy and Twython. A quick Google search turns up a slight preference for the latter, so that's what we'll use here. Install it with pip (pip install twython), which you know how to do because you did the tutorial above on using Python modules.
2.0) Connect to the API
Create a new Python file with whatever name you like, using whatever text editor you like. As usual, I'm using Sublime Text. At the top, import Twython like so:

```python
from twython import Twython
```
Next, we're going to make a Twython instance (or object), which we can imagine as a sort of portal into the API's functionality. We'll need our API keys and secrets:

```python
from twython import Twython

# Paste in the four values you copied from apps.twitter.com
APP_KEY = 'YOUR KEY'
APP_SECRET = 'YOUR SECRET'
OAUTH_TOKEN = 'YOUR TOKEN'
OAUTH_SECRET = 'YOUR TOKEN SECRET'

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_SECRET)
```
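Hard-coding secrets is fine for a quick experiment, but it's easy to accidentally share a script with your keys inside it. One common alternative is to read them from environment variables. Here's a minimal sketch, assuming you've exported four variables yourself; the variable names and the load_credentials helper are hypothetical, not anything Twython requires. It returns the values in the same order the Twython(...) call above takes them.

```python
import os

# Hypothetical environment-variable names; pick whatever names you like.
CREDENTIAL_VARS = (
    "TWITTER_APP_KEY",
    "TWITTER_APP_SECRET",
    "TWITTER_OAUTH_TOKEN",
    "TWITTER_OAUTH_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials in the order Twython() takes them.

    Raises KeyError naming every missing variable, so a typo fails loudly
    instead of sending an empty key to Twitter.
    """
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing credential(s): " + ", ".join(missing))
    return tuple(env[name] for name in CREDENTIAL_VARS)


# Usage (needs twython installed and real credentials exported):
# from twython import Twython
# twitter = Twython(*load_credentials())
```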
Now we can reach the Twitter API through the twitter variable. Test that it works by running a simple search. For example:

```python
print(twitter.search(q='hack this'))
```
This will barf a bunch of data to the screen in the form of a Python dictionary object. We'll learn how to unpack that shortly. For now, we can just see that something is working and we're auth'd to Twitter's satisfaction.
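The search result is a dictionary whose "statuses" key holds a list of tweet dictionaries, alongside some "search_metadata". A minimal sketch of unpacking it, assuming the standard v1.1 search response shape; summarize_search is a hypothetical helper name and the sample data is made up:

```python
def summarize_search(results):
    """Turn a search response dictionary into readable '@handle: text' lines."""
    lines = []
    # .get() means an empty or unexpected response yields [] instead of a KeyError
    for tweet in results.get("statuses", []):
        handle = tweet["user"]["screen_name"]
        lines.append("@" + handle + ": " + tweet["text"])
    return lines


# Made-up sample shaped like a (heavily trimmed) search response.
sample = {
    "statuses": [
        {"id": 1, "text": "hack this!", "user": {"screen_name": "alice", "name": "Alice"}},
        {"id": 2, "text": "more hacking", "user": {"screen_name": "bob", "name": "Bob"}},
    ],
    "search_metadata": {"count": 2},
}

for line in summarize_search(sample):
    print(line)

# With a live connection instead:
# for line in summarize_search(twitter.search(q='hack this')):
#     print(line)
```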
3.0) Stream some Twitter
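Let's add this to our script:

```python
timeline = twitter.get_home_timeline(count=50)
```

This fetches the 50 most recent tweets from the home timeline of the account that owns your access token, which for me is everydayelk. The home timeline always belongs to that account, so there's no username to pass in. To pull a particular user's own tweets instead, use twitter.get_user_timeline(screen_name='everydayelk', count=50) and sub in whatever username you want.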
What's now stored in the timeline variable is a Python list, and each item in that list is a dictionary, the classic Pythonic data structure. A dictionary is a lot like a JSON object, in which keys are matched to values. A key is like the word we're looking up in a dictionary (in the classic sense), while the value is the definition.
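To see the JSON-to-dictionary relationship directly, here's a small sketch using Python's built-in json module on a made-up, drastically trimmed tweet (real tweets carry dozens more fields). It shows key lookup, a nested dictionary, and the fact that a timeline is just a list of these.

```python
import json

# A tiny, made-up JSON tweet for illustration only.
raw = '{"id": 42, "text": "Hello from the API", "user": {"name": "Example User", "screen_name": "example"}}'

tweet = json.loads(raw)          # JSON object -> Python dict
timeline = [tweet, tweet]        # a timeline is a list of such dicts

print(type(tweet).__name__)      # dict
print(tweet["text"])             # Hello from the API
print(tweet["user"]["name"])     # Example User (a value can itself be a dict)
print(len(timeline))             # 2
print(tweet.get("retweet_count", 0))  # 0, because .get() falls back to a default for a missing key
```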
The dictionary for just a single tweet is really big! You can read up on tweet objects in the Twitter API documentation, but here I just printed timeline to the screen and looked at it for a few seconds to see what's inside. I can see right away that there's a "text" key, which is probably the tweet itself, and a "user" key. For printing out a readable timeline, I'm happy with just those two for now. So, let's add this:

```python
for tweet in timeline:
    # Each tweet is a dictionary; its author is a nested dictionary under 'user'
    print(tweet['text'] + " by " + tweet['user']['name'])
```
The for-loop construction above is going to look at the timeline object, which is actually a list of dictionaries containing my requested 50 tweets, and for every individual tweet in there it's going to print the tweet text and the user who produced it.
Note that the user contained within a tweet dictionary is actually a dictionary itself, containing all of the information about the user account. So, I have to extract the user's name by using the key "name." Here's my output (for only 10 tweets, not 50). The redacted tweet is by a friend with a private profile. Note also that "twitter.py" is what I've named my Python script, and I'm running it from the Bash shell.
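The "name" key holds the account's display name; the @handle lives under "screen_name" in the same nested user dictionary. A sketch of a reusable formatter that shows both, assuming the v1.1 tweet layout; format_timeline is a hypothetical helper, and returning strings rather than printing makes it easy to test or reuse.

```python
def format_timeline(timeline):
    """Return one 'text by Display Name (@handle)' string per tweet dict."""
    lines = []
    for tweet in timeline:
        user = tweet["user"]  # the author is a nested dictionary
        lines.append("{} by {} (@{})".format(tweet["text"], user["name"], user["screen_name"]))
    return lines


# With a live connection:
# for line in format_timeline(twitter.get_home_timeline(count=10)):
#     print(line)
```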
That's not quite streaming, though. It's more just printing out some tweets. But Twython happens to have a built-in class for proper streaming, called TwythonStreamer.
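We use it like this:

```python
from twython import TwythonStreamer


class stream(TwythonStreamer):
    # Called once for every message the stream delivers
    def on_success(self, data):
        # Not every message is a tweet, so only print ones with 'text'
        if 'text' in data:
            print(data['text'])

    # Called when Twitter answers with a non-200 HTTP status;
    # headers=None works whether or not Twython passes response headers
    def on_error(self, status_code, data, headers=None):
        print(status_code)


# APP_KEY and friends are the same credentials defined earlier
streamer = stream(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_SECRET)
streamer.statuses.sample()
```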
There are a few things happening here, but the general idea is that we're taking TwythonStreamer and extending it with our own class, just by telling it to print out the statuses it receives from the Twitter API. You can imagine it as an open ear to the Twitter status firehose: every time a new tweet comes through, our stream class's on_success method comes to life and spits it out to the screen.
We actually create an instance of the class at the bottom, where we assign it to the variable streamer. The final line tells streamer which stream to connect to. Here, we use sample, which delivers a small random sample of the Twitter status firehose. The firehose itself, in unfiltered form, is not very useful for streaming tweets and is also restricted by Twitter.
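To make the callback idea concrete without a network connection, here is a toy imitation of the pattern. This is not Twython's source code: ToyStreamer and FirstN are hypothetical classes that feed newline-delimited JSON messages to on_success and on_error hooks. It also shows why the 'text' check matters (streams can carry non-tweet messages such as deletion notices) and how a callback can hang up; TwythonStreamer offers a disconnect() method for that same purpose.

```python
import json


class ToyStreamer:
    """Toy imitation of a streaming client's callback loop (not Twython's code)."""

    def __init__(self):
        self.connected = False

    def on_success(self, data):
        pass  # subclasses override this to handle each decoded message

    def on_error(self, status_code, data):
        pass  # subclasses override this to handle failures

    def disconnect(self):
        self.connected = False

    def run(self, lines, status_code=200):
        self.connected = True
        if status_code != 200:
            self.on_error(status_code, None)
            return
        for line in lines:
            if not self.connected:
                break  # a callback asked us to stop
            if not line.strip():
                continue  # blank keep-alive lines carry no data
            try:
                data = json.loads(line)
            except ValueError:
                self.on_error(status_code, line)
                continue
            self.on_success(data)


class FirstN(ToyStreamer):
    """Collect the text of the first n tweets, then hang up."""

    def __init__(self, n):
        super().__init__()
        self.n = n
        self.texts = []

    def on_success(self, data):
        if "text" in data:  # skip non-tweet messages such as deletion notices
            self.texts.append(data["text"])
            if len(self.texts) >= self.n:
                self.disconnect()


messages = [
    '{"text": "first"}',
    '',
    '{"delete": {"status": {"id": 1}}}',
    '{"text": "second"}',
    '{"text": "third"}',
]
s = FirstN(2)
s.run(messages)
print(s.texts)  # ['first', 'second']
```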
4.0) Make it count
We'll close with a quick bit of automation: a script that auto-favorites tweets based on a query term. It's adapted from a script published by "Programming for Marketers" to work with Twython rather than the standard Twitter API library. Just swap in whatever you like for "hackthis" and there you go. All that's really happening is that we're searching for tweets containing a term and then favoriting them.

```python
from twython import Twython, TwythonError

# APP_KEY, APP_SECRET, OAUTH_TOKEN and OAUTH_SECRET as defined earlier
twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_SECRET)


def fave(query, t, count=50):
    # Search for recent tweets matching the query, then favorite each one
    tweets = t.search(q=query, count=count)
    for tweet in tweets['statuses']:
        try:
            result = t.create_favorite(id=tweet['id'])
            print("favorited " + result['text'])
        except TwythonError as e:
            # e.g. the tweet was already favorited; skip it and keep going
            print("skipped " + str(tweet['id']) + ": " + str(e))


query = 'hackthis'
fave(query, twitter)
```
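A blunt favorite-everything loop will hit tweets you've already favorited, plus retweets and duplicates. One possible refinement is to filter the search results first. This is a sketch under assumptions: pick_tweets_to_fave and its limit parameter are hypothetical, and it relies on two fields of v1.1 tweet objects, the "favorited" flag and the "retweeted_status" key that retweets carry. Skipping retweets is a design choice, not a requirement.

```python
def pick_tweets_to_fave(statuses, limit=10):
    """Return the ids of up to `limit` search results worth favoriting.

    Skips tweets the account already favorited ('favorited' flag),
    retweets (which carry a 'retweeted_status' key), and duplicate ids.
    """
    chosen = []
    seen = set()
    for tweet in statuses:
        tweet_id = tweet["id"]
        if tweet_id in seen or tweet.get("favorited") or "retweeted_status" in tweet:
            continue
        seen.add(tweet_id)
        chosen.append(tweet_id)
        if len(chosen) >= limit:
            break
    return chosen


# With a live connection:
# results = twitter.search(q='hackthis', count=50)
# for tweet_id in pick_tweets_to_fave(results['statuses'], limit=10):
#     twitter.create_favorite(id=tweet_id)
```

Capping the count with limit also keeps the script from firing off dozens of favorites in one run.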
Needless to say, this should be wielded with extreme caution, if at all. Its utility is in grabbing cheap followers, mainly. (I've already gotten a few in the 10-ish minutes since I ran the script above.) In all of these examples, it's on you to make sure you're following Twitter's terms of service.
At this point, a Twitterbot should seem like a much more reasonable undertaking. If you can auto-favorite tweets based on search terms, you can auto-reply to tweets, etc. Please use this for good, not evil.
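As a taste of the auto-reply idea, here's a sketch of building the arguments for a reply. It assumes Twython's update_status, which posts to the v1.1 statuses/update endpoint and accepts status and in_reply_to_status_id; at the time, Twitter only threaded a reply under the original if the text mentioned the author's @handle, so the helper puts it first. build_reply is a hypothetical name.

```python
def build_reply(tweet, message):
    """Return keyword arguments for replying to `tweet`.

    The reply text starts with the author's @handle, and
    in_reply_to_status_id threads it under the original tweet.
    """
    handle = tweet["user"]["screen_name"]
    return {
        "status": "@{} {}".format(handle, message),
        "in_reply_to_status_id": tweet["id"],
    }


# With a live connection, for some tweet dict from a search or timeline:
# twitter.update_status(**build_reply(tweet, "Thanks for sharing!"))
```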