I’m going to detail an easy way to analyse #GamerGate tweets. I’ve heard they have a “fake data scientist”; I’m not sure if this is true because, being a Womble, I only pick up bits and pieces here and there. But one thing is for sure: if he really did only suck in 5K tweets for his “analysis”, then he is not the best data scientist on the common today. I’ll show you how you can easily hoover up vast numbers of GG tweets for whatever data analysis requirements you may have. Read on for a whirlwind overview of how to out-data-scientist the #GamerGate data scientist.
Let me introduce you to something called Elasticsearch, a system for storing structured, semi-structured and unstructured data, then indexing it and retrieving it based on queries. It also supports JSON quite nicely, which is the native format of any tweets you might download. Let me also introduce you to something called Kibana, which works with Elasticsearch to give you a nice funky interface for querying and visualising the data held there.
Now, you’ve got some Linux, right? Seriously, you need Linux for this tutorial, go, get it!
Download Elasticsearch, download Kibana …. Usually the next line would be to download Logstash, but in the burrow we don’t get much power, so our WomblePC is fairly low powered, and Logstash uses large amounts of memory (not that ES and Kibana don’t, but our little server was straining!). So I went for Fluentd instead. Download that as well …
I’m going to assume you know how to extract and run these using “nohup” so they run in the background. If you don’t, you can skip to the end to see what sort of data you can get, as this might be a little too technical for you …. If you get stuck, leave me a comment!
For the Fluentd Twitter plugin you need eventmachine, install it.
/opt/td-agent/embedded/bin/fluent-gem install eventmachine
If that fails you may need gcc and the usual build tools (the build-essential package on Debian or Ubuntu); install those and retry.
Then the plugin.
/opt/td-agent/embedded/bin/fluent-gem install fluent-plugin-twitter
Now you are ready to GO!
… Oh wait, not quite yet. You have a Twitter application? No? Go to https://apps.twitter.com/ and create an application (you need a Twitter account with a verified phone number attached). Then copy the OAuth details off and save them into the Fluentd configuration –
Excellent, you are almost ready to start analysing some tweets…. Restart the td agent.
sudo service td-agent restart
Check the log for errors.
tail -f /var/log/td-agent/td-agent.log
Now, you’ll be getting in tweets, hopefully … How many have I got? In Kibana, go to Settings -> Indices and add an index pattern; a search should show there is an index called “logstash-<date>”. Use an index pattern of “logstash*” so that it picks up every date.
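To see why the trailing wildcard matters, here is a small Python sketch of the matching. It assumes daily index names of the form logstash-YYYY.MM.DD (the dates below are made up for illustration) and uses Python's fnmatch as a stand-in for Kibana's own pattern matching, so treat it as an illustration rather than what Kibana does internally.

```python
from fnmatch import fnmatch


def matching_indices(index_names, pattern="logstash*"):
    """Return the index names that a wildcard pattern would pick up."""
    return [name for name in index_names if fnmatch(name, pattern)]


# Hypothetical index names; yours will carry whatever dates you collected on.
indices = ["logstash-2014.11.01", "logstash-2014.11.02", ".kibana"]

# The wildcard pattern covers every daily index but skips unrelated ones.
print(matching_indices(indices))

# A pattern without the wildcard would only ever see a single day.
print(matching_indices(indices, "logstash-2014.11.01"))
```

Because a new index appears each day, a pattern pinned to one date would quietly stop showing new tweets at midnight.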
Drop into the “Discover” tab, and you can see the tweets from the last 15 mins by default. In mine, however, I can see 285,751 tweets, as I’ve been running it a while, which is somewhat better than 5K. Other than a few glitches trying to get Logstash to run on the underpowered WomblePC, it works quite well!
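If you would rather ask Elasticsearch for the count yourself, the difference between the 15 minute default and "everything" comes down to a time range filter. The sketch below only builds the query bodies; it assumes the tweets carry a logstash-style @timestamp field and that you would send the body to the _count endpoint of your logstash* indices. The function name is my own invention.

```python
import json


def count_body(window=None):
    """Build an Elasticsearch query body, optionally limited to a recent window.

    window is a date-math span such as "15m"; None means no time limit.
    """
    if window is None:
        return {"query": {"match_all": {}}}
    # Assumes a logstash-style "@timestamp" field on every tweet document.
    return {"query": {"range": {"@timestamp": {"gte": f"now-{window}"}}}}


# Body for "tweets from the last 15 minutes".
print(json.dumps(count_body("15m")))

# Body for "every tweet collected so far".
print(json.dumps(count_body()))
```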
How about a bit of fun? Who in this period has tweeted more to #GamerGate than anyone else? Go to Visualise -> From a new search -> choose your index pattern -> choose vertical bar chart. Add an aggregation on the X-Axis and choose “Terms”; you can then select a field, and the obvious one is user.screen_name. Who cares most about “Ethics in Journalism”?
Click the ^ icon at the bottom of the chart and you can get the raw numbers, even see the call made to ES and the raw JSON reply back. In this case 4rtt5ty is the, err, winner.
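The call Kibana makes for that chart is, at heart, a terms aggregation. Here is a rough Python sketch of what such a request body looks like and how you might pull the counts out of the reply. The aggregation name top_tweeters, the helper names and the sample reply are all made up for illustration; the real request Kibana sends will carry extra bits such as the time filter.

```python
import json


def top_tweeters_body(field="user.screen_name", size=10):
    """Build a search body that counts documents per distinct value of a field."""
    return {
        "size": 0,  # we only want the aggregation, not the tweets themselves
        "aggs": {"top_tweeters": {"terms": {"field": field, "size": size}}},
    }


def bucket_counts(reply, agg_name="top_tweeters"):
    """Turn the buckets of a terms aggregation reply into (name, count) pairs."""
    buckets = reply["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical reply, trimmed down to the part that matters here.
sample_reply = {
    "aggregations": {
        "top_tweeters": {
            "buckets": [
                {"key": "example_user_a", "doc_count": 3},
                {"key": "example_user_b", "doc_count": 1},
            ]
        }
    }
}

print(json.dumps(top_tweeters_body(), indent=2))
print(bucket_counts(sample_reply))
```

Swapping the field argument for another one gives you a different "top ten" with the same few lines.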
But what about most influential? Someone must be getting lots of RTs and favs, who could it be … Favorites first, as we all need a favorite in our lives!
Huh, don’t know what to say. A #GamerGate neutral is most fav’d by #GamerGate, seems legit. (Note that the figure is not accurate, as it adds the favs up again for each RT, but it is still related to popularity.) How about RTs? Well, I won’t bother pasting here as it is quite clear who the tribe of #GamerGate feels speaks to them the most, @sushilulutwitch …. So is she a neutral? All I know is my gut says, maybe.
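If you wanted a less inflated favourites figure, one approach is to count each original tweet once, however many times it was retweeted. The sketch below is not what the Kibana chart does; it is a possible fix done in Python, assuming the tweets follow Twitter's usual JSON layout where a retweet carries the original under retweeted_status. The sample tweets and function names are invented.

```python
def naive_favourites(tweets):
    """Sum favourites per author, counting the original again for every RT."""
    totals = {}
    for tweet in tweets:
        original = tweet.get("retweeted_status", tweet)
        author = original["user"]["screen_name"]
        totals[author] = totals.get(author, 0) + original.get("favorite_count", 0)
    return totals


def deduped_favourites(tweets):
    """Sum favourites per author, counting each original tweet only once."""
    best = {}  # original tweet id -> (author, highest favorite_count seen)
    for tweet in tweets:
        original = tweet.get("retweeted_status", tweet)
        author = original["user"]["screen_name"]
        seen = best.get(original["id"], (author, 0))[1]
        best[original["id"]] = (author, max(seen, original.get("favorite_count", 0)))
    totals = {}
    for author, favs in best.values():
        totals[author] = totals.get(author, 0) + favs
    return totals


# Hypothetical data: one original tweet and two retweets of it.
author_a = {"screen_name": "author_a"}
tweets = [
    {"id": 1, "favorite_count": 5, "user": author_a},
    {"id": 2, "favorite_count": 0, "user": {"screen_name": "fan_b"},
     "retweeted_status": {"id": 1, "favorite_count": 5, "user": author_a}},
    {"id": 3, "favorite_count": 0, "user": {"screen_name": "fan_c"},
     "retweeted_status": {"id": 1, "favorite_count": 6, "user": author_a}},
]

print(naive_favourites(tweets))
print(deduped_favourites(tweets))
```

Taking the highest count seen for each original is a judgment call: the favourite count grows over time, so the latest retweet usually carries the most up to date number.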
Finally, there are all sorts of interesting things you can do with the data once it is in Elasticsearch. I hope to show you sometime, when I’m not picking up all the rubbish you humans leave around my home 😡
PS, the most RT’d tweets in all this morass show GG is unfortunately (for them) fighting an uphill battle, they are not well liked. Top four are against GamerGate … Fifth one isn’t, but I can’t paste it here as I’d lose the SJW from SJWomble, and WTF is an Omble anyway?