20 December 2011

A Warren-ted Search

One of the points "for further research" (as I used to say when I was an academic) in the Triage exercise was using social media to measure outcomes. R has a library, twitteR (yes, R folks tend to capitalize the letter at every opportunity), which retrieves some data. I was at first uninterested, since I don't have a Twitter account. Thankfully, twits can be gotten without being a twitterer. Since Elizabeth Warren's campaign is just over the border, and sort of important in the grand scheme of things, I've been exploring.

Here's the entirety of the R code (as seen in an RStudio session) needed to return the twits (1,500 is the max, which will prove troublesome when the battle is fully engaged):

> library(twitteR)
> library(plyr)  # laply() comes from plyr
> warrenTweets <- searchTwitter('@elizabethwarren', n = 1500)
> length(warrenTweets)
[1] 9
> warren.Text <- laply(warrenTweets, function(t) t$getText())
> head(warren.Text, 10)
[1] "@elizabethwarren i hope you win agianst sen scott brown. the 99% r with u"
[2] "@elizabethwarren More $$$ coming your way!"
[3] "#HR3505 PAGING: @ElizabethWarren Help us!!!!"
[4] "@elizabethwarren - not to worry, the only job Karl Rove ever got somebody was George W. Bush. and look how that turned out."
[5] "RT @SenatorBuono: What an amazing turnout 4 a superstar. @elizabethwarren"
[6] "HELLO @ElizabethWarren ! PLEASE RUN as a 3rd party or Ind. FOR POTUS2012. Dems just threwSENIORS underthebus for the working tax cut! EXdem"
[7] "@chucktodd We hope 2011 will be remembered for something a LOT closer to home. #ows #OccupyWallStreet @ElizabethWarren #WARREN/PELOSI-2016"
[8] "RT @SenatorBuono: What an amazing turnout 4 a superstar. @elizabethwarren"
[9] "What an amazing turnout 4 a superstar. @elizabethwarren"


The lines starting with > are the R code. The lines starting with [x] are the output. Here we have 9 twits.

Now, what do we do with the text? For that, I'll send you off to this presentation, conducted in Boston, which came up in my R/twitter search (and is the source of what you've seen here). Missed it, dang. At slide 11 is the explanation of how one might parse the twits looking for positive/negative response. By the way, even if you're not the least bit interested in such nonsense, visit slide 29.
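For the impatient, here's a minimal sketch of one way such parsing might go: count the positive words in each twit, subtract the negative ones. I'm not claiming this is the presentation's method. The two word lists are toy lists I made up for illustration (a real effort would want a proper opinion lexicon), and score_sentiment is my own hypothetical helper, not part of twitteR or any other package.

```r
# Toy word lists: illustrative assumptions only, not a real opinion lexicon.
pos_words <- c("amazing", "superstar", "win", "hope", "help")
neg_words <- c("worry", "threw", "troublesome")

# Hypothetical helper: score = (# positive words) - (# negative words) per text.
score_sentiment <- function(texts, pos_words, neg_words) {
  vapply(texts, function(txt) {
    # Lower-case, turn anything that isn't a letter or space into a space,
    # then split on runs of spaces to get the individual words.
    cleaned <- gsub("[^a-z ]", " ", tolower(txt))
    words <- strsplit(cleaned, " +")[[1]]
    sum(words %in% pos_words) - sum(words %in% neg_words)
  }, numeric(1), USE.NAMES = FALSE)
}

# Three of the twits returned above.
sample_twits <- c(
  "What an amazing turnout 4 a superstar. @elizabethwarren",
  "@elizabethwarren - not to worry, the only job Karl Rove ever got somebody was George W. Bush. and look how that turned out.",
  "@elizabethwarren More $$$ coming your way!"
)
scores <- score_sentiment(sample_twits, pos_words, neg_words)
print(scores)
```

With these toy lists the three twits score 2, -1 and 0. Note that the second twit is reassurance ("not to worry") and still scores negative, which is a fair warning about how crude word counting is. In the session above, the same function could be pointed at warren.Text.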

As I mentioned in Triage and follow-ups, getting the outcomes data is the largest piece of the work. Simply being able to "guarantee" the accuracy of Twitter (or any other uncontrolled source) data, given the restriction on returned twits and such, will require some level of data sophistication, which your average Apparatchik likely doesn't care about. The goal, I'll mention again, isn't to emulate Chris Farley's Matt Foley and pump up a candidate no matter what the data say, but to find the candidate, out of many, most likely to win given some help. Whether Triage would be useful to a single candidate, well, that depends on the inner strength of the candidate.
