This tutorial will give you the information you need to go from wanting to analyze Twitter data to getting your own spreadsheet of tweets. It is aimed at beginners who want to start analyzing real, current social media data. See this page for basic info about what Twitter data is available.
- Sign up for a normal Twitter account if you don’t already have one.
- Get your API credentials by registering an app here. (Yes, you are a developer.)
- Download R (and, optionally, RStudio, which runs on top of R) if you don’t already have it. You may need to work through a basic tutorial to learn how to use R or RStudio, but you can probably follow this tutorial without much understanding of R beyond how to run commands.
Install required packages (one time only)
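Here is a minimal sketch of the install step, using the two packages named in the full listing at the end of this tutorial (twitteR and rio). The check for what is already installed is an addition, not part of the original code, and missingPackages is a hypothetical helper name.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed yet.
missingPackages = function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# The two packages this tutorial uses.
needed = c("twitteR", "rio")
toInstall = missingPackages(needed)

# Install only what is missing (needs an internet connection).
# Guarded with interactive() so a non-interactive run never starts a download.
if (interactive() && length(toInstall) > 0) {
  install.packages(toInstall)
}
```

Running it a second time should do nothing, because both packages will already be installed.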
If you haven’t done so already, set your working directory (where the output file goes).
Windows example:
setwd("C:/Users/User Name/Documents/FOLDER")
Mac example:
setwd("/Users/User Name/Documents/FOLDER")
Downloading the data
Load the required packages (previously installed)
Authenticate to Twitter – fill in the info you got during Getting Started Step 2 above
setup_twitter_oauth(consumer_key='YourConsumerKey', consumer_secret='YourConsumerSecret', access_token='YourAccessToken', access_secret='YourAccessSecret')
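If you would rather not type your keys directly into a script you might share, one option is to read them from environment variables. This is only a sketch: the variable names (TWITTER_CONSUMER_KEY and so on) and the readCredential helper are made up for illustration, and the placeholder values are set inside the script only so that it runs as-is.

```r
# Hypothetical helper: read one environment variable and stop if it is empty.
readCredential = function(name) {
  value = Sys.getenv(name, unset = "")
  if (value == "") {
    stop("Environment variable not set: ", name)
  }
  value
}

# Placeholder values so this sketch runs; in practice set your real keys
# outside the script (for example in your .Renviron file).
Sys.setenv(TWITTER_CONSUMER_KEY = "YourConsumerKey",
           TWITTER_CONSUMER_SECRET = "YourConsumerSecret",
           TWITTER_ACCESS_TOKEN = "YourAccessToken",
           TWITTER_ACCESS_SECRET = "YourAccessSecret")

# Names match the arguments of setup_twitter_oauth() shown above.
creds = list(
  consumer_key = readCredential("TWITTER_CONSUMER_KEY"),
  consumer_secret = readCredential("TWITTER_CONSUMER_SECRET"),
  access_token = readCredential("TWITTER_ACCESS_TOKEN"),
  access_secret = readCredential("TWITTER_ACCESS_SECRET")
)

# With twitteR loaded, the list can be handed to the same call:
# do.call(setup_twitter_oauth, creds)
```

The last line is commented out because it needs twitteR loaded and real credentials.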
If you are analyzing a Twitter user’s timeline:
Fill in the name of the user whose timeline you want to download. Do not start with @. It is not case sensitive; that is, hillaryclinton is the same as HillaryClinton. This must be the Twitter username (the handle), not the person’s display name.
userName = "YourUserNameToDownload"
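If the name might get pasted in with a leading @ or stray spaces, a small helper can tidy it before the download. This is a sketch; cleanUserName is a hypothetical name, not something twitteR provides.

```r
# Hypothetical helper: trim surrounding spaces and drop one leading "@".
cleanUserName = function(name) {
  sub("^@", "", trimws(name))
}

cleanUserName("@HillaryClinton")   # "HillaryClinton"
cleanUserName("hillaryclinton")    # "hillaryclinton" (unchanged)
```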
Download the timeline – this takes about 2-3 minutes
userTl = userTimeline(userName,n=3200, includeRts = TRUE)
If you are analyzing a Twitter search:
See the query operators on this page for what can be searched, and try your search on Twitter first to check that it returns what you want. You will probably want to raise the number of tweets, and you may or may not want to limit results by language; if not, just delete , lang = "en".
searchTerm = "Your search term"
tweets = searchTwitter(searchTerm, n=1000, lang = "en")
dfT = twListToDF(tweets)
All types of analysis:
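Flatten the data structure twitteR provides down to a simple spreadsheet-like data frame. The data frame is named dfT because the steps that follow use that name. (If you ran a search, the search code already did this.)
dfT = twListToDF(userTl)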
Strip out any tabs or new line characters in the tweet so that your tsv file will not be corrupt.
dfT$tweetC = gsub("\t"," ",dfT$text)
dfT$tweetC = gsub("\n"," ",dfT$tweetC)
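To see what these two replacements do without downloading anything, here is a sketch on a tiny made-up data frame. The three sample tweets are hypothetical; only the text column matters here.

```r
# Hypothetical stand-in for the data frame twListToDF() returns.
dfT = data.frame(
  text = c("first line\nsecond line", "col1\tcol2", "plain tweet"),
  stringsAsFactors = FALSE
)

# Same two replacements as in the tutorial, applied one after the other.
dfT$tweetC = gsub("\t", " ", dfT$text)
dfT$tweetC = gsub("\n", " ", dfT$tweetC)

# Check that no tabs or new lines remain in the cleaned column.
any(grepl("[\t\n]", dfT$tweetC))   # FALSE
```

The original text column is left alone; only the new tweetC column is cleaned, and that is the one exported later.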
Name the output TSV file YourUserNameToDownload-tweets.tsv
fileName = paste(userName,"-tweets.tsv",sep="")
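Note that this line uses userName, which only exists if you downloaded a timeline. If you ran a search instead, one option is to build the file name from the search term. This is a sketch; the safeTerm variable and the choice of which characters to replace are assumptions, so adjust them to taste.

```r
searchTerm = "Your search term"

# Replace anything that is not a letter, digit, underscore or hyphen,
# so the search term is safe to use in a file name.
safeTerm = gsub("[^A-Za-z0-9_-]", "_", searchTerm)

fileName = paste(safeTerm, "-tweets.tsv", sep = "")
fileName   # "Your_search_term-tweets.tsv"
```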
Send the user’s timeline data frame to a TSV file on your computer, suitable for uploading to Google Sheets (or many other programs). This will OVERWRITE any existing file with the same name. To append to the file instead, add ,append=TRUE to the export command; if you do, also add ,col.names = FALSE so the header row is not duplicated, and check for duplicates when you analyze the data.
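To see how appending, the header row, and duplicates interact, here is a sketch that uses base R's write.table instead of rio's export, with two small made-up data frames standing in for two downloads. The ids and text are hypothetical, and the file goes to a temporary location so nothing of yours is overwritten.

```r
# Two hypothetical downloads that overlap on the tweet with id "2".
first = data.frame(id = c("1", "2"), tweetC = c("a", "b"), stringsAsFactors = FALSE)
second = data.frame(id = c("2", "3"), tweetC = c("b", "c"), stringsAsFactors = FALSE)

fileName = tempfile(fileext = ".tsv")

# First write: creates (or overwrites) the file and includes the header row.
write.table(first, fileName, sep = "\t", row.names = FALSE, quote = FALSE)

# Second write: appends the rows and leaves the header row out.
write.table(second, fileName, sep = "\t", row.names = FALSE, quote = FALSE,
            append = TRUE, col.names = FALSE)

# Read everything back as text (so long ids are not rounded),
# then keep only the first row for each id.
combined = read.delim(fileName, colClasses = "character")
deduped = combined[!duplicated(combined$id), ]

nrow(combined)   # 4
nrow(deduped)    # 3
```

Deduplicating on id works here because each tweet has its own id; the same idea applies if you do it in a spreadsheet instead.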
Once you run the export command your file is in your working directory with the name YourUserNameToDownload-tweets.tsv. This file is now ready to import into a spreadsheet. For ‘real’ data science / research you shouldn’t use a spreadsheet for analysis, but until you’re ready to learn R and/or Python you can learn quite a lot about working with data just starting with a spreadsheet.
Here is the code in one block:
#one time only
install.packages("twitteR")
install.packages("rio")
setwd("YourWorkingDirectory")

#download data (fill in your info)
library("twitteR")
library("rio")
setup_twitter_oauth(consumer_key='YourConsumerKey', consumer_secret='YourConsumerSecret', access_token='YourAccessToken', access_secret='YourAccessSecret')

#if you are analyzing a user time line
userName = "YourUserNameToDownload"
userTl = userTimeline(userName,n=3200, includeRts = TRUE)
dfT = twListToDF(userTl)

#if you are analyzing a search
searchTerm = "Your search term"
tweets = searchTwitter(searchTerm, n=1000, lang = "en")
dfT = twListToDF(tweets)

#export
#strip out all tabs and new lines from the tweet field
dfT$tweetC = gsub("\t"," ",dfT$text)
dfT$tweetC = gsub("\n"," ",dfT$tweetC)
#add a link to the tweet
dfT$linkToTweet = paste("http://twitter.com/",dfT$screenName,"/status/",dfT$id,sep="")
#strip out the link from the source field
dfT$sourceC = sub("<a href=\".*\">","",dfT$statusSource)
dfT$sourceC = sub("</a>","",dfT$sourceC)
#subset only fields to export
myvars = c("tweetC","created","linkToTweet","retweetCount","isRetweet","favoriteCount","id","sourceC","replyToSN","truncated","replyToSID","replyToUID","screenName","longitude","latitude")
dataExport = dfT[myvars]
fileName = paste(userName,"-tweets.tsv",sep="")
export(dataExport,fileName)
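The export section adds two columns that are not explained earlier: a link to each tweet, and a cleaned-up version of the source field. Here is a sketch of what those lines do, run on one made-up row. The screen name, id and source value are hypothetical, chosen only to show the shape of the data.

```r
# Hypothetical single-row stand-in for the twListToDF() output.
dfT = data.frame(
  screenName = "exampleUser",
  id = "1234567890",
  statusSource = "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
  stringsAsFactors = FALSE
)

# Build the link from the screen name and the tweet id.
dfT$linkToTweet = paste("http://twitter.com/", dfT$screenName, "/status/", dfT$id, sep = "")

# Remove the opening <a ...> tag, then the closing </a>, leaving just the app name.
dfT$sourceC = sub("<a href=\".*\">", "", dfT$statusSource)
dfT$sourceC = sub("</a>", "", dfT$sourceC)

dfT$linkToTweet   # "http://twitter.com/exampleUser/status/1234567890"
dfT$sourceC       # "Twitter for iPhone"
```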