One of the most popular posts on here is about analysing a social network to find its most influential community members: "The PageRank of #PPCChat participants". The code for that project was written in Python, but I've recently written a new version in R. R is a worse language in my opinion, but the number and variety of its libraries is amazing, so it is worth learning.

Here is the R source code:

Start by loading some libraries: stringr for regex support and igraph to do the heavy lifting.

Also specify the location where the TSV file of tweet data is stored.
#+BEGIN_SRC R
library(stringr)
library(igraph)

location = "/tmp/data/ppcchat.tsv"
#+END_SRC

Load the TSV file. 'stringsAsFactors=FALSE' forces the tweets to be loaded as character strings rather than as factors. A factor is R's data type for categorical values.

Initialise an empty vector of edges.
#+BEGIN_SRC R
# Read from the path defined above rather than a hard-coded filename
raw <- read.csv(location, header=FALSE, sep='\t', stringsAsFactors=FALSE)
edges <- c()
#+END_SRC
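To see what 'stringsAsFactors' changes, here is a small sketch that reads the same data both ways. The two-row TSV is made up and inlined with the `text` argument so nothing has to exist on disk; the column names V1 and V2 are just R's defaults when there is no header.

```r
# Hypothetical two-row TSV, inlined so the example needs no file on disk
tsv <- "1\thello @alice\n2\thello @bob"

as_factor <- read.csv(text = tsv, header = FALSE, sep = "\t",
                      stringsAsFactors = TRUE)
as_string <- read.csv(text = tsv, header = FALSE, sep = "\t",
                      stringsAsFactors = FALSE)

# Inspect how the text column was stored in each case
class(as_factor$V2)
class(as_string$V2)
```

Since R 4.0.0 the default is already FALSE, so on a recent install the argument is mostly there to make the intent explicit.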

Now iterate through the list of tweets and build a list of edge
pairs. raw$V3 contains the text of all the tweets. raw$V10 contains
the username of the person tweeting.
#+BEGIN_SRC R
for (i in seq_along(raw$V3)) {
  # Extract the usernames mentioned in the tweet
  mentions = unlist(str_extract_all(tolower(raw$V3[i]), "@[a-z0-9_]{2,15}"))
  if (length(mentions) != 0) {
    for (j in 1:length(mentions)) {
      # Skip empty names, which appear when the parser borks
      if (raw$V10[i] != "" && substring(mentions[j], 2) != "") {
        # Append a (tweeter, mentioned user) pair, dropping the leading "@"
        edges = c(edges, c(tolower(raw$V10[i]), substring(mentions[j], 2)))
      }
    }
  }
}
#+END_SRC
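The loop above grows `edges` one pair at a time, which gets slow on large files. As a rough sketch of an alternative, the same pairs can be built in one pass because `str_extract_all` accepts a whole vector of tweets. The `authors` and `tweets` vectors here are made-up stand-ins for raw$V10 and raw$V3, and `edge_pairs` is a name I've introduced for illustration.

```r
library(stringr)

# Hypothetical stand-ins for raw$V10 (authors) and raw$V3 (tweet text)
authors <- c("Alice", "bob", "carol")
tweets  <- c("Thanks @Bob and @carol!", "no mentions here", "@alice great point")

# One character vector of mentions per tweet (empty when there are none)
mentions <- str_extract_all(tolower(tweets), "@[a-z0-9_]{2,15}")

# Repeat each author once per mention, then strip the leading "@"
from <- rep(tolower(authors), lengths(mentions))
to   <- substring(unlist(mentions), 2)

# Two-column matrix with one row per (tweeter, mentioned user) pair
edge_pairs <- cbind(from, to)
edge_pairs
```

This sketch skips the empty-username check from the loop, so rows with a blank author would still need filtering on real data.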

Turn this into a two-column edge matrix (one row per edge) and create the graph.
#+BEGIN_SRC R
edgematrix <- t(matrix(edges,nrow=2))
# graph_from_edgelist() is the current name for graph.edgelist()
g <- graph_from_edgelist(edgematrix)
#+END_SRC
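The reshaping step is easier to follow on a tiny input. In this sketch the three edges are invented; the point is only to show how the flat vector of alternating names becomes one row per edge.

```r
library(igraph)

# Flat vector of alternating (from, to) names, as the loop would produce
edges <- c("alice", "bob", "alice", "carol", "carol", "alice")

# matrix() fills column by column, so each column holds one pair;
# t() flips that into one row per edge
edgematrix <- t(matrix(edges, nrow = 2))
edgematrix

# Build a directed graph with one vertex per distinct name
g <- graph_from_edgelist(edgematrix)
vcount(g)
ecount(g)
```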

I have found that you get far better results from this kind of thing
when loops are removed from the graph. This means ignoring tweets
where a person mentions themselves.
#+BEGIN_SRC R
for (i in seq_len(vcount(g))) {
  g[i,i] = 0  # delete any edge from vertex i to itself
}
#+END_SRC
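igraph also has a built-in for this. As a sketch on a made-up graph, `simplify` with `remove.loops = TRUE` drops self-mentions in one call; I've set `remove.multiple = FALSE` so that repeated mentions between the same two people stay as separate edges, which matches what the loop above does.

```r
library(igraph)

# Hypothetical toy graph with one self-mention and one repeated mention
toy <- graph_from_edgelist(rbind(
  c("alice", "alice"),  # self-mention (a loop)
  c("alice", "bob"),
  c("alice", "bob"),    # repeated mention, kept on purpose
  c("bob",   "alice")
))

# Drop loops only; leave parallel edges alone
no_loops <- simplify(toy, remove.multiple = FALSE, remove.loops = TRUE)

any(which_loop(no_loops))
ecount(no_loops)
```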

Finally, calculate PageRank and take the top ten.
#+BEGIN_SRC R
# page_rank() is the current name for page.rank()
pr <- page_rank(g,directed=TRUE)
topten <- sort(pr$vector,decreasing=TRUE)[1:10]
#+END_SRC
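Here is the last step on a three-person toy graph, as a sketch with invented names. The scores in `pr$vector` are named by vertex, so sorting them gives usernames and scores together. I've used `head` here because indexing with `[1:10]` pads the result with NA when the graph has fewer than ten vertices.

```r
library(igraph)

# Hypothetical mentions: alice and bob both mention carol, carol mentions alice
toy <- graph_from_edgelist(rbind(
  c("alice", "carol"),
  c("bob",   "carol"),
  c("carol", "alice")
))

pr <- page_rank(toy, directed = TRUE)

# Scores are named by vertex, so sorting keeps the usernames attached
ranked <- sort(pr$vector, decreasing = TRUE)
head(ranked, 10)
```

In this toy graph carol comes out on top, since she is the only one mentioned by two people, and bob comes last because nobody mentions him.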