I've been messing around a little with the Twitter streaming API and the igraph Python library. It's not really very PPC related, but it is quite interesting. I'm a long-time watcher of the #PPCChat hashtag, so in order to relate this post in some way to paid search I'm going to do some analysis of this community.
I've been downloading and saving every tweet on the #PPCChat hashtag using the Twitter streaming API. If you're interested in this sort of thing then you should look at my Twitter streaming tweet saver Python script. You'll need the tweetstream package, but otherwise it should work out of the box; just provide your username and password on the command line.
6151 tweets later I pulled the data I had and began to do some analysis.
When looking at a network or community, a popular way to figure out who is influential is to use the PageRank algorithm, which should be familiar as the breakthrough that made Google's organic search so much better than its competitors'.
After processing and sanitizing the data a little I was able to create a graph where each vertex is either a participant in #PPCChat or someone mentioned by a participant in #PPCChat. I made an edge from person A to person B whenever person A mentioned person B in a tweet. This means there could be multiple edges between people.
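As a rough illustration of the edge-building step, here is a minimal sketch in plain Python. It assumes each saved tweet is available as an (author, text) pair; the sample tweets, the regular expression, and the function name are made up for the example and are not taken from my actual script.

```python
import re
from collections import Counter

# Assumption: a handle is '@' followed by letters, digits or underscores.
MENTION_RE = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Return one (author, mentioned) pair per mention, duplicates kept."""
    edges = []
    for author, text in tweets:
        for handle in MENTION_RE.findall(text):
            # Lower-case both ends so @Bob and @bob become the same vertex.
            edges.append((author.lower(), handle.lower()))
    return edges

# Hypothetical sample data, not from the real #PPCChat archive.
tweets = [
    ("alice", "Great point @bob #ppcchat"),
    ("alice", "@bob @carol agreed #ppcchat"),
    ("carol", "Thanks @alice! #ppcchat"),
]

edges = mention_edges(tweets)
vertices = sorted({name for edge in edges for name in edge})
edge_counts = Counter(edges)  # how many parallel edges each pair has

print(vertices)
print(edge_counts)
```

Because every mention adds its own edge, alice mentioning bob twice gives two parallel edges from alice to bob, which is the "multiple edges between people" case.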
With 745 participants/mentions and 8488 edges between them, visualizing the graph is a bit of a mess:
igraph has an inbuilt method for calculating PageRank which ran surprisingly fast given how long it took to generate the graph in the first place. Then it was simply a case of sorting the results.
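For anyone curious what that calculation involves, here is a small stand-in written in plain Python. To be clear, this is a textbook power iteration and not igraph's implementation; the damping factor of 0.85, the iteration count, and the sample edges are all assumptions chosen for the example.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over a list of (source, target) edges.

    Duplicate edges count as extra weight. Vertices with no outgoing
    edges spread their rank evenly over every vertex.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices that mention nobody is shared out evenly.
        dangling = sum(rank[n] for n in nodes if out_degree[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source, target in edges:
            new_rank[target] += damping * rank[source] / out_degree[source]
        rank = new_rank
    return rank

# Hypothetical mention edges: (mentioner, mentioned).
edges = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("carol", "alice"), ("dave", "carol"),
]

scores = pagerank(edges)
# Sort highest score first to get a "top N" style list.
top = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for name, score in top:
    print(f"{name}: {score:.5f}")
```

The scores sum to 1, so each one can be read as that person's share of the total attention in the graph.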
The Top 10
- Matt Umbro (PageRank of 0.07816)
- Marin Software (0.07799)
- Melissa Mackey (0.07319)
- Michelle Morgan (0.03864)
- James Svoboda (0.03749)
- Luke Alley (0.03148)
- AdCenter (0.02782)
- Aaron Levy (0.02688)
- Chris Kostecki (0.02036)
- AdWords (0.02036)
NB: Floating point errors (I think) have caused the downloadable list to change since I did the top 10 list. I think this error is worth leaving as it stands to show the problems and risks of using PageRank.
Analysis (or WTF Marin?)
I think it is safe to say that the top 10 contains all the usual suspects, but I am surprised to see Marin Software rank so highly. They rarely appear in my stream, so what is going on here?
Marin have 22 mentions compared to Matt's 876, and only one of the accounts mentioning them (apart from Marin themselves) has a PageRank above 0.001.
Marin have only mentioned a #PPCChat user once whilst using the hashtag and in this case, the user was @marinsoftware! This means that Marin have created a loop in our graph; PageRank flows in, but none of it flows out!
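The effect is easy to reproduce on a toy graph. The sketch below uses a plain-Python power iteration (again, not igraph's own code) with made-up handles: three people who mention each other, one of whom mentions a "vendor" account once, and a vendor whose only mention is of itself. The damping factor of 0.85 is an assumption, and with damping a little rank does still leak out of the loop on every step, just far less than flows in.

```python
def pagerank(edges, damping=0.85, iterations=200):
    """Textbook power-iteration PageRank; dangling rank is shared evenly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_degree = {n: 0 for n in nodes}
    for source, _ in edges:
        out_degree[source] += 1
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if out_degree[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source, target in edges:
            new_rank[target] += damping * rank[source] / out_degree[source]
        rank = new_rank
    return rank

# Hypothetical graph: alice, bob and carol all mention each other,
# bob mentions the vendor once, and the vendor only mentions itself.
edges = [
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "alice"), ("bob", "carol"), ("bob", "vendor"),
    ("carol", "alice"), ("carol", "bob"),
    ("vendor", "vendor"),
]

with_loop = pagerank(edges)

# Drop every self-mention, not only the vendor's.
no_self_mentions = [(s, t) for s, t in edges if s != t]
without_loop = pagerank(no_self_mentions)

print("with self-mention:   ", max(with_loop, key=with_loop.get))
print("without self-mention:", max(without_loop, key=without_loop.get))
print("vendor score:", round(with_loop["vendor"], 3), "->",
      round(without_loop["vendor"], 3))
```

With the self-mention in place, the vendor ends up with the largest score despite being mentioned only once; once self-mentions are filtered out it falls to the bottom of this small graph.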
After deleting all self-mentions from the graph (not just Marin's), the top 10 is as follows:
- Matt Umbro
- Melissa Mackey
- Michelle Morgan
- James Svoboda
- Luke Alley
- Aaron Levy
- Chris Kostecki
- Neil Sorenson
This list seems much more reasonable. Marin drop back to 47th place.
No doubt I could continue "refining" this until it perfectly matched what I think it should, but I think I'm done now.
Download the python script that I used.