I have no idea what to call this post

Posted on January 20, 2015

This is actually a fitting title for the following piece of mental spew because the title of a recent Moz article got me slightly riled up. The substance of the blog post is far more reasonable - but the title does not do a very nuanced area justice. So in my own response I am not committing to a title out of cowardice.

1 Correlation is not causation

Correlation is not causation has become a bit of a cliche in the SEO world. The statement is true - I am not going to dispute that. But I would add to it that, from a business perspective, correlation is only interesting as far as it says something about causation.

The whole point of doing analysis of data for business is to come up with something where changing it will improve the business. Acting on spurious correlations (e.g. a Cheese business building their strategy around increasing the number of deaths from bedsheet tangles) does not change business outcomes.
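A rough sketch of how a spurious correlation like this can arise, assuming two made-up yearly series that both happen to drift upwards over time. All the numbers and names here are invented for illustration; nothing is real cheese or bedsheet data.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
years = range(10)

# Hypothetical data: each series is a time trend plus noise.
# Neither series is computed from the other.
cheese_sales = [30 + 0.4 * t + rng.gauss(0, 0.2) for t in years]
bedsheet_deaths = [300 + 50 * t + rng.gauss(0, 20) for t in years]

r = pearson(cheese_sales, bedsheet_deaths)
print(f"correlation: {r:.2f}")
```

The correlation comes out high because both series share a trend, but `cheese_sales` is generated without ever reading `bedsheet_deaths`, so no amount of fiddling with the second list would move the first.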

To say the same thing again in a different way: businesses need to find factors that they can manipulate that cause a change in business outcomes. The existence (or not) of a correlation can provide evidence for this kind of relationship, which is why correlations are worth looking at. But if there is no causative relationship then finding a correlation is a waste of time.

This is my main disagreement with what Rand is saying: if there is no causation then the relationship is not useful.

I suspect this is because we haven’t defined clearly enough what we mean by “cause”. But that is a whole minefield that I am completely not qualified to get into.

2 Proxy metrics

Google want to send searchers to the best site for the searcher/query combination. “Best” here is very difficult to define. Google’s initial success was because they used PageRank (combined with on-page factors) as a proxy metric for “best”. It was a closer proxy than what other search engines were using at the time so Google’s results were better.

Problems come along when the proxy metric gets separated from the true goal. This happened with PageRank when people began buying links, setting up link wheels etc. (this last statement could be completely false - it is true that old quality signals became divorced from “best” but why this happened I don’t know enough about old SEO to say). So search engines found other proxies that were more closely linked with what the “best” pages were.

This dance of engines building better proxy metrics and SEOs divorcing the proxy from true quality has gone on for many years. Some say this is coming to an end with the ongoing success of Panda and Penguin but others disagree.

Rand advises that modern SEOs do not try to discover what metrics search engines are currently using to proxy for quality and, where such proxy metrics are known, to ignore them. On a much broader scale this describes the shift in SEO over the past years from “exploiting” the algorithm to making great businesses that deserve to rank (i.e. by aiming straight for the mythical “best” rather than by trying to game the system).

I don’t do enough SEO to comment on this from first hand experience. But I will make the following points:

  1. A business that invests resources in becoming the “best” in their niche will suffer if their definition of “best” is not expressed in whatever proxy metrics the search engines are using at the moment.
  2. Telling people that all they need to do is be awesome and that then glory and rankings will follow is not very useful advice.


I can’t write a post on causation without drawing a directed, acyclic graph.

Unfortunately, this is a bit of a bastardised one because the nodes are so vaguely defined but I think it will suffice to get my point across.


This diagram says that the true quality of a page, combined with the Google engineers who make the algorithm, determines what the “proxy metric” value of the page is. This then determines the page rankings.

Rand says ignore the proxy metrics and aim directly for true quality. I can understand this stance because the Google engineers aren’t stupid so gaming proxy metrics can have a short lifespan.

But there is still a causal relationship between true quality and rankings; it is just mediated by proxy metrics. If Rand disagrees with this definition of causation and says that only proxy metrics cause rankings then this is our point of disagreement. And given this definition of causation I agree with what he is saying.
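To make the mediated relationship concrete, here is a toy version of the graph as code. The function names, the linear forms and the weights are all my own assumptions for illustration; they say nothing about how any real search engine works.

```python
def proxy_metric(true_quality, algorithm_weight):
    """Proxy value of a page: depends on true quality AND on how the
    engineers' algorithm weights that kind of quality (both hypothetical)."""
    return algorithm_weight * true_quality

def ranking_score(proxy):
    """Rankings depend only on the proxy, never on quality directly."""
    return 2.0 * proxy

# Intervene on true quality while the algorithm stays the same.
weight = 0.5
before = ranking_score(proxy_metric(1.0, weight))
after = ranking_score(proxy_metric(2.0, weight))
print(before, after)  # quality changed, so the ranking score changed

# If the algorithm does not express this kind of quality at all,
# the same quality improvement has no effect on rankings.
ignored_before = ranking_score(proxy_metric(1.0, 0.0))
ignored_after = ranking_score(proxy_metric(2.0, 0.0))
print(ignored_before, ignored_after)

# Gaming the proxy directly also moves rankings, with quality untouched.
gamed = ranking_score(5.0)
print(gamed)
```

In this sketch changing quality does change rankings, but only through `proxy_metric`. That is the sense in which I am calling the relationship causal, and it is also why a business whose idea of "best" gets a weight of zero sees no benefit.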

4 Determining Causal Relationships

The gold standard for determining a causal relationship is the randomised controlled trial. Doing these tends to be a bit of a joke in the SEO industry (happy to be directed to examples that prove otherwise) but there are other, lower, standards of proof that can still be useful.
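For anyone who hasn't seen one, the skeleton of a randomised controlled trial looks something like the simulation below. The page names, the size of the effect and the noise level are all invented; in a real test the changes would be measured, not generated.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical set of comparable pages, randomly split into two groups.
pages = [f"page-{i}" for i in range(200)]
rng.shuffle(pages)
treatment, control = pages[:100], pages[100:]

TRUE_EFFECT = 1.5  # assumed effect of the change, known only because we simulate

def observed_change(treated):
    """Simulated change in a page's ranking score over the test period."""
    noise = rng.gauss(0, 1.0)
    return noise + (TRUE_EFFECT if treated else 0.0)

treated_changes = [observed_change(True) for _ in treatment]
control_changes = [observed_change(False) for _ in control]

# Difference in group means estimates the causal effect of the change.
estimate = (sum(treated_changes) / len(treated_changes)
            - sum(control_changes) / len(control_changes))
print(f"estimated effect: {estimate:.2f}")
```

The random split is what does the work: because a shuffle decides which pages get the change, anything else that affects rankings should be balanced across the two groups on average, so the difference in means can be read as an effect of the change itself.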

There are ways that causal relationships can be estimated from raw data with little or no human input (e.g. this chart with accompanying paper) but I don’t know anyone doing this in a marketing context. I’ve tried on the Moz ranking factors data but I haven’t had much luck getting anything useful out of it (I think this is because assumptions of normally distributed data are not at all reasonable in this domain).

Either way, inferring a causal relationship between ranking and some other manipulable factor is bloody hard work. And perhaps this is a reason why Rand suggests focussing on correlations instead - with the data Moz has these are much easier to discover.

But this doesn’t mean that causative relationships don’t exist and it doesn’t negate my first point:

Businesses need to find factors that they can manipulate that cause a change in business outcomes - me (earlier in this post)