One (of Many) Problems With Attribution Rules

Posted on September 5, 2012

Currently, attributing revenue between multiple touch points on a user’s journey to purchase involves using some kind of rule to split the credit based on the order in which the touch points occur. Example rules are:

You can also add channel specific rules:

This type of rule doesn’t help with the strawman I’m building here, so I’m going to revisit it later.

Now we are going to make some simplifying assumptions about a hypothetical online business. These assumptions will be used to make a model which is easy to reason about and experiment with.

Here are the rules:

So the path to purchase (or to vanishing forever) looks like this:

 digraph G {
  size="8,6"
  ratio=expand
  display1[label="display"]
  display1->display2
  display1->search2
  display1->social2
  search1[label="search"]
  search1->display2
  search1->search2
  search1->social2
  social1[label="social"]
  social1->display2
  social1->search2
  social1->social2
  display2[label="display"]
  display2->display3
  display2->social3
  display2->search3
  search2[label="search"]
  search2->display3
  search2->search3
  search2->social3
  social2[label="social"]
  social2->display3
  social2->search3
  social2->social3
  display3[label="display"]
  display3->conversion
  search3[label="search"]
  search3->conversion
  search3->vanish
  social3[label="social"]
  social3->conversion
  social3->vanish
}

Another assumption:

Next we put 6000 people in at the top of the funnel (2000 in each channel). What happens next?

At each step a user has a 2/3 chance of not using display, so after the third interaction (2/3)^3 = 8/27 ≈ 30% of users will not have clicked a display advert. By our assumption that a user converts if and only if they interact with display, the remaining 19/27 of the 6000 users convert, which gives roughly 4200 conversions in total (about 4222 before rounding).
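The arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes that at each of the three steps a user picks one of the three channels uniformly at random and independently of earlier steps, which is what the 2/3 figure implies. The variable names are mine.

```python
from fractions import Fraction

# Assumption: each step is an independent, uniform choice between
# display, search and social, so P(not display) = 2/3 at every step.
p_not_display = Fraction(2, 3)

# Probability of avoiding display on all three steps.
p_never_display = p_not_display ** 3

# A user converts if and only if they touched display at least once.
p_converts = 1 - p_never_display

users = 6000
expected_conversions = users * p_converts

print(p_never_display, float(p_never_display))
print(p_converts, float(expected_conversions))
```

Using exact fractions makes it clear that the 30% figure is a rounding of 8/27, and that the exact expected number of conversions sits slightly above 4200.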

There are 27 different ways of interacting with the three channels over three visits.
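A quick way to see where the 27 comes from is to enumerate the paths. The sketch below lists every ordered sequence of three touches and splits them by the model's rule that a path converts if and only if it contains display. The channel names are taken from the diagram; everything else is illustrative.

```python
from itertools import product

CHANNELS = ("display", "search", "social")

# Every ordered sequence of three touches: 3 ** 3 paths in total.
paths = list(product(CHANNELS, repeat=3))

# Model rule from the post: a path converts iff it contains display.
converting = [p for p in paths if "display" in p]
vanishing = [p for p in paths if "display" not in p]

print(len(paths), len(converting), len(vanishing))
for path in vanishing:
    print(" -> ".join(path))
```

The non-converting paths are exactly the sequences built from search and social alone.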

Let’s apply a few attribution models to these results and see what they tell us:

All these models are poor because they attribute credit to channels that don’t deserve it. The correct model in this case is to attribute 100% of the credit to display.
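To make the failure concrete, here is a rough sketch that applies three order-based rules to the converting paths of the toy model. The rules used (first touch, last touch and linear) are my own choice of common examples and may not match the ones compared in this post. The code also assumes every one of the 27 paths is equally likely, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

CHANNELS = ("display", "search", "social")

def first_touch(path):
    """All credit to the first channel on the path."""
    return {path[0]: Fraction(1)}

def last_touch(path):
    """All credit to the last channel on the path."""
    return {path[-1]: Fraction(1)}

def linear(path):
    """Credit split equally between every touch on the path."""
    share = Fraction(1, len(path))
    credit = {}
    for channel in path:
        credit[channel] = credit.get(channel, Fraction(0)) + share
    return credit

def attribute(rule, paths):
    """Return each channel's share of total credit under a rule."""
    totals = {channel: Fraction(0) for channel in CHANNELS}
    for path in paths:
        for channel, credit in rule(path).items():
            totals[channel] += credit
    grand_total = sum(totals.values())
    return {channel: totals[channel] / grand_total for channel in CHANNELS}

# Assumption: all 27 paths are equally likely; only paths that
# contain display convert, so only those earn any credit.
converting = [p for p in product(CHANNELS, repeat=3) if "display" in p]

for rule in (first_touch, last_touch, linear):
    shares = attribute(rule, converting)
    print(rule.__name__, {c: f"{float(s):.1%}" for c, s in shares.items()})
```

Under these assumptions all three rules give display 9/19 of the credit (about 47%) and give search and social 5/19 each, even though display is the only channel that causes any conversions.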

The question is, how do we know that this is the right thing to do without knowing what the model is? How can we see that display is the only channel causing conversions just by looking at the user path data?

And don’t forget that this is an easy example: anything that touches display converts. The problem gets a lot harder when things are less black and white.

Further things to consider

I said right at the start that I was constructing a strawman argument, and I think I’ve been quite successful: I’ve built my model in such a way that time-based or touch-order-based attribution models won’t get the desired result.

You need to decide if the reason these models fail is realistic or if it is just a product of my unrealistic assumptions.