2012-10-01-generating-path-data-in-R

---
title: Generating user touchpoint data in R
tags: R,attribution
---

Right now I'm thinking quite a lot about multitouch marketing (not going to use the word attribution for now). I have a few ideas that I want to explore around modelling user behaviour. Validating models in this area is pretty tricky, so I'm going to play around with some toy models first and see how far I get.

The advantage of using a toy model is that I know the underlying structure, so I'll know whether I'm getting it right. In real life there are fewer guarantees that the answer I'm getting is correct.

Let's start off with a really simple model.

1 Simple model

There are three channels A, B and C. The following rules apply:

  1. Whenever a user touches channel C they convert.
  2. A user who touches channel A has a 50% chance of next appearing through channel C and a 50% chance of next appearing through channel B.
  3. A user who touches channel B has a 50% chance of next appearing through channel C and a 50% chance of never visiting the site again.

Graphically, the model looks like this:

digraph G {
  size="8,6"
  ratio=expand
  A->C[label="50%"]
  A->B[label="50%"]
  B->C[label="50%"]
  B->vanish[label="50%"]
  C->convert[label="100%"]
}
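
Because the model is so small, the conversion rate to expect from each starting channel can be worked out directly from the three rules. The sketch below is one way to do that: it writes the rules as a transition matrix and steps it forward three times. The names `states`, `P`, `P3` and `expected_conversion` are mine, and treating `convert` and `vanish` as states that loop back to themselves is an assumption made for convenience.

```r
# Transition probabilities between states, taken from the three rules.
states <- c("A", "B", "C", "convert", "vanish")
P <- matrix(0, nrow = 5, ncol = 5, dimnames = list(states, states))
P["A", "C"] <- 0.5
P["A", "B"] <- 0.5
P["B", "C"] <- 0.5
P["B", "vanish"] <- 0.5
P["C", "convert"] <- 1

# Assumption: terminal states stay where they are once reached.
P["convert", "convert"] <- 1
P["vanish", "vanish"] <- 1

# The longest path is A->B->C->convert, so three steps are enough
# for every user to have reached a terminal state.
P3 <- P %*% P %*% P

# Probability of ending in "convert" for each starting channel.
expected_conversion <- P3[c("A", "B", "C"), "convert"]
expected_conversion  # A = 0.75, B = 0.5, C = 1
```

These are the numbers any simulation of the model should approach as the number of users grows.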

Now I want to use this model to generate a lot of user paths of the form A->B->vanish, B->C->convert, C->convert etc.

I have the following R code which can simulate this data:

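# Given the current channel, return the next state according to the rules above
nexttouch <- function(current){
  if (current=="C") {
    return("convert")
  } else if (current=="A") {
    if (runif(1)<0.5) return("C") else return("B")
  } else {
    # Channel B: half go on to C, half never return
    if (runif(1)<0.5) return("C") else return("vanish")
  }
}

# Follow one user from a starting channel until they convert or vanish
userpath <- function(start){
  current <- start
  path <- c(current)
  while (current!="vanish" && current!="convert") {
    current <- nexttouch(current)
    path <- c(path,current)
  }
  return(path)
}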
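
The same rules could also be written as a lookup table rather than nested `if` statements, which might be easier to extend if more channels are added later. This is only a sketch of that alternative: `transitions`, `next_touch_tbl` and `user_path_tbl` are hypothetical names, and it relies on every option from a given state being equally likely, which happens to be true of this model.

```r
# Hypothetical table-driven variant: each state maps to its possible next states.
transitions <- list(
  A = c("C", "B"),
  B = c("C", "vanish"),
  C = c("convert")
)

next_touch_tbl <- function(current) {
  options <- transitions[[current]]
  # Pick an index uniformly at random, so single-option states work the same way.
  options[sample.int(length(options), 1)]
}

user_path_tbl <- function(start) {
  path <- start
  # Keep stepping until the last state is terminal.
  while (!(tail(path, 1) %in% c("vanish", "convert"))) {
    path <- c(path, next_touch_tbl(tail(path, 1)))
  }
  path
}

set.seed(1)          # only to make repeated runs reproducible
user_path_tbl("C")   # always "C" "convert"
user_path_tbl("A")   # random: ends in "convert" or "vanish"
```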

Now let's generate some data by putting 1000 users into each of the three channels:

# 1000 users starting in B, then 1000 in A, then 1000 in C
starts <- rep(c("B","A","C"), each=1000)
paths <- lapply(starts,userpath)

The longest possible path is A->B->C->convert, so I know that there are at most three touches.

For each channel, record whether it appears in each path:
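
One quick way to eyeball generated paths is to collapse each one to a string and tabulate them. The sketch below does this on a small hand-written list; `example_paths` is a hypothetical stand-in, not simulated output, and the same calls should work on any list of character vectors with this shape.

```r
# Hand-written example paths (not simulated data).
example_paths <- list(
  c("A", "B", "vanish"),
  c("B", "C", "convert"),
  c("C", "convert"),
  c("A", "B", "C", "convert")
)

# Collapse each path to a single string such as "A->B->vanish".
path_strings <- vapply(example_paths, paste, character(1), collapse = "->")

# Count how often each distinct path occurs.
path_counts <- table(path_strings)

# Touches exclude the terminal state (convert or vanish).
touches <- lengths(example_paths) - 1
max_touches <- max(touches)  # 3 for this example list
```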

containschannel <- function(channel,path) {
   return(is.element(channel,path))
}

includes <- data.frame(C=unlist(lapply(paths,function(x) containschannel("C",x)))
                      ,A=unlist(lapply(paths,function(x) containschannel("A",x)))
                      ,B=unlist(lapply(paths,function(x) containschannel("B",x)))
                      ,convert=unlist(lapply(paths,function(x) containschannel("convert",x)))
                      )

We can use the includes data frame to calculate conditional probabilities.

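conditional <- function(channel,condition){
   # Estimate P(condition | channel) from the includes data frame;
   # both arguments are column names given as strings, e.g. conditional("A","convert")
   c <- sum(includes[[channel]])
   i <- sum(includes[[channel]] & includes[[condition]])
   return(i/c)
}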
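
To make the conditional probability calculation concrete, here is a small worked sketch on a hand-written data frame with the same columns as `includes`. Both `example_includes` and `cond_prob` are hypothetical names used only for illustration, and the four rows are made up rather than simulated.

```r
# Hand-written stand-in for the includes data frame: one row per path.
# Rows: A->C->convert, A->B->vanish, B->C->convert, C->convert
example_includes <- data.frame(
  A       = c(TRUE,  TRUE,  FALSE, FALSE),
  B       = c(FALSE, TRUE,  TRUE,  FALSE),
  C       = c(TRUE,  FALSE, TRUE,  TRUE),
  convert = c(TRUE,  FALSE, TRUE,  TRUE)
)

cond_prob <- function(df, event, given) {
  # Keep only rows where `given` is TRUE, then take the share where `event` is TRUE.
  mean(df[[event]][df[[given]]])
}

cond_prob(example_includes, "convert", "A")  # 0.5
cond_prob(example_includes, "convert", "B")  # 0.5
cond_prob(example_includes, "convert", "C")  # 1
```

On the simulated data, the model's rules suggest the corresponding estimates should land near 0.75 for A, 0.5 for B and exactly 1 for C.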
Authored by Richard Fergie