Q: Continuing yesterday’s conversation … I am still confused about the difference between a conversion model, an attribution model, and attribution modeling. Can you demonstrate with a simple example?

A: Sure. Let’s look at a campaign with one vendor/channel on the media plan …

Q: Wait a minute, that will not be an attribution problem. If there is only one channel/vendor, does it matter what attribution model you use?

A: It does. Do we give the vendor 100% of the credit? A fraction less than 100% of the credit?

Q: Why not 100%? I think all commonly used attribution models will use 100% …

A: You may want to think twice, because some users would convert on their own. Let’s assume the vendor reaches 10,000 users and 100 of them convert. Let’s also assume that, through analysis and modeling work (such as using a control group), you conclude that 80 of the 100 converters would have converted on their own. How many converters did the vendor actually (incrementally) impact?

Q: 20.

A: If you assign 100% credit to the vendor, the vendor gets credit for all 100 converters. Since the actual incremental impact is 20 conversions, only a fraction of the credit should be assigned; in this case it is 20% instead of 100%. That’s attribution modeling, in its simplest form.

Q: Really? Can you recap the process and highlight the attribution modeling part of it?

A: Sure. In this simplest example, the conversion model gives us two numbers (scores):

1) The probability of conversion given that the user was exposed to the campaign, call it P(c|camp) – in this case it is 100/10000 = 1%, and

2) The probability of conversion given that the user was not exposed to the campaign, call it P(c|no-camp) – in this case it is 80/10000 = 0.8%

Attribution modeling says that only a fraction of the credit, (P(c|camp) – P(c|no-camp)) / P(c|camp) == 0.2 or 20%, should be credited out.

Notice that this fraction for attribution is not 100%. It is not P(c|camp) which is 1%; and it is not P(c|camp) – P(c|no-camp) which is 0.2%.
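
The arithmetic above can be checked with a minimal sketch. The function name `attribution_fraction` and the variable names are made up for illustration; the numbers are the ones from the example, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction


def attribution_fraction(p_exposed, p_unexposed):
    """Incremental share of credit: (P(c|camp) - P(c|no-camp)) / P(c|camp)."""
    if p_exposed <= 0:
        raise ValueError("P(c|camp) must be positive")
    return (p_exposed - p_unexposed) / p_exposed


# Numbers from the example in the text.
reached = 10_000
converted = 100
would_convert_anyway = 80

p_camp = Fraction(converted, reached)                # P(c|camp)    = 1%
p_no_camp = Fraction(would_convert_anyway, reached)  # P(c|no-camp) = 0.8%

fraction = attribution_fraction(p_camp, p_no_camp)   # share of credit to assign
incremental_converters = fraction * converted        # converters actually impacted

print(f"P(c|camp) = {float(p_camp):.2%}, P(c|no-camp) = {float(p_no_camp):.2%}")
print(f"attribution fraction = {float(fraction):.0%}")
print(f"incremental converters = {int(incremental_converters)}")
```

The three candidate numbers stay distinct here: 1% is the conversion probability, 0.2% is the absolute lift, and 20% is the fraction of credit.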

Q: This is an interesting formula. I do not recall seeing it anywhere before. Does this formula come from the conversion model?

A: Not really. The conversion model only provides the best possible estimates of P(c|camp) and P(c|no-camp), that’s all. It does not provide the attribution fraction formula.

Q: Where does this formula come from then?

A: It comes from the following reasoning: vendor(s) should get paid for what they actually (incrementally) impacted, not all the conversions they touched.

Q: So the principle of this “attribution modeling” is not data-driven but pure reasoning. How much should I trust this reasoning? Can it be the ground on which to build industry consensus?

A: What else can we build consensus on?

Q: Ok, I see how it works in this simple case, and I see the principle of it. Can we generalize this “incremental impact” principle to multi-channel cases?

A: What do you have in mind?

Q: Let me try to work out the formula myself. Suppose we have two channels, call them A and B. We start with conversion model(s), as usual. From the conversion model(s), we get our best estimates for P(c|A,B), P(c|nA,nB), P(c|nA,B), and P(c|A,nB). Now I understand why it does not matter whether we use logistic regression, a probit model, or a neural network to build our conversion model – all that matters is that we get the best estimates for the above scores :)

A: Agreed. By the way, I think I understand the symbols you used, such as c, A, nA, nB, etc. – let me know if you think I may have guessed wrong :)

Q: This is interesting, I think I can get the formula now. Take channel A first, and let’s call the fractional credit A should get C_a; we can calculate it with this formula: C_a = (P(c|A,B) – P(c|nA,B)) / P(c|A,B), right?

A: If you do that, C_a + C_b may be over 100%.

Q: What’s wrong, then?

A: We first need to figure out what fraction of the credit is available to be credited out to A and B, just as in the simplest case discussed before. It should be (P(c|A,B) – P(c|nA,nB)) / P(c|A,B).
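
To make the last two answers concrete, here is a small sketch. The four scores are hypothetical, made up for illustration and not taken from any real campaign. With them, the per-channel fractions come to about 80% and 70%, so they add up to more than 100%, while the fraction actually available to both channels is only about 90%.

```python
# Hypothetical conversion-model scores, made up for illustration only.
p_ab = 0.010     # P(c|A,B):   exposed to both channels
p_na_b = 0.002   # P(c|nA,B):  exposed to B only
p_a_nb = 0.003   # P(c|A,nB):  exposed to A only
p_na_nb = 0.001  # P(c|nA,nB): exposed to neither channel

# Per-channel fractions from the first attempt: remove one channel, keep the other.
c_a = (p_ab - p_na_b) / p_ab
c_b = (p_ab - p_a_nb) / p_ab

# Fraction of credit available to A and B together.
available = (p_ab - p_na_nb) / p_ab

print(f"C_a = {c_a:.0%}, C_b = {c_b:.0%}, C_a + C_b = {c_a + c_b:.0%}")
print(f"available to A and B together = {available:.0%}")
```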

Q: I see. How should we divide the credit between A and B next?

A: That is a question we have not discussed yet. In the simplest case, with one vendor, it is trivial. With more than one vendor/channel, we need a new principle.

Q: I have an idea: we can re-adjust the fractions on top of what we did before, like this: C’_a = C_a / (C_a + C_b) and C’_b = C_b/(C_a + C_b); and finally, we use C’_a and C’_b to partition the above fraction of credit. Will that work?
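
Written out as code, the proposal looks roughly like the sketch below. The function name `renormalized_credits` and the input scores are hypothetical. This only shows what the proposed rule computes; whether the result is a fair division is a separate question. Note that the rule is undefined when C_a + C_b is zero.

```python
def renormalized_credits(p_ab, p_na_b, p_a_nb, p_na_nb):
    """Split the available credit between A and B in proportion to C_a and C_b.

    Arguments are P(c|A,B), P(c|nA,B), P(c|A,nB) and P(c|nA,nB).
    Returns (credit_a, credit_b) as fractions of the total credit.
    """
    c_a = (p_ab - p_na_b) / p_ab        # naive fraction for A
    c_b = (p_ab - p_a_nb) / p_ab        # naive fraction for B
    available = (p_ab - p_na_nb) / p_ab  # fraction available to A and B together

    total = c_a + c_b
    if total == 0:
        # C'_a and C'_b would be 0/0: the proposal gives no answer here.
        raise ValueError("C_a + C_b is zero; the proposed split is undefined")

    return available * c_a / total, available * c_b / total


# Hypothetical scores, for illustration only.
credit_a, credit_b = renormalized_credits(0.010, 0.002, 0.003, 0.001)
print(f"A: {credit_a:.0%}, B: {credit_b:.0%}, total: {credit_a + credit_b:.0%}")
```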

(note: the following example has an error in it, as pointed out by Vadim in his comment below)

*A: Unfortunately, no. Take the following example:*

*Suppose A adds no incremental value, except when B is present: P(c|A,nB) == P(c|nA,nB) and P(c|A,B) > P(c|nA,B).*

*Also, B does not add anything when A is present: P(c|A,B) == P(c|A,nB).*

*The calculation will lead to C_b == 0 and C_a > 0. Therefore, A gets all the available credit and B gets nothing.*

*Do you see a problem?*

*Q: Yes. B will find this unfair, because without B, A contributes nothing. Yet A gets all the credit and B gets nothing.*

A: This is just a case with two channels and two players. Imagine if we had 10 channels/players, what a complicated bargaining game that would be!

Q: Compared with this, the conversion model part is actually easy; well, not easy, but more of a non-issue. We can build conversion models to generate all these conditional probability scores. However, we are still stuck here and can’t figure out a fair division of credit.

A: This is attribution modeling: the process or formula that translates the output of conversion models into an attribution model (or fractional credits). We need to figure this thing out.

Q: What is it, really?

A: We are essentially looking for a rule or a formula for dividing the total credit that we can all agree is fair. Is that right?

Q: Right, but we have to be specific about what we mean by “fair”.

A: That’s right. So, let’s discuss a minimal set of “fair” principles that we can all agree upon. There are three of them, as I see it:

Efficiency: we are distributing all available credit, not leaving any on the table

Symmetry: if two channels are functionally identical, they should get the same credit

Dummy Channel: if a channel contributes nothing in all cases, it should get no credit

What do you think?

Q: I think we can agree with these principles. How can they help?

A: Well, someone has proved that there is one and only one formula that satisfies this minimal set of principles. I think this is our attribution formula!

Q: Really? I do not believe this. Who proved this? Where can I read more of it?

A: In 1953, Lloyd Shapley published the proof in his PhD dissertation, and the resulting formula became known as the Shapley value. The field is called cooperative game theory. You can Google it and you will find tons of good references. Of course, Shapley did not call it an “attribution problem”, and he talked about players instead of channels. His full set of principles has more than three. However, the transferable-utility and additivity principles are automatically satisfied when applied to the credit-partitioning problem.
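
For readers who want to see the formula at work, here is a minimal sketch of the Shapley value for the two-channel case. It assumes that the "worth" of a set of channels is its incremental conversion probability, P(c|that set) – P(c|nA,nB); that mapping is an assumption chosen to match the available-credit fraction discussed earlier, not something the dialogue spells out. The names `shapley_values`, `incremental_value` and `p_convert`, and all the scores, are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(channels, value):
    """Average each channel's marginal contribution over all channel orderings."""
    credit = {ch: 0.0 for ch in channels}
    for order in permutations(channels):
        present = frozenset()
        for ch in order:
            with_ch = present | {ch}
            # Marginal contribution of ch when added to the channels already present.
            credit[ch] += value(with_ch) - value(present)
            present = with_ch
    n_orders = factorial(len(channels))
    return {ch: total / n_orders for ch, total in credit.items()}


# Hypothetical conversion-model scores, keyed by the set of channels a user saw.
p_convert = {
    frozenset(): 0.001,            # P(c|nA,nB)
    frozenset({"A"}): 0.003,       # P(c|A,nB)
    frozenset({"B"}): 0.002,       # P(c|nA,B)
    frozenset({"A", "B"}): 0.010,  # P(c|A,B)
}


def incremental_value(exposed):
    """Assumed worth of a channel set: lift over the no-exposure baseline."""
    return p_convert[frozenset(exposed)] - p_convert[frozenset()]


phi = shapley_values(["A", "B"], incremental_value)

# Express each channel's share as a fraction of P(c|A,B), as in the text.
p_ab = p_convert[frozenset({"A", "B"})]
for ch, share in phi.items():
    print(f"{ch}: lift {share:.4f}, credit fraction {share / p_ab:.0%}")
```

With these made-up scores, A receives a lift of about 0.005 and B about 0.004. They sum to P(c|A,B) – P(c|nA,nB) = 0.009, which is the efficiency principle: as fractions of P(c|A,B), the credits are about 50% and 40%, together using up exactly the 90% that is available. Enumerating all orderings is only practical for a handful of channels, since the number of orderings grows factorially.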

Q: Now, how do you apply this attribution rule differently for different converters?

A: You do not. The differences among converters are reflected in the scores generated by the conversion models, not in the above attribution formula – the Shapley value.

Q: OK, if that is the case, everyone in the industry will be using the same attribution formula, the Shapley value. How do we then creatively differentiate ourselves from each other? How should different types of campaigns be treated uniquely? How will the effects of channels on different types of conversions be reflected in the attributed credits?

A: Well, all of these will be reflected in how the conversion models are built, how their parameters are estimated, and finally in the scores that come out of them. You will innovate on statistical model development techniques. The attribution formula is, fortunately, not where you are going to innovate.

Q: This is quite shocking to me. I can’t imagine how the industry will react …

A: How did the industry deal with Marketing Mix Modeling? We accepted the fact that those are simply regression models in essence, and started selling expertise in doing it thoroughly and doing it right. We do not have to create our own attribution model to be able to compete with each other.

Hi Huayin. Interesting reading. Before getting into the deep issues, could you please clarify why in the example above you state that C_b == 0? Where does it follow from? Also, what do you mean by nA (or nB): do you refer to the users that have not been exposed to A (B), or to a counterfactual (where treatment by A (B) is replaced by a non-campaign treatment, say a PSA)? There could be a big difference: P(c|A,nB) == P(c|nA,nB) implies that the campaign has no effect if the entire treatment by B is replaced with nB (PSA). This in turn says that if the entire campaign has incremental value, it does come (in part) from treatment by B, so a correctly calculated causal C_b cannot be zero.

In fact, there is a clear contradiction in the example:

If, as you state, P(c|A,nB) == P(c|nA,nB) and B does not add anything when A is present: P(c|A,B) = P(c|A,nB), then P(c|A,B) = P(c|nA,nB), and the entire campaign has no incremental value.

The conclusion is not that a simple attribution formula should work but that, if treated properly, a causal-type “conversion model” should not produce obvious problems. How to properly distribute credit is a different matter.

Comment by Vadim Tsemekhman — April 12, 2013 @ 1:21 am

Vadim, thank you for stopping by!

The notation nA is to be understood as a counterfactual. You are correct in pointing out the error I made in the example; I really appreciate that! I apologize.

I think if I used the following assumption: P(c|A,B) == P(c|nA,B) == P(c|A,nB), I can hopefully still point out the problem, since we would have C_a == C_b == 0. Also, if I assume P(c|nA,B) == 0.99 * P(c|A,B) and P(c|A,B) == P(c|A,nB), then the simple crediting formula leads to A getting 100% of the credit, suggesting a bit of unfairness to B.

Thanks again!

Comment by Huayin Wang — April 12, 2013 @ 3:23 am

Vadim, there is a bigger issue than what this example covers. I will discuss it in a separate post.

Comment by Huayin Wang — April 19, 2013 @ 2:13 pm

Huayin, thank you for the clarification. I would love to chat with you – we have come up with some very interesting things. Re: your example, I think it is impossible to get around the causality.

If you imagine a standard A/B testing experiment, P(c|A,B) == P(c|A,nB) implies that there is no change in the conversion probability when you replace the ads run by pub B with PSAs. The same can be said in the language of counterfactuals. This in turn means, by definition, that the incremental value of publisher B is zero, and there is nothing unfair in not giving this pub any credit. Please let me know whether you agree and how we could talk. I really appreciate you writing this blog.

Comment by Vadim Tsemekhman — April 16, 2013 @ 7:36 am

Eager to read your new post.

Comment by Vadim Tsemekhman — April 22, 2013 @ 10:44 pm