Analytics Strategist

March 30, 2013

An Unusually Blunt Dialogue on Attribution – Part 3

Filed under: misc — Huayin Wang @ 4:53 pm
Q: It’s been over a week since we last talked, and I am still in disbelief. If what you said is true, the multi-touch attribution problem is solved! Then again, I feel there are still so many holes. Before we discuss some challenging questions, can you sketch out the attribution process – the steps you proposed?

A: Sure.  There are four steps:

Step 1. Developing conversion model(s)
Step 2. Calculating conditional probability profile for each conversion event
Step 3. Applying Shapley Value formula to get the S-value set
Step 4. Calculating fractional credits: dividing each S-value by the (non-conditional) conversion probability
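
A minimal sketch of Steps 2–4, assuming the conversion model from Step 1 has already produced the conditional probability profile (passed in here as a dict keyed by channel subsets); the function and variable names are illustrative, not taken from any particular tool:

```python
from itertools import permutations
from math import factorial

def fractional_credits(profile):
    """Steps 3-4, sketched.

    profile maps frozensets of channels to the conversion model's estimate of
    P(conversion | exactly those channels touched); it must include the empty
    set (the baseline) and the full channel set.
    """
    channels = sorted(max(profile, key=len))   # the full channel set
    n_orderings = factorial(len(channels))
    s_value = {ch: 0.0 for ch in channels}

    # Step 3: Shapley value = a channel's marginal contribution to the
    # conversion probability, averaged over every ordering of the channels.
    for order in permutations(channels):
        present = set()
        for ch in order:
            before = profile[frozenset(present)]
            present.add(ch)
            after = profile[frozenset(present)]
            s_value[ch] += (after - before) / n_orderings

    # Step 4: divide each S-value by the conversion probability with all
    # channels present, so credits are expressed as fractions of the event.
    p_full = profile[frozenset(channels)]
    return {ch: s_value[ch] / p_full for ch in channels}

# Hypothetical two-channel profile, for illustration only
profile = {
    frozenset(): 0.008,             # baseline: converts with no touches
    frozenset({"A"}): 0.010,
    frozenset({"B"}): 0.012,
    frozenset({"A", "B"}): 0.018,
}
print(fractional_credits(profile))  # the credits sum to (0.018 - 0.008) / 0.018
```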

Q: And what should we call this – an attribution process? An algorithm? A framework? An approach?
A: An attribution agenda – though of course, you can call it anything you like.

Q: Why don’t you start with data collection – I am sure you have heard about the GIGO principle and how important good, correct data is for attribution …
A: I am squarely focused on attribution logic – data issues are outside the scope of this conversation.

Q: I noticed that there are no rule-based attribution models in your agenda. Are the rules really so arbitrary that they are of no use at all?
A: They are not arbitrary – they are like any social/cognitive rules of a customary nature. For attribution purposes, however, they are neither conversion models, which measure how channels actually impact conversion probability, nor clearly stated justification principles.

Q: What about the famous Introducer, Influencer and Closer framework – the one everyone uses in defining attribution models – and the insights it provides for attribution?
A: It is really the same concept as the last touch and first touch rules – a position-based way of looking at how channels and touch point sequences are correlated. You can use an alternative set of cleaner, more direct metrics to get similar insights – metrics derived from counting how often a channel appears in converting sequences as the first touch, the last touch, or neither.
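
A minimal sketch of those position metrics, assuming each converting path is just an ordered list of channel names (the function name and path format are my own):

```python
def position_metrics(converting_paths, channel):
    """Among conversions the channel touched, the share of paths where it was
    the first touch, the last touch, or neither."""
    first = last = neither = touched = 0
    for path in converting_paths:        # e.g. ["search", "display", "email"]
        if channel not in path:
            continue
        touched += 1
        if path[0] == channel:
            first += 1
        if path[-1] == channel:
            last += 1
        if channel not in (path[0], path[-1]):
            neither += 1
    touched = max(touched, 1)            # avoid division by zero
    return {"first": first / touched, "last": last / touched, "neither": neither / touched}
```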

Q: Do these rules have no use at all in the attribution process? Can they be used in conjunction with conversion models?
A: You do not use them together – there is simply no need for them anymore once you have conversion models. However, there are cases where you do not have sufficient data to build your models. In those cases, you can borrow from other models, or use these rule-based models as heuristics.

Q: You are clearly not in the “guru camp” – as you said in your “Guru vs PhD” tweet. Are you in the PhD camp then?
A: No. I also think that they may be more disappointed than the gurus from the web analytics side …

Q: I have the same feeling – I think you are killing their hope of being creative in the attribution modeling area. With your agenda, there are no attribution models aside from conversion models, and no attribution modeling aside from this one Shapley Value formula and the adjustment factor.
A: The real creativity should be in the development of better conversion models.

Q: Let’s slow down a little bit. I think you may be oversimplifying the attribution problem. Your conversion models seem to work only when there is one touch event per channel — how do you handle cases with multiple touch events per channel?
A: You may be confusing the conditional probability profile – in which a channel is treated as one single entity – with the conversion models. In my mind, you can create multiple variables per channel that reflect complex features of the touch point sequences for that channel: frequency, recency, interval, first-touch indicator, etc. Once the model is developed, you construct the conditional probability profile by turning all the touch points for that channel on or off at the same time.

Q: Ok. How do you deal with the ordering effect – the fact that channel A first, and B second (A,B) is different from (B,A)?
A: You construct explicit order indicator variables in your conversion models … that way, your attribution formula (the Shapley Value) can remain the same.

Q: And what if the order does not matter?
A: Then the order indicator variables will not be significant in the conversion models.
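
A minimal sketch of the kind of channel-level variables discussed above – frequency, recency, a first-touch indicator, and an explicit order indicator – assuming each user journey is a time-sorted list of (timestamp, channel) touches; the names are illustrative:

```python
def channel_variables(touches, channel, conversion_time):
    """touches: list of (timestamp, channel) pairs for one journey, sorted by time."""
    own_times = [t for t, ch in touches if ch == channel]
    if not own_times:
        return {"freq": 0, "recency": None, "first_touch": 0}
    return {
        "freq": len(own_times),                        # number of touches by this channel
        "recency": conversion_time - max(own_times),   # time since its last touch
        "first_touch": int(touches[0][1] == channel),  # did this channel open the path?
    }

def order_indicator(touches, first_channel, second_channel):
    """1 if first_channel appears before second_channel in the journey, else 0."""
    firsts = [t for t, ch in touches if ch == first_channel]
    seconds = [t for t, ch in touches if ch == second_channel]
    return int(bool(firsts) and bool(seconds) and min(firsts) < min(seconds))
```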

Q: And what about channel interactions?
A: Through the usual way you model interaction effects between two or more main effects.

Q: The separation of conversion model and attribution principle in your agenda is quite frustrating. Why can’t we find an innovative way of handling both in one model – a sort of magic model, Bayesian, Markovian or whatever?
A: Go find one.

Q: A control/experiment approach could be an alternative, couldn’t it?
A: Control/experiment is at best a way of measuring marginal impact; suffice it to say that it is an impractical way to measure all the levels of marginal impact that a conversion model will support. If we have more than a couple of channels, the number of experiments needed goes up exponentially. It also does not allow post-experiment analysis, and there is no practical way to incorporate recency, sequence patterns, etc.

Q: What about an optimization principle? If, by requiring that the best attribution rule reflect the optimal way of allocating campaign budget to maximize the number of conversions, one can derive a unique attribution rule, could that be the solution to attribution?
A: No. The attribution problem is about events that have already happened and needs to be answered that way, without requiring any assumptions about the future. Campaign optimization is a related but separate topic.

Q: Your attribution agenda is limited to the conversion event. In reality, there are many other metrics we care about, such as customer lifetime value, engagement value, etc. How do you attribute those metrics?
A: If you can attribute the (conversion) event, you can attribute any metric derived from it, by figuring out what value is linked to that event. In short, you figure out the fractional credits for the event first, then multiply by the value of the event, and you have the attribution for the new metric.
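
As a minimal sketch of that step (the function name and inputs are illustrative): once the fractional credits for the conversion event are known, any value attached to the event is split the same way.

```python
def attribute_event_value(fractional_credits, event_value):
    """Spread a derived metric (e.g. this converter's lifetime value)
    across channels using the event's fractional credits."""
    return {ch: credit * event_value for ch, credit in fractional_credits.items()}

# e.g. attribute_event_value({"A": 0.22, "B": 0.33}, event_value=500.0)
```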

Q: You have so far not talked about media cost at all – when we know every attribution vendor uses it in the process. How come there is no media cost in your attribution agenda?

A: Media cost is needed to evaluate cost-based channel performance, not for attribution. How much a channel impacted a conversion is a fact, not dependent on how much you paid the vendor — if there is any relationship, it should run the other way. The core of the attribution process can be done without media cost data — vendors ask for it because they want to work on more projects aside from attribution.

Q: Regarding the issue of where the attribution process should reside, you picked the agency. Isn’t the agency the last place you’d think of when it comes to any technology matter? Since when have you seen an agency make technology a core competency?
A: Understandable. I said that not for any of the reasons you mentioned, but for what an ideal world should be. The attribution process is central to campaign planning, execution and performance reporting, at both the tactical and strategic levels. Having that piece sit outside the integration center can cause a lot of friction in moving your advertising/marketing to the next level. I said it should live inside your agency, but I did not say it should be “built” by the agency; I did not say it should live inside your “current” agency; and certainly, there is nothing preventing you from making your technology vendor into your “new agency”, as long as they take up the planning, execution and reporting work from your agency, at both strategic and tactical levels.

Q: What about Media Mix Modeling? If we have the resources to do that, do we still need to worry about attribution?
A: Ah, the micro vs. macro attribution technologies. It is complicated and certainly needs a separate discussion to do the topic justice. The simplest distinction between the two is this: when you have the most detailed data on who was touched by which campaigns, you do attribution. If you have none of that data, but only aggregate-level media delivery and conversion data, you do MMM.

Q: I have to say that your agenda brings a lot of clarity to the state of attribution. I like the prospect of order; still, I can’t help but think about what a great time everyone has had around attribution models in recent years …

A: Yes – a state of extreme democracy without consensus. To those who have guns, money and power, anarchy may just be the perfect state; I am not being cynical, just offering my glass-half-full kind of perspective.

March 21, 2013

An Unusually Blunt Dialogue on Attribution – Part 2

Q: Continuing yesterday’s conversation … I am still confused about the difference between a conversion model, an attribution model and attribution modeling. Can you demonstrate using a simple example?

A: Sure.  Let’s look at a campaign with one vendor/channel on the media plan …

Q: Wait a minute, that is not an attribution problem. If there is only one channel/vendor, does it matter what attribution model you use?

A: It does. Do we give the vendor 100% of the credit? A fraction less than 100% of the credit?

Q: Why not 100%?  I think all commonly used attribution models will use 100% …

A: You may want to think twice, because some users may convert on their own. Let’s assume the vendor reached 10,000 users and 100 of them converted. Let’s also assume that, through analysis and modeling work (such as using a control group), you conclude that 80 of the 100 converters would have converted on their own. How many converters did the vendor actually (incrementally) impact?

Q: 20.

A: If you assign 100% credit to the vendor, the vendor will get credit for all 100 converters. Since the number of actually impacted conversions is 20, a fractional credit should be used; in this case it is 20% instead of 100%. That’s attribution modeling, in its simplest form.

Q: Really? Can you recap the process and highlight the attribution modeling part of it?

A: Sure. In this simplest example, the conversion model provides us with two numbers (scores):

1) The probability of conversion given that the user was exposed to the campaign, call it P(c|camp) – in this case 100/10000 = 1%, and

2) The probability of conversion given that the user was not exposed to the campaign, call it P(c|no-camp) – in this case 80/10000 = 0.8%.

Attribution modeling says that only a fraction of the credit, (P(c|camp) - P(c|no-camp)) / P(c|camp) = 0.2, or 20%, should be credited out.

Notice that this attribution fraction is not 100%. It is not P(c|camp), which is 1%; and it is not P(c|camp) - P(c|no-camp), which is 0.2%.
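
In code, the arithmetic of this example is simply (numbers taken from the example above):

```python
p_camp = 100 / 10_000     # P(c | camp)    = 1.0%
p_no_camp = 80 / 10_000   # P(c | no-camp) = 0.8%

fraction = (p_camp - p_no_camp) / p_camp   # 0.2, i.e. 20% of the credit is credited out
credited_conversions = fraction * 100      # 20 of the 100 converters
```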

Q: This is an interesting formula.  I do not recall seeing it anywhere before.  Does this formula come from the conversion model?

A: Not really. The conversion model only provides the best possible estimates for P(c|camp) and P(c|no-camp), that’s all. It does not provide the attribution fraction formula.

Q: Where does this formula come from then?

A: It comes from the following reasoning:  vendor(s) should get paid for what they actually (incrementally) impacted, not all the conversions they touched.

Q: So the principle of this “attribution modeling” is not data-driven but pure reasoning. How much should I trust this reasoning? Can this be the ground on which to build industry consensus?

A: What else can we build consensus on?

Q: Ok, I see how it works in this simple case, and I see the principle of it.  Can we generalize this “incremental impact” principle to multi-channel cases?

A: What do you have in mind?

Q: Let me try to work out the formula myself. Suppose we have two channels, call them A and B. We start with the conversion model(s), as usual. From the conversion model(s), we find our best estimates for P(c|A,B), P(c|nA,nB), P(c|nA,B), P(c|A,nB). Now I understand why it does not matter whether we use logistic regression, a probit model or a neural network to build our conversion model – all that matters is making sure we get the best estimates for the above scores 🙂

A: Agreed. By the way, I think I understand the symbols you used, such as c, A, nA, nB, etc. – let me know if you think I may have guessed wrong 🙂

Q: This is interesting; I think I can get the formula now. Take channel A first, and let’s call the fractional credit A should get C_a; we can calculate it with this formula: C_a = (P(c|A,B) - P(c|nA,B)) / P(c|A,B), right?

A: If you do that, C_a + C_b may be over 100%.

Q: What’s wrong, then?

A: We first need to figure out what fraction of the credit is available to be credited out to A and B, just as in the simplest case discussed before. It should be (P(c|A,B) - P(c|nA,nB)) / P(c|A,B).

Q: I see. How should we divide the credit between A and B next?

A: That is a question we have not discussed yet. In the simplest case, with one vendor, it is a trivial question. With more than one vendor/channel, we need some new principle.

Q: I have an idea: we can re-normalize the fractions on top of what we did before, like this: C’_a = C_a / (C_a + C_b) and C’_b = C_b / (C_a + C_b); and finally, we use C’_a and C’_b to partition the above fraction of credit. Will that work?

(Note: the following example has an error in it, as pointed out by Vadim in his comment below.)

A: Unfortunately, no.  Take the following example:

Suppose A adds no incremental value except when B is present: P(c|A,nB) = P(c|nA,nB) and P(c|A,B) > P(c|nA,B).

Also, B does not add anything when A is present: P(c|A,B) = P(c|A,nB).

The calculation leads to C_b = 0 and C_a > 0. Therefore, A gets all the available credit and B gets nothing.

Do you see a problem?

Q: Yes. B will feel this is unfair, because without B, A would contribute nothing. Yet A gets all the credit and B gets nothing.

A: This is just a case with two channels and two players. Imagine if we had 10 channels/players – what a complicated bargaining game that would be!

Q: Compared with this, the conversion model part is actually easy; well, not easy, but more like a non-issue. We can build conversion models to generate all these conditional probability scores. However, we are still stuck here and can’t figure out a fair division of credit.
A: This is attribution modeling: the process or formula that translates the output of conversion models into an attribution model (or fractional credits). We need to figure this thing out.

Q: What is it, really?

A: We are essentially looking for a rule or formula for dividing the total credit that we can all agree is fair. Is that right?

Q: Right, but we have to be specific about what we mean by “fair”.

A:  That’s right.  So, let’s discuss a minimal set of “fair” principles that we can all agree upon.  There are three of them, as I see it:

Efficiency: we are distributing all available credit, not leaving any on the table

Symmetry: if two channels are functionally identical, they should get the same credit

Dummy Channel: if a channel contributes nothing in all cases, it should get no credit

What do you think?

Q: I think we can agree with these principles.  How can they help?

A: Well, someone has proved that there is one and only one formula that satisfies this minimal set of principles. I think this is our attribution formula!

Q: Really? I do not believe this. Who proved it? Where can I read more about it?

A: In 1953, Lloyd Shapley published the proof in his PhD dissertation, and the resulting formula became known as the Shapley Value. The field is called Cooperative Game Theory. You can Google it and you will find tons of good references. Of course, Shapley did not call it the “attribution problem” and he talked about players instead of channels. The full collection of principles includes more than three; however, Transferable Utility and Additivity are automatically satisfied when applied to the credit partitioning problem.
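
For the two-channel case we have been discussing, the Shapley Value has a simple closed form: each channel gets the average of its marginal contributions over the two possible orderings. A minimal sketch, using the conditional probability scores from the conversion model (the numeric values are hypothetical):

```python
# Scores from the conversion model: P(c|A,B), P(c|A,nB), P(c|nA,B), P(c|nA,nB)
p_AB, p_A_nB, p_nA_B, p_nA_nB = 0.018, 0.010, 0.012, 0.008

# Shapley value: average marginal contribution over the orderings (A,B) and (B,A)
phi_A = 0.5 * ((p_A_nB - p_nA_nB) + (p_AB - p_nA_B))
phi_B = 0.5 * ((p_nA_B - p_nA_nB) + (p_AB - p_A_nB))

# Efficiency: phi_A + phi_B equals the available credit p_AB - p_nA_nB,
# so the fractional credits below sum to (p_AB - p_nA_nB) / p_AB.
C_a, C_b = phi_A / p_AB, phi_B / p_AB
```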

Q: Now, how do you apply this attribution rule differently for different converters?

A: You do not. The differences among converters are reflected in the scores generated by the conversion models, not in the attribution formula itself – the Shapley Value.

Q: Ok, if that is the case, everyone in the industry will be using the same attribution formula, the Shapley Value. How do we then creatively differentiate ourselves from each other? How should different types of campaigns be treated uniquely? How are the effects of channels on different types of conversions reflected in attributed credits?

A: Well, all of these will be reflected in how the conversion models are built, how their parameters are estimated, and finally in the scores that come out of the conversion models. You will innovate on statistical model development techniques. The attribution formula is, fortunately, not where you are going to innovate.

Q: This is quite shocking to me. I can’t imagine how the industry will react …

A: How did the industry deal with Marketing Mix Modeling? We accepted the fact that those are, in essence, simply regression models, and started selling expertise in doing it thoroughly and doing it right. We do not each have to create our own attribution model to be able to compete.

March 20, 2013

An Unusually Blunt Dialogue on Attribution – Part 1

Filed under: misc — Huayin Wang @ 10:04 pm

Q: I will begin with this question: what do you NOT want to talk about today?

A: I do not want to waste time on things that most people know and agree with, such as “Last Touch Attribution is flawed”.

Q: Why is attribution such a difficult challenge that, after many years, we still seem to be just scratching the surface of it?

A: No idea.

Q: Let me try it a different way: why is it so hard to build an attribution model?

A: It is not. It is NOT difficult to build an attribution model – in fact, you can build five of them in less than a minute: Last Touch, First Touch, etc. 🙂 What is difficult is building good attribution modeling – a process that produces a methodologically sound attribution model.

Q: “Attribution modeling” – is this the kind of tool already available through Google Analytics?

A: No. Those are attribution model specification tools – “you specify the attribution models to your heart’s content and I do the reporting using them”. They do not tell you what the RIGHT attribution model IS. An attribution reporting tool is not an attribution modeling tool.

Q: “Methodologically sound” – that seems to be at the heart of all attribution debates these days.  Do you think we will ever reach a consensus on this?

A:  Without a consensus on this, how can anyone sell an attribution product or service?

Q: On the other hand, isn’t “algorithmic attribution” already a consensus that everyone can build on?

A: What is that thing?

Q: All vendors seem to take the “algorithmic attribution” approach, possibly adding extra phrases such as “statistical models”, “data-driven”, etc. Isn’t that sufficient?

A: How? They never show how it works.

Q: Do you really need to get into that level of detail, the “Black Box” – the proprietary algorithm that people legitimately do not release to the public?

A: There is no reason to believe that anyone has a “proprietary algorithm” for attribution. Unlike predictive modeling, a domain of technology that can be “externally” evaluated without going inside the Black Box, attribution modeling is like math – a methodology whose validity needs to be internally justified. A Black Box for attribution sounds like an oxymoron to me. You do not see people claim to have a “proprietary proof” of Fermat’s Last Theorem. (Ironically, Fermat himself claimed a proof in the margin of a book without actually showing it, but everyone knows he never intended it to end up that way.)

Q: Why then do people claim to have an algorithmic and/or modeling approach but not show it?

A: It is anyone’s guess. I see no reason for it; it hurts them and it hurts the advertising industry, particularly the online advertising industry. I suggest that, starting today, every vendor either stop claiming to have a proprietary attribution model/modeling or come out of the “Black Box” (the emperor’s new clothes?) and prove the legitimacy of their claim.

Q: Ok, suppose I say: I build a regression model to quantify which channels impact conversion and by how much, then calculate proportional weights based on that and partition the credit according to those proportions. What would you say?

A: How?

Q: You are not serious, right? I am giving you so much detail – how much more do you want?

A: The program and process sound like they will work, and to non-practitioners’ eyes it is quite CLEAR that they will. But you know and I know that it does NOT work. Having built conversion models does not solve the attribution problem. The attribution problem comes down to the partitioning of credit, i.e. how much of the conversion credit is to be partitioned and how much is given to each channel. That logic has to be explicitly presented and justified. The core challenge has been glossed over and covered up, but not solved.

Q: Please simplify it for me.

A: There is no automatic translation from conversion models to attribution models – the process of doing that, which is attribution modeling, has to be explicitly stated.

Q: You defined the attribution problem as partitioning credit among channels – are you talking only about Cross-Channel Attribution? If I want to focus only on Digital Attribution, or even Publisher Attribution, is what you said still relevant?

A: Yes. I am talking about it from a data analytics angle – you can replace the word “channel” with something else and the rest still applies.

Q: Ok, what if the conversion model I use is not a regression, but some kind of Bayesian model?

A: It does not matter. It can be Bayesian, a neural net or a Hidden Markov Model, as long as it is a conversion model. The automatic translation still is not there.

Q: Does it matter if the conversion model is predictive or descriptive?

A: It should be a conversion model – there are multiple meanings of “predictive model”; it is essentially a predictive model, but it need not handle “information leakage” types of issues the way a predictive model should.

Q: Does it need to be a “causal” model, and not a “correlational” model?

A: Define “causal” for me. Specifically, do people know what they mean by a “correlational” model? Do they know multivariate models and dependence concepts?

Q: I assume we know.  Causal vs. correlational are just common sense concepts to help us make the discussion around “model” more precise …

A: But neither is a more precise concept than statistical modeling language. Even philosophers themselves have begun to use statistical modeling language to clarify their “causal” frameworks …

Q: Now I am confused.  Where are we right now?

A: We are discussing statistical models and attribution modeling …

Q: Ok, should we use statistical models when we do attribution?

A: We have to. Quantifying the impact of certain actions on conversion should be the foundation of any valid attribution process; there is no more precise way to do that than to develop solid statistical models of conversion behavior!

Q: Not even experimental design?

A: Not even that.

Q: But what is the right statistical model? Some type of regression model, a Bayesian model or a Markovian model?

A: It does not have to be any one of them, and yet, any one of them may do the job.

Q: If that is true, how can one justify the objectivity of the model?

A: A conversion model provides the basis for what reality looks like – to the best of our knowledge at the moment. There can be different types of statistical methodologies for modeling conversion behavior, and that does not create problems for the objectivity of the model output. We have seen this with marketing response models, where modelers have the freedom to choose whatever methodology (type of model) they deem appropriate, and yet it does not compromise the objectivity of the results.

Q: But attribution is different; when building marketing response models, what matters is the score, not the coefficients or any “form factors” of the model. In attribution, it is those form factors, not the scores, that are central to deriving the attribution formula.

A: That’s exactly the problem that needs to be corrected. The attribution formula should NOT be built on the “form factors” of the conversion model, but rather on the scores of the conversion model!

Q: Explain more …

A: If you can’t claim that the linear regression model IS the only right model for conversion behavior, you can’t claim that the regression coefficients, the “form factors” of the regression model, are intrinsic to the conversion behavior. Thus, any attribution formula built on top of them cannot be justified.

Q: And the conclusion, in simpler language …

A: A conversion model is needed for attribution, but the attribution model is not the conversion model. The attribution model should be built on top of the “essence” of the conversion models, i.e. the scores, not the form factors. Attribution modeling is the process of translating conversion modeling results into an attribution model.

Q: What does that say about the offerings from current vendors?

A: They often tell us that they build conversion models, but reveal nothing about their attribution modeling methodology.

Q: What if they say they are hiding their proprietary attribution technology in a Black Box? Are they just covering up the fact that there is nothing in there, and that they do not know how?

A: Anyone’s guess. The bottom line is, anyone claiming anything should acknowledge their audience’s right to doubt.

Q: It is common to see companies hiding their predictive modeling (or recommendation engine technology) in a Black Box … why not attribution?

A: Predictive modeling, or even recommendation modeling, can be externally tested and verified. You can take two sets of predictive model scores and test which one has more predictive power without knowing how the models were built. Attribution modeling is different; you have to make explicit how and why your way of allocating credit is justified – otherwise, I have no way of verifying and validating your claim.

Q: We are not in the faith business …

A: Amen.

Q: Ok, big deal.  I am an advertiser, what should I do?

A: Demand that anyone selling you attribution products/services show you their attribution methodology. It is ok if they hide the conversion model part of it, but do not compromise on the attribution modeling.

Q: I am in the vendor business, what should I do?

A: Defend yourself – not by working on defensive rhetoric, but by building and presenting your attribution modeling openly.

Q: If I am an agency, what should I do?

A: Attribution should live inside the agency. You can own it or rent it; you should not be fooled by those who want you to think attribution modeling is a proprietary technology – it is not. Granted, you are not a technology company, but attribution modeling is not proprietary technology. If you have people who can build conversion models, you are right up there with those “proprietary” attribution vendors.

Q: If attribution modeling becomes an “open” methodology, what about the attribution vendors? What will they own, and why wouldn’t advertisers and agencies just build it themselves?

A: That’s my question too 🙂

Q: Are vendors going to be out of business?

A: Well, they can still own the conversion modeling part of it … and there are still predictive modeling shops out there, in business …

Q: Somehow, you sound like you know something about this “open secret” already 🙂 Can you share a little about that?

A: Can we talk tomorrow? I need to leave for this “Attribution Revolution” conference tonight …

March 6, 2013

The difficult problems in attribution modeling

Filed under: misc — Huayin Wang @ 10:58 pm

The term “attribution modeling” can mean different things to different people – sometimes it is used interchangeably with “attribution model”. To me, an attribution model refers to things like “last touch”, “first touch”, etc. – rules that specify how attribution should be done. Attribution modeling is the process by which an attribution model is generated. Attribution modeling gives us the model generation process, as well as the reasons and justification for the attribution model that is derived.

It is not difficult to come up with an attribution model; in fact, we can make one up in seconds. What is difficult is determining which one is the right attribution model. Despite all the discussion and progress made over the last few years, there is no consensus about it. And the lack of industry consensus really hurts.

The question about the right attribution model is perhaps misguided, for we all know that a model that is right for one business, say e-commerce, may be wrong for another, such as B2B. What is right may also depend on the type of campaign, the type of conversion and even the type of user (male vs. female, adult vs. teen). The right question should be: what is the right attribution modeling – the right process by which an attribution model is generated?

Each of us can easily list the 4 or 5 most commonly used attribution models. What about attribution modeling? How many different processes can produce an attribution model?

Last Click/Last Touch attribution models are examples where intuition is the modeling process. It is not data-driven. You can argue about the good and the bad conceptually. The data-driven approach, on the other hand, holds the fundamental belief that the right attribution model should be derived from data. Within the data-driven approach, there are two slightly differing camps: experimental design vs. algorithmic attribution.

You may ask: what about Google’s Attribution Modeling Tool in Google Analytics? It is not really an attribution modeling tool in my sense of the word; it helps you specify attribution models, not create data-driven ones. It does not tell you how to derive the “right” attribution model.

The data-driven approach is what we will focus on here. There has been great progress in the “algorithmic attribution” approach, and significant businesses have been built on it (Adometry and VisualIQ, to name a couple). However, none is clear and transparent enough about their key technologies – as an industry, we are left with a lot of confusion.

The set of difficult problems is about exactly that – the core technology of attribution modeling. We need to answer these questions so we can build on common ground and move on. Here is a list of the questions/problems:

1) Is attribution modeling the same as statistical conversion modeling?

2) What is the right type of model to use: predictive modeling, descriptive modeling or causal modeling?

3) Does it matter whether the model is a linear regression, a logistic regression or some Bayesian network model?

Stay tuned for more.
