Some of my old writing on LinkedIn, published a year and a half ago. Below is a list of links:

The right attribution framework – part 1

The right attribution framework – part 2

The right attribution framework – part 3

Ad Tech is an exciting industry, rich in complex and multidisciplinary challenges. It can also be quite intimidating and confusing, with its maze of jargon and operational black boxes. The bid optimization algorithm is often considered one of those black boxes.

Real Time Bidding (RTB) is at the core of Ad Tech’s operations. At the heart of RTB are the bidders, running their mythical bid optimization algorithms. I have been fascinated with the inner workings of a bidder for many years and learned a few things about optimal bidding along the way. The purpose of this article is to share my thoughts with those who are interested in bid optimization algorithms, applying rigorous thinking but writing in plain and simple language. I understand that this is close to mission impossible and I will more than likely fail; I still want to give it a try. So wish me luck and let me know how far I get!

**A Bidder’s Triple Challenges**

Real Time Bidding is a new kind of media buying-selling mechanism – transacting one impression at a time through an auction marketplace; this is quite different from the traditional way of buying/selling media in bulk via contract negotiation. Publishers put ad-impression opportunities in the auction marketplace (i.e. ad exchange), collect bids from bidders who represent buyers (advertisers and their agencies), and the winner, winning ad and winning bid are all selected in real time.

A bidder is really the brain on the buy side (demand side) of the market. Building a bidder is quite a challenging job – just imagine yourself receiving a million bid requests per second, running them through fraud-detection screening, evaluating the value of each impression opportunity to your clients, making the best bidding decision given the budgets and other client constraints, and doing all of this a million times every second!

Generally speaking, a bidder has three major challenges: 1) the *technological* challenge of handling the speed and scale (cloud computing, Memcached and NoSQL); 2) the *statistical learning* challenge of modeling and estimating the expected value of each impression (click, conversion, test/control and attribution); and 3) the *mathematical* challenge of bid optimization: how to bid optimally given the change and complexity of the marketplace and all the business constraints.

***Bid Optimization*** **is the focus of this discussion**

Let’s start with a simple case: bidding on a single impression on behalf of one client. For a single bid request, you need to decide on an optimal bid: should it be $1, $100, 2c or no bid?

Upon further thought, you may realize that you have no basis for making any decision. This is because the optimization problem is not yet fully defined. We need to specify the context and parameters before we can meaningfully talk about optimal bidding. What are the auction market and its rules? How is cost calculated when we win? How many others are competing in the auction, and how do they bid? What are my goal and constraints? Nothing makes us feel more powerless than struggling with ill-defined problems.

*Vickrey Auction*

An auction is a process where buyers compete against each other and a winner is decided by a certain rule, usually the highest bid price. In general, there are two classes of auctions, open and closed, based on whether bids are openly communicated to every bidder during the bidding process. Open auctions are further divided into the English auction, where the bidding starts low and goes up, and the Dutch auction, where the asking price starts high and goes down.

Ad exchanges use the Vickrey auction or variants of it; it is a closed (sealed-bid) auction where each bidder bids without knowledge of how others bid. It is also a second-price auction: the bidder with the highest bid wins but pays the second-highest bid, not its own bid. You may wonder why any seller would agree to be paid the second-highest bid. The (somewhat) surprising reason is that this auction rule encourages bidders to declare their TRUE value (private valuation) of the item, which may indirectly help the seller get a higher price.

The common understanding is that under a Vickrey auction, bidders need not worry about how others bid strategically against them; they can simply bid their private value of the item. This is a good property to have when designing a simple and efficient market mechanism.

Unless stated otherwise, a Vickrey auction is assumed in the following discussion.

**Bidding a single impression**

Back to our simple case of bidding on a single impression. We now know that, unless we know the value of the item to us, we really do not have a basis to talk about an optimal bid. Once we know our value for the item, under a Vickrey auction, we should just bid at that value, regardless of how others bid. This is amazingly simple.

It is important to understand this, so let’s check it out and make sure we are all convinced. Suppose the impression has a value of $100 to us, and the highest bid from others is x (meaning we do not know exactly how much). The claim is that bidding $100 is the best. How?

Suppose we bid lower than $100, say $50. If x is less than $50 or more than $100, the outcome will be identical to bidding at $100. When x falls between $50 and $100, say x is $75, bidding at $100 will win and gain $25 ($100 value – $75 cost), whereas bidding at $50 will lose the auction and miss the opportunity of this $25 gain. So bidding lower than your value can only hurt.

What if you bid higher than $100, say $130? If x is less than $100 or greater than $130, then the two bids result in the same outcome. When x is between $100 and $130, bidding at $130 will win, but at a cost of $x for an item worth $100 to us, resulting in a net loss of ($x – $100). Thus, bidding higher than your value is not a good strategy either.

*Optimal bidding is simply bidding at your valuation*
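The argument above can be checked mechanically. The following is a minimal sketch, not part of the original analysis: the `payoff` function and the grid of market bids are illustrative assumptions, and ties are treated as losses for simplicity.

```python
def payoff(bid, value, market_bid):
    """Surplus from one second-price auction.

    We win when our bid exceeds the market bid (the highest competing bid)
    and then pay the market bid, not our own bid.
    """
    if bid > market_bid:
        return value - market_bid
    return 0.0


value = 100.0
# Hypothetical grid of market bids from $0 to $200 in $1 steps.
market_bids = [float(x) for x in range(0, 201)]

for alternative in (50.0, 130.0):
    for x in market_bids:
        # Bidding the true value is never worse than the alternative bid.
        assert payoff(value, value, x) >= payoff(alternative, value, x)

print(payoff(100.0, value, 75.0))   # truthful bid wins and gains 25.0
print(payoff(50.0, value, 75.0))    # underbid loses the auction: 0.0
print(payoff(130.0, value, 120.0))  # overbid wins at a loss: -20.0
```

Whatever the market bid turns out to be, the truthful bid does at least as well as the $50 or $130 alternatives, which is the sense in which no knowledge of other bidders is needed here.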

Convinced? Intuitively this is too simplistic; the real world is surely much more complex. Let’s dig into the complexity and see how it may impact our optimal bidding strategy. The main complexity starts when there are more bid requests and we have to decide which ones to bid on and how much to bid.

Let’s take the next step and assume we now have two impression requests. How should we bid? A simple approach is to treat the two bids as independent and apply optimal bidding to each impression. Any problem with this?

One problem is the budget. Let’s assume our budget is $100. With two impressions valued at $75 and $100, if we bid at the (independent) optimal bids, $75 and $100 respectively, the following are three possible outcomes:

1. We win both; the costs are $60 and $80
2. We win both; the costs are $40 and $60
3. We lose the first and win the second; the cost is $85

With (1) we are over budget and the bids are impermissible. (2) is the perfect scenario. (3) is sub-optimal, with leftover budget. Given that all three scenarios are possible, we really do not know what to say about our bidding strategy.

We are in this interesting situation again, with an insufficiently defined optimization problem. We need to add an extra component to set up the problem properly. What is this missing component?

We need to know which one of the above three situations will arise, for each impression. If we knew the full set of bids from others, that would suffice. However, at the time of bidding, the set of other bids is not known, no matter how much we might want it. What we can know is the probability distribution of other bids. In fact, all we need to know is the *probability distribution of the market bid (the highest other bid, your bid not included)* – this is the missing component that lets us fully specify our budget-constrained optimization problem. The distribution can be impression specific; the set of distributions should provide sufficient information about the bidding environment.

**Optimal bidding with budget constraint**

It is interesting to notice that, when we bid on a single impression with no budget constraint, we do not need to know how others bid. We do not need to know the distribution of the highest other bid; even if we know it, it changes nothing about how we bid.

With more than one impression to bid on, we need to decide how to optimally spread the budget over the set of impression bid requests. Each bid request creates an opportunity for us to *invest*, and the distribution of its market bid allows us to calculate the expected return for any given bid. We can now treat our bidding problem as an investment optimization problem, and the principle of optimal budget allocation across multiple investments dictates that, at optimal budget allocation,

*The marginal rate of returns across all investments should be equal.*

Interestingly, if we calculate the marginal rate of return (MRR) in our bidding case, it turns out that MRR is a function of the ratio of the bid and the impression value. Translating the above principle for our bidding case, we have:

*With budget constraints, optimal bid should be proportional to valuation*

This is an amazing result. It essentially implies that optimal bidding means bidding not only on high-value impressions or low-priced impressions (those with a low expected market bid), but on every impression. Although we haven’t discussed exactly what the bids should be, we know that there is ONE optimal proportion/ratio to be calculated, and it should be applied to the value of each impression to get its optimal bid.

Take the two-impression bidding problem as an example. Let’s call the optimal ratio “r” and the optimal bids “b1” and “b2”. We have:

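- b1 = $75 × r
- b2 = $100 × r
- b1 × C1(b1) + b2 × C2(b2) <= $100, with r <= 1; here C1 and C2 are the CDFs of the corresponding market bid distributions, so C1(b1) is the probability of winning with bid b1. Since the winner pays the market bid rather than its own bid, the left-hand side is a conservative upper bound on the expected spend.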

The proportionality rule itself does not depend on the market bid distributions. However, the optimal proportion/ratio parameter does come from them. In practice, one can calculate it directly when the distributions are (reliably) known, simulate with data, or approximate with real-time iteration.
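As an illustration of the "simulate with data" route, here is a minimal sketch that searches for the ratio by bisection. Everything in it is an assumption made for the example: the function names, the evenly spaced market-bid samples standing in for a Uniform(0, 150) market, and the $30 budget (chosen because the $100 budget above would not bind under this hypothetical market). Spend is computed with actual second-price payments.

```python
def expected_spend(ratio, values, market_samples):
    """Average second-price spend when every bid is ratio * value."""
    total = 0.0
    for value, samples in zip(values, market_samples):
        bid = ratio * value
        # We pay the market bid whenever our bid beats it, otherwise nothing.
        total += sum(x for x in samples if x < bid) / len(samples)
    return total


def optimal_ratio(values, market_samples, budget, iterations=50):
    """Largest ratio r in [0, 1] whose expected spend stays within budget."""
    if expected_spend(1.0, values, market_samples) <= budget:
        return 1.0  # budget is not binding: bid the full valuation
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if expected_spend(mid, values, market_samples) <= budget:
            low = mid   # still affordable, try a higher ratio
        else:
            high = mid  # over budget, lower the ratio
    return low


values = [75.0, 100.0]
# Hypothetical market bids: an evenly spaced stand-in for Uniform(0, 150).
samples = [k + 0.5 for k in range(150)]
market_samples = [samples, samples]

r = optimal_ratio(values, market_samples, budget=30.0)
bids = [r * v for v in values]
print(f"ratio: {r:.3f}, bids: {bids[0]:.2f} and {bids[1]:.2f}")
```

Bisection works here because expected spend can only go up as the ratio goes up; a real bidder would more likely adjust the ratio continuously from observed spend, which is the "real-time iteration" option.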

It is very interesting to notice that this “proportionality rule” is quite frequently used in practice by media teams everywhere, without necessarily knowing the optimality of it.

(Those more inclined to rigorous math can work out the constrained optimization formulation and use the Lagrange multiplier framework to see the result easily.)
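For readers who want that derivation, here is one possible sketch. The notation is an assumption introduced for this example: impression $i$ has value $v_i$ and bid $b_i$, its market bid has density $f_i$ (positive at the optimum), and the budget is $B$. Under second pricing we pay the market bid $x$ whenever $x < b_i$, so the problem is

$$
\max_{b_1,\dots,b_n}\ \sum_{i=1}^{n} \int_0^{b_i} (v_i - x)\, f_i(x)\, dx
\quad \text{subject to} \quad
\sum_{i=1}^{n} \int_0^{b_i} x\, f_i(x)\, dx \le B .
$$

With a multiplier $\lambda \ge 0$ on the budget constraint, the first-order condition for each $b_i$ is

$$
(v_i - b_i)\, f_i(b_i) - \lambda\, b_i\, f_i(b_i) = 0
\;\Longrightarrow\;
b_i = \frac{v_i}{1+\lambda} = r\, v_i, \qquad r = \frac{1}{1+\lambda} \le 1 .
$$

The marginal rate of return on impression $i$ is the marginal gain divided by the marginal cost,

$$
\frac{(v_i - b_i)\, f_i(b_i)}{b_i\, f_i(b_i)} = \frac{v_i}{b_i} - 1 = \lambda ,
$$

which depends only on the ratio of bid to value and is the same for every impression at the optimum. When the budget is not binding, $\lambda = 0$ and $r = 1$, which recovers bidding at valuation.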

**The effect of how others bid**

Let’s review what we have found so far: the Vickrey auction underpins the market mechanism design of ad exchanges, allowing bidders to bid optimally without considering how others bid. This is true for bid optimization with no budget constraint, in which the optimal bid is equal to the private valuation of an impression. With a budget constraint, optimal bids are proportional to their valuations, and the proportion parameter can be managed in many ways. This greatly reduces the complexity of strategic considerations: the game theory framework is not needed for building the optimal bidding algorithm.

However, this is not to say that how others bid will not impact the *outcome* of a bidder’s optimal bidding. It simply will not affect the optimal bidding strategy or algorithm itself.

In general, bidders who evaluate audiences and impressions similarly compete more with each other than with others. As competition increases, market bid distributions will move higher for the higher-valued impressions, which effectively reduces the average rate of return. Still, it is worth emphasizing that under a budget constraint,

*The way others bid may impact the optimal ratio parameter and affect the outcome of how you bid; however, it will not affect the proportionality rule.*

This is really quite a strong conclusion, to say the least. The proportionality rule, or “Linearity Rule”, can greatly simplify the design of the optimal bidding process. However, we have to be careful not to extrapolate the conclusion too far, to cases characterized by factors not considered here.

**How general is the proportionality rule?**

It is important to remember that we have assumed A LOT when we derived the proportionality rule; one or more of those assumptions may fail, including this critical one: that we know the value of an impression.

We certainly do not know the value of the impressions we bid on at bidding time. In fact, some, like myself, would venture to say that we do not know the values even after bidding time (remember the attribution problem?). How does this fact affect our bidding strategy?

Case 1: We only know the relative values, but not the absolute scaled values

This is almost always the case, and it comes in all shapes and forms. The key point is: we rarely know the ultimate financial metrics, and therefore everything we use is a proxy: click, engagement, registration, conversion, and even a transaction. The models and all the advanced techniques can’t help us fix this proxy issue, and therefore can only provide the best relative valuation.

Luckily, there isn’t much difference between knowing the absolute value and the relative value; they differ by a linear scale. With relative value, we are off from the true value by a scale multiplier, but we magically *regain* it when we estimate the optimal ratio parameter. We retain almost everything we discussed about the optimal bidding rule. The true loss from not knowing the absolute value is only this: we are no longer able to set the bid cap (the requirement that r <= 1 above) based on profitability.

Case 2: We do not have precise knowledge of the value

One line of thinking is that as long as we have a way to estimate the values without bias (unbiased estimation), we should be fine. As this thinking goes, the amount of our ignorance will be reflected in the variance of our estimation and the effectiveness of the outcome, but will not change the conclusion of the optimization problem, which is mathematically as tight as it can be.

An extreme example of this is when we have no idea of any valuation — all impressions have the same value to us. This is actually quite often the case in the real world, for example, when we care only about the volume of impressions, or the (lower) eCPM of buying or the (minimal) spend. In all these scenarios, the optimal bidding rule can still be applied mechanically, simply assuming all impressions have the same value. This leaves a single parameter and a single bid value to calculate, or estimate.

Case 3: Dealing with margin

This is typical when a DSP serves a client. The adjustment is simply to apply a targeted margin to the client budget, set the result to be the effective budget and we will be fine.

Case 4: Dealing with targeting exclusion

Targeting exclusions are things that directly affect the eligibility of an impression for bidding (on behalf of a campaign), in effect setting its valuation to zero. Because of this, they will affect the estimation of the ratio parameter “r”, but not the proportionality rule.

Case 5: Fraud, Viewability and Brand Safety

The confusion here is less about how these factors influence optimal bidding and more about how these factors should be considered by the valuation models or as targeting exclusions or other constraints. Without clearing up these sources of confusion, we do not have a well-defined optimization problem, yet.

I hope you can get the sense that this proportional bidding rule as optimal bidding strategy is generally applicable to many cases.

**Where’s the catch?**

Building a good bidder is difficult; the actual bid optimization algorithm can also be more complex than presented here. Given the discussion above, the simplicity behind the proportionality rule is unbelievable and I am sure many will ask, where’s the catch?

It can’t be this simple; it can’t be just about a linear multiplier; there has to be a catch, right?

Yes and no.

Linear multipliers are used in real world optimal bidding processes. Non-linear adjustments have also been applied sometimes, without making explicit the why and how.

I believe the root cause behind the need for non-linear adjustment is the variance of the value prediction. The market is competitive, in the sense that the true value of impressions (to us) is correlated with the distribution of the market price. Because of this, our prediction error may be correlated with our win rate and produce a bias among the impressions we win, resulting in a biased outcome from an otherwise unbiased prediction. The adjustment is intrinsically non-linear because this bias is clearly bid-dependent. One can expect that a bigger adjustment is needed with larger variance.

In my opinion, the fix will be this non-linear adjustment process, run prior to the optimization process. Once the post-win bias is fixed, the optimization rule can stay intact and the proportionality rule should remain. However, the details of the non-linear adjustment are the hard part and, as is often said, the devil is in the details.
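A small simulation can make this selection effect concrete. This is only a sketch under assumed numbers: the value range, the two noise levels, and the rule that we bid our prediction directly are all hypothetical choices, not a description of any real bidder.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

errors_all = []  # prediction error (predicted - true) on every impression
errors_won = []  # prediction error on the impressions we actually win

for _ in range(20000):
    true_value = rng.uniform(50.0, 150.0)
    # Unbiased but noisy value prediction (hypothetical noise level).
    predicted = true_value + rng.gauss(0.0, 20.0)
    # Market bid correlated with the true value, because competitors
    # value the same impressions similarly (hypothetical noise level).
    market_bid = true_value + rng.gauss(0.0, 10.0)

    error = predicted - true_value
    errors_all.append(error)
    if predicted > market_bid:  # we bid our prediction and win
        errors_won.append(error)

mean_all = sum(errors_all) / len(errors_all)
mean_won = sum(errors_won) / len(errors_won)
print(f"mean error, all impressions: {mean_all:.2f}")
print(f"mean error, won impressions: {mean_won:.2f}")
```

Across all impressions the prediction error averages out near zero, but among the impressions we win it is clearly positive: we tend to win exactly where we overestimated. Raising the prediction noise in this setup widens the gap, in line with the expectation that larger variance calls for a bigger adjustment.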

Rumor has it that Cheryl was not happy about the way Albert and Bernard shared information through their seemingly innocuous comments in the Cheryl’s Birthday Riddle (CBR) game.

“That was cheating!” said Cheryl. “I would’ve been really impressed if you’d done it without all the tricks.” Upon hearing this, Albert and Bernard fell into deep thought. A long silence later, Bernard said, “Cheryl, pick any date from that list, and I’ll be able to figure it out”; “with your help of course”, Bernard winked. And thus began the sequel to CBR:

Cheryl was determined to make sure no information was leaked; she put each date on a piece of paper, with the month written on one side and the day on the other, and put all ten pieces in a hat.

Cheryl picked a date from the hat, showed Albert the month and Bernard the day, without looking at the date herself.

Cheryl asked Bernard, “So, do you know the date?” “No, not yet,” replied Bernard.

Cheryl turned to Albert: “Do you know it?” “Neither do I,” Albert replied.

Cheryl smiled, “I guess nobody in this room knows the date”.

Bernard piped up: “Actually, I think I know it now.”

Albert: “I am still in the dark.”

Cheryl: “Good, I am not the only one.”

Bernard: “Now, I am certain I have it right.”

Albert: “Ok, I know it too.”

“Me three”, said Cheryl, wearing a big smile on her face.

All three of them had the same date. What is it?

Everyone working in the advertising industry, or related fields, has probably heard of the famous Wanamaker Quote: “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

What he said seems obvious at first; however, when read a little deeper, it could be problematic. Below are a few related points:

**a) The waste may not be 50%; it may in fact be as high as 99%**

Let’s begin by asking: how did he estimate the advertising waste? Can someone know the amount of waste without being able to identify which part is wasted?

There are two ways to estimate media waste. The first involves breaking down advertising campaigns into different tactics and identifying the ineffective ones. The tactics can vary by audience attribute (age, gender, behavior), geo, creative, etc. Take gender as an example: your Male audience may be twice as effective as Female, so you treat the Female tactic as wasted. The problems with this estimation methodology: the ineffective tactics are not all “wasted”, and the effective tactics contain waste too. The estimation is also quite subjective, since it depends not only on how you define “effective”, but also on how you break down campaigns into “tactics”.

The second way of estimating waste, the only defensible one in my view, relies on counting outcomes directly. Take a direct response campaign as an example: if conversion is the outcome, the money spent without resulting in a conversion is wasted. If display ads reached 30 million users and only 3,000 converted, then the spend on the users who did not convert, well over 99% of them (99.99%, to be exact), is wasted. The actual waste number can be even higher, considering that the ads shown to the converters may themselves be ineffective and should not be counted as incrementally effective.

**b) It is not just about measurement (alone), but more about the granularity of the underlying measurement**

Knowing how to measure the waste, the next question is: how do we solve the waste issue?

The common (traditional, offline) scheme is to define a target audience first, and then define “waste” as media delivered outside of that target audience. This practice ignores the waste inherent in the definition of the target audience. Ages 20-34 may be five times as likely to convert as others and therefore a valid target audience. However, if the conversion rate among the others is 1%, then the conversion rate for this target audience is only 5% – which means 95% is waste as well.

Creating different targeting tactics and measuring them does not necessarily address the issue of waste! I am horrified to see how many people believe that bringing offline GRP metrics online solves the display advertising waste problem. Age and gender data do not generate tactics that are waste-free. You need to use higher-dimensional data to create and identify much more granular audience and context/creative groupings in order to truly combat the advertising waste problem!

Are GRP metrics the cure for online advertising waste? I do not think so. In fact, I think they will do more harm than good.

**c) Targetability is key, but often ignored**

To keep this writeup short, I will make the point briefly: without event-level targeting, we are not going to solve the waste problem; in fact, we are not even facing it squarely. If nothing else, the most granular level of media transaction mechanism, such as the one implemented in AdExchange RTB today, is necessary.

I believe the MTA modeling problem is solved with the approach I discussed in the *Unusually Blunt Dialogue on Attribution*. I have since received some questions about the approach, or the agenda; some related to the contents and others about formatting. Today, I am going to try a simple recap, to address those questions.

First of all, the formatting issue. The format in WP is hard to read. A friend of mine (thank you, Steve!) was kind enough to put the content into MS-Word. Anyone interested in reading the dialogues in a better format can download it here: the attribution dialogue.

Below are Q&A for other questions:

Q: Is the attribution problem solved?

A: Hardly. The attribution problem consists of many challenges: data, model/modeling, behavioral insight, reporting, and finally optimization.

Q: When you started, you were aiming to reach a consensus on Attribution Model and Modeling. Have we reached the consensus? Is this attribution modeling problem solved?

A: Consensus is never easy to build and may never be achieved. I believe I have covered enough ground to build consensus on this issue, so we can move on to other business. I believe the MTA modeling problem is solved, but I am open to anyone who can convince me otherwise.

Q: Are there any remaining issues not covered in your agenda?

A: Yes. One example of the left-out issues is the search–display interaction; we handle part of it, but not completely.

Q: What do you mean?

A: There are two types of interactions: the interaction effect at behavioral level, which is covered in the conversion model, and the interaction effect on media exposure. The latter type of interaction is not capturable by conversion models.

Q: This is quite dense … do we need another methodology to model the likelihood of exposure?

A: I do not think individual level modeling is the right approach – lack of data is not the only challenge …

Q: Ok, if this is so, how can we say attribution modeling is solved?

A: I consider this to be outside the main attribution modeling. This trailing piece may need a different handle – a “re-attribution” methodology?

(more to come)

Q: It’s been over a week since we last talked, and I am still in disbelief. If what you said is true, the multi-touch attribution problem is solved! Then again, I feel there are still so many holes. Before we discuss some challenging questions, can you sketch out the attribution process, the steps you proposed?

A: Sure. There are four steps:

Step 1. Developing conversion model(s)

Step 2. Calculating conditional probability profile for each conversion event

Step 3. Applying Shapley Value formula to get the S-value set

Step 4. Calculating fractional credit: dividing S-value by the (non-conditional) conversion probability
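Steps 2 to 4 can be sketched in a few lines. The conversion probabilities below are made-up numbers standing in for the output of a fitted conversion model (Step 1), the channel names are hypothetical, and reading the "(non-conditional) conversion probability" as the probability with all observed channels switched on is an interpretation made for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, conv_prob):
    """Shapley value of each channel, given P(conversion | channels on)."""
    n = len(channels)
    values = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal lift from switching this channel on.
                total += weight * (conv_prob[s | {ch}] - conv_prob[s])
        values[ch] = total
    return values


channels = ["search", "display"]
# Step 2: hypothetical conditional probability profile for one converter.
conv_prob = {
    frozenset(): 0.008,                       # no channel on (baseline)
    frozenset({"search"}): 0.012,
    frozenset({"display"}): 0.010,
    frozenset({"search", "display"}): 0.016,  # both channels on
}

s_values = shapley_values(channels, conv_prob)           # Step 3
p_full = conv_prob[frozenset(channels)]
credits = {ch: s / p_full for ch, s in s_values.items()}  # Step 4
print(credits)
```

With these numbers the two channels share half of the conversion credit between them (roughly 31% and 19%), and the remaining half stays with the baseline, that is, with the users who would have converted anyway.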

Q: And what to call this – the attribution process? algorithm? framework? approach?

A: attribution agenda – of course, you can call it anything you like.

Q: Why don’t you start with data collection – I am sure you have heard about the GIGO (garbage in, garbage out) principle and how important having good and correct data is for attribution …

A: I am squarely focusing on attribution logic – data issues are outside the scope of this conversation.

Q: I noticed that there are no rule-based attribution models in your agenda. Are the rules really so arbitrary that they are of no use at all?

A: They are not arbitrary – like any social/cognitive rule of a customary nature. For attribution purposes, however, they are neither conversion models, which measure how channels actually impact conversion probability, nor clearly stated justification principles.

Q: What about the famous Introducer, Influencer and Closer framework – the thing everyone uses in defining attribution models – and the insights it provided to attribution?

A: They are really the same concept as the last-touch and first-touch rules – a position-based way of looking at how channels and touch point sequences are correlated. You can use an alternative set of cleaner and more direct metrics to get similar insights – metrics derived from counting the proportions of a channel in conversion sequences as first touch, last touch and neither.

Q: Do these rules have no use at all in the attribution process? Can they be used in conjunction with conversion models?

A: You do not use them together; there is simply no need for them anymore when you can have conversion models. However, there are cases when you do not have sufficient data to build your models. In that case, you can borrow from other models, or use these rule-based models as your heuristic rules.

Q: You are clearly not in the “guru camp” – as you said in your “Guru vs PhD” tweet. Are you in the PhD camp then?

A: No. I also think that they may be more disappointed than the gurus from the web analytics side ...

Q: I have the same feeling – I think you are killing their hope of being creative in the attribution modeling area. With your agenda, there are no more attribution models aside from conversion models, and no more attribution modeling aside from this one Shapley Value formula and the adjustment factor.

A: The real creativity should be in the development of better conversion models.

Q: Let’s slow down a little bit. I think you may be oversimplifying the attribution problem. Your conversion models seem to work only when there is one touch event per channel – how can you handle cases with multiple touch events per channel?

A: You may be confusing the conditional probability profile – in which a channel is treated as one single entity – with conversion models. In my mind, you can create multiple variables per channel that reflect complex features of the touch point sequences for that channel: frequency, recency, interval, first-touch indicator, etc. Once the model is developed, you construct the conditional probability profile by turning all the touch points for that channel On or Off at the same time.

Q: Ok. How do you deal with the ordering effect – the fact that channel A first, and B second (A,B) is different from (B,A)?

A: You construct explicit order indicator variables in your conversion models … that way, your attribution formula (the Shapley Value) can remain the same.

Q: And what if the order does not matter?

A: Then the order indicator variables will not be significant in the conversion models.

Q: And the channel interaction?

A: Through the usual way you model the interaction effects between two or more main effects.

Q: The separation of conversion model and attribution principle in your agenda is quite frustrating. Why can’t we find innovative ways of handling both in one model – a sort of magic model, Bayesian, Markovian or whatever?

A: Go find it out.

Q: Control/Experiment could be an alternative, couldn’t it?

A: Control/Experiment is at best a way of measuring the marginal impact; suffice it to say that it is an impractical way to measure all the levels of marginal impact that a conversion model supports. With more than a couple of channels, the number of experiments needed goes up exponentially. It also does not allow post-experiment analysis, and there is no practical way to incorporate recency, sequence patterns, etc.

Q: What about an optimization principle? Suppose we require the best attribution rule to reflect the optimal way of allocating campaign budget to maximize the number of conversions, and a unique attribution rule can be derived from that requirement. Can that be the solution to attribution?

A: No. The attribution problem is about events that have already happened and needs to be answered that way, without requiring any assumption about the future. Campaign optimization is a related, but separate topic.

Q: Your attribution agenda is limited to conversion events. In reality, there are a lot of other metrics we care about, such as customer lifetime value, engagement value, etc. How do you attribute those metrics?

A: If you can attribute a (conversion) event, you can attribute all metrics derived from it, by figuring out what values are linked to that event. In short, you figure out the fractional credit for the event first, then multiply it by the value of the event, and you get the attribution for that new metric.

Q: You have so far not talked about media cost at all – when we know all attribution vendors use it in the process. How come there is no media cost in your attribution agenda?

A: Media cost is needed to evaluate cost-based channel performance, not for attribution. How much a channel has impacted a conversion is a fact; it does not depend on how much you paid the vendor – if there is any relationship, it should be the opposite. The core of the attribution process can be done without the media cost data – all vendors ask for it because they want to work on more projects aside from attribution.

Q: Regarding to issue of where should attribution process reside, you picked Agency. Isn’t agency the last place you’d think of when it comes to any technology matter? Since when did you see agency put technology at their core competency?

A: Understandable. I said that not for any of the reasons you mentioned, but for what an ideal world should be. Attribution process is so central to campaign planning, execution and performance reporting, at both tactical and strategic level. Having that piece sitting outside of the integration center can cause a lot of frictions to moving your advertising/marketing to the next level. I said that it should live inside your agency, but I did not say that it should be “build” by the agency; I did not say it should live inside your “current” agency; and certainly, there is nothing prevent you from making your technology vendor into your “new agency”, as long as they will take up the planning, execution and reporting works from your agency, at both strategic and tactical levels.

Q: What about Media Mix Modeling? If we have resources doing that, do we still need to worry about attribution?

A: These are the micro and macro attribution technologies. The relationship is complicated and certainly needs a separate discussion to do the topic justice. The simplest distinction between them is this: when you have the most detailed data on who was touched by which campaigns, you do attribution. If you have none of that data, and only know media delivery and conversion data at the aggregate level, you do MMM.

Q: I have to say that your agenda brings a lot of clarity to the state of attribution. I like the prospect of order; still, I can’t help but think about what a great time everyone has had around attribution models in recent years ...

A: Yes, the state of extreme democracy without consensus. To those who have guns, money and power, anarchy may just be the perfect state; not being cynical, just my glass half-full kind of perspective.

Q: Continuing yesterday’s conversation … I am still confused about the difference between a conversion model, an attribution model and attribution modeling. Can you demonstrate using a simple example?

A: Sure. Let’s look at a campaign with one vendor/channel on the media plan …

Q: Wait a minute, that will not be an attribution problem. If there is only one channel/vendor, does it matter what attribution model you use?

A: It does. Do we give the vendor 100% of the credit? A fraction less than 100% of the credit?

Q: Why not 100%? I think all commonly used attribution models will use 100% …

A: You may want to think twice, because some users may convert on their own. Let’s assume the vendor reaches 10,000 users and 100 of them convert. Let’s also assume that, through analysis and modeling work (such as using a control group), you conclude that 80 of the 100 converters would have converted on their own. How many converters did the vendor actually (incrementally) impact?

Q: 20.

A: If you assign 100% credit to the vendor, the vendor will get credit for all 100 converters. Since the actual number of impacted conversions is 20, a fraction of the credit should be used; in this case it is 20% instead of 100%. That’s attribution modeling, in its simplest form.

Q: Really? Can you recap the process and highlight the attribution modeling part of it?

A: Sure. In this simplest example, the conversion model provides us with two numbers (scores):

1) The probability of conversion given that the user was exposed to the campaign, call it P(c|camp); in this case it is 100/10000 = 1%, and

2) The probability of conversion given that the user was not exposed to the campaign, call it P(c|no-camp); in this case it is 80/10000 = 0.8%

Attribution modeling says that only a fraction of the credit, (P(c|camp) - P(c|no-camp)) / P(c|camp) = 0.2 or 20%, should be credited out.

Notice that this fraction for attribution is not 100%. It is not P(c|camp) which is 1%; and it is not P(c|camp) – P(c|no-camp) which is 0.2%.
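The recap above can be sketched in a few lines of Python. The numbers are the ones from the example (10,000 users reached, 100 converters, 80 of whom would have converted anyway); the function name `attribution_fraction` is only an illustrative label:

```python
def attribution_fraction(p_conv_exposed: float, p_conv_unexposed: float) -> float:
    """Fraction of conversion credit to be credited out.

    Implements (P(c|camp) - P(c|no-camp)) / P(c|camp).
    """
    if p_conv_exposed <= 0:
        raise ValueError("P(c|camp) must be positive")
    return (p_conv_exposed - p_conv_unexposed) / p_conv_exposed


reached = 10_000
converters = 100
would_convert_anyway = 80

p_camp = converters / reached                # P(c|camp) = 1%
p_no_camp = would_convert_anyway / reached   # P(c|no-camp) = 0.8%

fraction = attribution_fraction(p_camp, p_no_camp)
credited_conversions = fraction * converters

print(round(fraction, 4), round(credited_conversions, 2))
```

The fraction comes out at about 0.2, so about 20 of the 100 conversions are credited to the vendor, matching the incremental count.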

Q: This is an interesting formula. I do not recall seeing it anywhere before. Does this formula come from the conversion model?

A: Not really. The conversion model only provides the best possible estimates for P(c|camp) and P(c|no-camp), that’s all. It will not provide the attribution fraction formula.

Q: Where does this formula come from then?

A: It comes from the following reasoning: vendor(s) should get paid for what they actually (incrementally) impacted, not all the conversions they touched.

Q: So the principle of this “attribution modeling” is not data-driven but pure reason. How much should I trust this reasoning? Can this be the ground to build industry consensus?

A: What else can we build consensus on?

Q: Ok, I see how it works in this simple case, and I see the principle of it. Can we generalize this “incremental impact” principle to multi-channel cases?

A: What do you have in mind?

Q: Let me try to work out the formula myself. Suppose we have two channels, call them A and B. We start with conversion model(s), as usual. From the conversion model(s), we find our best estimates for P(c|A,B), P(c|nA,nB), P(c|nA,B), P(c|A,nB). Now I understand why it does not matter whether we use logistic regression, a probit model or a neural network to build our conversion model: all that matters is to make sure we get the best estimates for the above scores :)

A: Agreed. By the way, I think I understand the symbols you used, such as c, A, nA, nB etc.; let me know if you think I may have guessed wrong :)

Q: This is interesting, I think I can get the formula now. Take channel A first, and let’s call the fractional credit A should get as C_a; we can calculate it with this formula: C_a= (P(c|A,B)–P(c|nA,B)) / P(c|A,B), right?

A: If you do that, C_a + C_b may be over 100%.

Q: What’s wrong, then?

A: We need to first figure out what fraction of the credit is available to be credited out to A and B, just as in the simplest case discussed before. It should be (P(c|A,B) - P(c|nA,nB)) / P(c|A,B).
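A small numeric sketch of why the per-channel formula overshoots, and of the available fraction it has to fit within. The four conversion-model scores below are hypothetical values chosen for illustration, not estimates from any real model:

```python
# Hypothetical conversion-model scores (illustrative, not real estimates).
p = {
    ("A", "B"): 0.010,    # P(c|A,B)
    ("nA", "B"): 0.004,   # P(c|nA,B)
    ("A", "nB"): 0.003,   # P(c|A,nB)
    ("nA", "nB"): 0.002,  # P(c|nA,nB)
}
p_ab = p[("A", "B")]

# Per-channel fractions computed one channel at a time.
naive_c_a = (p_ab - p[("nA", "B")]) / p_ab
naive_c_b = (p_ab - p[("A", "nB")]) / p_ab

# Fraction of credit actually available to be credited out to A and B.
available = (p_ab - p[("nA", "nB")]) / p_ab

print(f"naive C_a + C_b = {naive_c_a + naive_c_b:.2f}")
print(f"available credit = {available:.2f}")
```

With these scores the two one-at-a-time fractions add up to more than 100%, while the available fraction is below 100%, so the per-channel formula cannot be used as is.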

Q: I see. How should we divide the credit to A and B next?

A: That is a question we have not discussed yet. In the simplest case, with one vendor, this is a trivial question. With more than one vendor/channel, do we need some new principle?

Q: I have an idea: we can re-adjust the fractions on top of what we did before, like this: C’_a = C_a / (C_a + C_b) and C’_b = C_b/(C_a + C_b); and finally, we use C’_a and C’_b to partition the above fraction of credit. Will that work?

(note: the following example has an error in it, as pointed out by Vadim in his comment below)

~~A: ~~*Unfortunately, no. Take the following example:*

*suppose A add no incremental value, except when B is present: P(c|A,nB) == P(c|nA,nB) and P(c|A,B) > P(c|nA,B)*

*also, B does not add anything when A is present: P(c|A,B) = P(c|A,nB)*

*The calculation will lead to: C_b == 0 and C_a > 0. Therefore, A get all the available credit and B get nothing.*

*Do you see a problem?*

*Q: Yes. B will feel unfair, because without B, A will contribute nothing. However, A get all the credit and B get nothing.*

A: This is just a case with two channels and two players. Imagine if we had 10 channels/players; what a complicated bargaining game this would be!

Q: Compared with this, the conversion model part is actually easy; well, not easy but more like a non-issue. We can build conversion models to generate all these conditional probability scores. However, we are still stuck here and can’t figure out a fair division of credit.

A: This is attribution modeling: the process or formula that translates the output of conversion models into an attribution model (or fractional credits). We need to figure this thing out.

Q: What is it, really?

A: We are essentially looking for a rule or a formula to divide the total credit that we can all agree is fair. Is that right?

Q: Right, but we have to be specific about what we mean by “fair”.

A: That’s right. So, let’s discuss a minimal set of “fair” principles that we can all agree upon. There are three of them, as I see it:

Efficiency: we are distributing all available credit, not leaving any on the table

Symmetry: if two channels are functionally identical, they should get the same credit

Dummy Channel: if a channel contributes nothing in all cases, it should get no credit

What do you think?

Q: I think we can agree with these principles. How can they help?

A: Well, someone has proved that there is one and only one formula that satisfies this minimal set of principles. I think this is our attribution formula!

Q: Really? I do not believe this. Who proved this? Where can I read more of it?

A: In 1953, Lloyd Shapley published the proof in his PhD dissertation, and the resulting formula became known as the Shapley Value. The field of knowledge is called Cooperative Game Theory. You can Google it and you will find tons of good references. Of course, Shapley did not call it an “attribution problem”, and he talked about players instead of channels. His collection of principles contains more than three. However, the Transferable Utility and Additive principles are automatically satisfied when applied to the credit partitioning problem.
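A minimal sketch of the Shapley Value applied to channels, under one stated assumption: the “worth” of a set of channels is taken to be the conversion-model score P(c | exactly those channels present), and each channel’s fractional credit is its Shapley Value divided by the score with all channels present. The scores are hypothetical, and `shapley_values` is an illustrative helper, not a library function:

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, value):
    """Shapley Value of each channel.

    `value` maps a frozenset of channels to a number; here that number is
    assumed to be P(c | exactly those channels present).
    """
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that ch joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of ch when added to coalition s.
                total += weight * (value(s | {ch}) - value(s))
        result[ch] = total
    return result


# Hypothetical conversion-model scores for two channels.
scores = {
    frozenset(): 0.002,            # P(c|nA,nB)
    frozenset({"A"}): 0.003,       # P(c|A,nB)
    frozenset({"B"}): 0.004,       # P(c|nA,B)
    frozenset({"A", "B"}): 0.010,  # P(c|A,B)
}

phi = shapley_values(["A", "B"], scores.__getitem__)
p_all = scores[frozenset({"A", "B"})]
fractional_credit = {ch: v / p_all for ch, v in phi.items()}

print(fractional_credit)
```

With these scores A receives about 35% and B about 45% of the credit. The two add up to (P(c|A,B) - P(c|nA,nB)) / P(c|A,B) = 80%, so all of the available credit is distributed and none is left on the table.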

Q: Now, how do you apply this attribution rule differently for different converters?

A: You do not. The differences among converters are reflected in the scores generated from the conversion models, not in the above attribution formula, i.e. the Shapley Value.

Q: Ok, if that is the case, everyone in the industry will be using the same Attribution Formula, or Shapley Value. How do we then creatively differentiate from each other? How should different types of campaigns be treated uniquely? How will the effects of channels on different types of conversion be reflected in attributed credits?

A: Well, all these will be reflected in how the conversion models are built and how the parameters of the conversion models are estimated, and finally the scores that come out of the conversion models. You will innovate on statistical model development techniques. Attribution formula is, fortunately, not where you are going to innovate.

Q: This is quite shocking to me. I can’t imagine how the industry will react …

A: How did the industry deal with Marketing Mix Modeling? We accepted the fact that those are simply regression models in essence, and started selling expertise in being able to do it thoroughly and do it right. We do not have to create our own attribution model to be able to compete with each other.

Q: I will begin with this question: what do you NOT want to talk about today?

A: I do not want to waste time on things that most people know and agree with, such as “Last Touch Attribution is flawed”.

Q: Why is the attribution model such a difficult challenge that, after many years, we still seem to be just scratching the surface of it?

A: No idea.

Q: Let me try a different way, why is it so hard to build an attribution model?

A: It is not. It is NOT difficult to build an attribution model; in fact, you can build 5 of them in less than a minute: Last Touch, First Touch etc. :) What is difficult is building good attribution modeling: a process that produces a methodologically sound attribution model.

Q: Is “attribution modeling” the kind of tool already available through Google Analytics?

A: No. Those are attribution model specification tools – “you specify the kind of attribution models to your heart’s content and I do reporting using them”. They do not tell you what IS the RIGHT attribution model. An attribution reporting tool does not make an attribution modeling tool.

Q: “Methodologically sound” – that seems to be at the heart of all attribution debates these days. Do you think we will ever reach a consensus on this?

A: Without a consensus on this, how can anyone sell an attribution product or service?

Q: On the other hand, isn’t “algorithmic attribution” already a consensus that everyone can build on?

A: What is that thing?

Q: All vendors seem to take the “algorithmic attribution” approach, possibly adding additional phrases, such as “statistical models” and data-driven etc. Isn’t that sufficient?

A: How? They never show how it works.

Q: Do you really need to get into that level of detail, the “Black Box” – the proprietary algorithm that people legitimately do not release to the public?

A: There is no reason to believe that anyone has a “proprietary algorithm” for attribution. Unlike predictive modeling, a domain of technology that can be “externally” evaluated without going inside the Black Box, attribution modeling is like math, a methodology whose validity needs to be internally justified. A Black Box for attribution sounds like an oxymoron to me. You do not see people claim that they have a “proprietary proof” of Fermat’s Last Theorem. (Ironically, Fermat himself claimed a proof in the margin of a book without actually showing it, but everyone knows he never intended it to be like that).

Q: Why then do people claim to have but do not show their algorithmic and/or modeling approach?

A: It is anyone’s guess. I see no reason for that; it hurts themselves and it hurts the advertising industry, particularly the online advertising industry. I suggest that, from today, every vendor should either stop claiming that they have a proprietary attribution modeling/model or get out of the “Black Box” (the emperor’s new clothes?) and prove the legitimacy of their claim.

Q: Ok, suppose I say, I build a regression model to quantify which channels impact conversion and by how much, then calculate the proportional weights based on that and partition the credits according to the proportions. What would you say?

A: How?

Q: You are not serious, right? I am giving you so many details; how much more do you want?

A: The program and process sound like they will work, and to non-practitioners’ eyes it is quite CLEAR that they will work. But you know and I know that it does NOT work. Having built conversion models does not solve the attribution problem. The attribution problem comes down to the partitioning of credit, i.e. how much of the conversion credit is to be partitioned and how much is given to each channel. The logic has to be explicitly presented and justified. The core challenge has been glossed over and covered up, but not solved.

Q: Please simplify it for me.

A: There is no automatic translation available from conversion models to attribution models; the process of doing that, which is attribution modeling, has to be explicitly stated.

Q: You defined the attribution problem as partitioning credit to channels. Are you talking about only Cross-Channel Attribution? If I want to focus only on Digital Attribution, or even Publisher Attribution only, is what you said still relevant?

A: Yes. I am talking about it from a data analytics angle; you can just replace the word “channel” with others and the rest will apply.

Q: Ok, what if the conversion model I use is not regression, but some kind of Bayesian model?

A: It does not matter. It can be Bayesian, a Neural Net or a Hidden Markov Model. As long as it is a conversion model, the automatic translation is not there.

Q: Does it matter if the conversion model is predictive or descriptive?

A: It should be a conversion model. There are multiple meanings of “predictive model”; a conversion model is essentially a predictive model, but it need not handle “information leakage” type issues as a predictive model should.

Q: Does it need to be a “causal” model, and not a “correlational” model?

A: Define causal for me. Specifically, do people know what they mean by “correlational” model? Do they know multivariate models and dependence concepts?

Q: I assume we know. Causal vs. correlational are just common sense concepts to help us make the discussion around “model” more precise …

A: But neither is a more precise concept than statistical modeling language. Even philosophers themselves have begun to use statistical modeling language to clarify their “causal” framework …

Q: Now I am confused. Where are we right now?

A: We are discussing statistical models and attribution modeling …

Q: Ok, should we use statistical models when we do attribution?

A: We have to. Quantifying the impact of certain actions on conversion should be the foundation for any valid attribution process; there are no more precise ways to do that than developing solid statistical models for conversion behavior!

Q: Not even experimental design?

A: Not even that.

Q: But what is the right statistical model? Some types of regression models or some Bayesian models or Markovian models?

A: It does not have to be any one of them, and yet, any one of them may do the job.

Q: If that is true, how can one justify the objectivity of the model?

A: A conversion model provides the basis for what reality looks like – to our best knowledge at the moment. There can be different types of statistical methodologies to model the conversion behavior, and that does not create problems with the objectivity of the model output. We have seen this in marketing response models, where the modelers have the freedom to choose whatever methodology (type of models) they deem appropriate and yet it does not compromise the objectivity of its results.

Q: But attribution is different; when building marketing response models, what is important is the score, not the coefficients or any “form factors” of the model. In attribution, it is those form factors, not the scores, that are central to deriving the attribution formula.

A: That’s exactly the problem that needs to be corrected. Attribution formula should NOT be built on the “form factors” of the conversion model, but rather on the scores of the conversion models!

Q: Explain more …

A: If you can’t claim that a linear regression model IS the only right model for conversion behavior, you can’t claim that its regression coefficients, the “form factors” of the regression model, are intrinsic to the conversion behavior. Thus, any attribution formula built on top of them cannot be justified.

Q: And the conclusion, in simpler language …

A: A conversion model is needed for attribution, but the attribution model is not the conversion model. The attribution model should be built on top of the “essence” part of the conversion models, i.e. the scores, and not the form factors. Attribution modeling is the process of translating conversion modeling results into an attribution model.

Q: What is that saying about the offering from current vendors?

A: They often tell us that they build conversion models, but reveal nothing about their attribution modeling methodology.

Q: What if they say that, they are hiding their proprietary attribution technology in Black Box? Are they just covering up the fact that they have nothing in there, and they do not know how?

A: Anyone’s guess. The bottom line is, anyone claiming anything should acknowledge the right to doubt from their audience.

Q: It is common to see companies hiding their predictive modeling (or recommendation engine technology) in Black Box … why not attribution?

A: Predictive modeling, or even recommendation modeling, is something that can be externally tested and verified. You can take two sets of predictive model scores and test which one has more predictive power without knowing how the models were built. Attribution modeling is different; you have to make explicit how and why your way of allocation is justified. Otherwise, I have no way of verifying and validating your claim.
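A rough sketch of such an external test, assuming we only receive each vendor’s scores for the same users plus the observed outcomes. AUC is used here as one possible measure of predictive power; that choice, the vendor names and all the numbers are assumptions made for illustration:

```python
def auc(scores, labels):
    """Probability that a random converter is scored above a random non-converter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count pairs where the converter outranks the non-converter (ties count half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed outcomes (1 = converted) and two black-box score sets.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
vendor_x = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.5]
vendor_y = [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1]

auc_x = auc(vendor_x, labels)
auc_y = auc(vendor_y, labels)

print(f"vendor X AUC = {auc_x:.2f}, vendor Y AUC = {auc_y:.2f}")
```

Nothing in this comparison looks inside either model. No comparable outside test exists for a credit allocation rule, which is why the rule itself has to be shown and justified.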

Q: We are not in the faith business …

A: Amen.

Q: Ok, big deal. I am an advertiser, what should I do?

A: Demand that anyone selling you attribution products/services show you their attribution stuff. It is ok if they hide the conversion model part of it, but do not compromise on the attribution modeling.

Q: I am in the vendor business, what should I do?

A: Defend yourself – not by working on defensive rhetoric, but by building and presenting your attribution modeling openly.

Q: If I am an agency, what should I do?

A: Attribution should live inside the agency. You can own it or rent it; you should not be fooled by those who would like you to think attribution modeling is a proprietary technology. It is not. Granted, you are not a technology company, but attribution modeling is not a proprietary technology. If you have people who can build conversion models, you are right up there with those “proprietary” attribution vendors.

Q: If attribution modeling becomes an “Open” methodology, what about those attribution vendors? What will they own, and why wouldn’t advertisers and agencies build it themselves?

A: That’s my question too :)

Q: Are vendors going to be out of business?

A: Well, they can still own the conversion modeling part of it … and there are still predictive modeling shops out there, in business …

Q: Somehow, you sound like you know something about this “open secret” already :) Can you share a little on that?

A: Can we talk tomorrow? I need to leave for this “Attribution Revolution” conference tonight …

The term “attribution modeling” can have different meanings to different people, and is sometimes used interchangeably with “attribution model”. To me, an attribution model refers to things like “last touch”, “first touch” etc.: rules that specify how attribution should be done. Attribution modeling is the process by which an attribution model is generated. Attribution modeling gives us the model generation process, as well as the reasons and justification for the attribution model being derived.

It is not difficult to come up with an attribution model; in fact, we can make one up in seconds. What is difficult is to determine which one is the *right* attribution model. Despite all the discussion and progress made over the last few years, there is no consensus about it. And the lack of industry consensus really hurts.

The question about the right attribution model is perhaps misguided, for we all know that a model that is right for one business, say e-commerce, may be wrong for another, such as B2B. What is right may also depend on the type of campaign, type of conversion and even type of user (male vs female, adult vs teen). The right question should be: what’s the right attribution modeling, i.e. the right process by which an attribution model is generated?

Each one of us can easily list the 4 or 5 most commonly used attribution models. What about attribution modeling? In how many different ways can an attribution model be produced?

Last Click/Last Touch attribution models are examples where intuition is the modeling process. It is not data driven. You can argue about the good and bad conceptually. On the other hand, the data-driven approach holds the fundamental belief that the right attribution model should be derived from data. Within the data-driven approach, there are two slightly different approaches: experimental design vs algorithmic attribution.

You may ask, what about Google’s Attribution Modeling Tool in Google Analytics? It is not really an attribution modeling tool in my use of the word; it helps you specify attribution models, not create data-driven models. It does not tell you how to derive the “right” attribution model.

The data-driven approach is what we will focus on here. There has been great progress in the “algorithmic attribution” approach, and significant businesses have been built on it (Adometry and VisualIQ, to name a couple). However, none is clear and transparent enough about its key technologies; as an industry, we are left with a lot of confusion.

The set of difficult problems is about exactly that: the core technology of attribution modeling. We need to answer these questions so we can build upon a common ground and move on. Here’s a list of the questions/problems:

1) Is attribution modeling the same as statistical conversion modeling?

2) What’s the right type of models to use: predictive modeling, descriptive modeling, causal modeling?

3) Does it matter if the model is linear regression, logistic regression or some Bayesian network model?

Stay tuned for more.