Some of my old writing on Linkedin, published a year and half ago. Below is a list of links:

The right attribution framework – part 1

The right attribution framework – part 2

The right attribution framework – part 3

Some of my old writing on Linkedin, published a year and half ago. Below is a list of links:

The right attribution framework – part 1

The right attribution framework – part 2

The right attribution framework – part 3

Ad Tech is an exciting industry, rich in complex and multidisciplinary challenges. It can also be quite intimidating and confusing with its maze of jargon and operational black-boxes.** **Bid Optimization Algorithm is often considered one of those black-boxes.

Real Time Bidding (RTB) is at the core of Ad Tech’s operations. At the heart of RTB are the bidders, running their mythical bid optimization algorithms. I have been fascinated with the inner workings of a bidder for many years and learned a few things about optimal bidding along the way. The purpose of this article is to share my thoughts with those who are interested in bid optimization algorithms while applying rigorous thinking but written in plain and simple language. I understand that this is close to mission impossible and I will more than likely fail; I still want to give it a try. So wish me luck and let me know how far I get!

**A Bidder’s Triple Challenges**

Real Time Bidding is a new kind of media buying-selling mechanism – transacting one impression at a time through an auction marketplace; this is quite different from the traditional way of buying/selling media in bulk via contract negotiation. Publishers put ad-impression opportunities in the auction marketplace (i.e. ad exchange), collect bids from bidders who represent buyers (advertisers and their agencies), and the winner, winning ad and winning bid are all selected in real time.

A bidder is really the brain on the buy-side (demand side) of the market. Building a bidder is quite a challenging job – just imagine yourself receiving a million bid requests per second, running them through fraud-detection screening, evaluating the value of each impression opportunity to your clients and making the best bidding decision given the budgets and other client constraint and doing all these a million times every second!

Generally speaking, a bidder has three major challenges: 1) the *technological challenge* to handle the speed and scale (cloud computing, Memcached and NoSQL); 2) the statistical learning challenge to model and estimate the expected value of each impression (click, conversion, test/control and attribution) and 3) the *mathematical* challenge of bid optimization: how to bid optimally given the change and complexity of the marketplace and all the business constraints.

*Bid Optimization ***is the focus of this discussion**

Let’s start with simple case: bidding on a single impression on behalf of one client. For a single bid request, you need to decide on an optimal bid: should it be $1, $100, 2c or no bid?

Upon further thought, you may realize that you have no base for making any decision. This is because the optimization problem is not yet fully defined. We need to specify the context and parameters before we can meaningfully talk about optimal bidding – we need to know what the auction market/rule is. How is cost calculated when we win, how many others are competing in the auction and how do they bid? What’s my goal and constraints? Nothing makes us feel more powerless than struggling with ill-defined problems.

*Vickery Auction*

Auction is a process where buyers compete against each other and a winner is decided by a certain rule(s), usually the highest bid price. In general, there are two classes of auctions: open auction and close auction, based on whether bids are openly communicated to every bidder in the bidding process. Open auction is further divided into English auction, where the bidding starts low and goes up and Dutch auction, where the asking price starts high and goes down.

Ad exchanges use Vickery Auction or variants of it; it is a closed/sealed auction where each bidder bids without knowledge of how others bid. It is also a second price auction: the bidder with the highest bid wins but pays the second highest bid, not its own bid. You may wonder why any seller would agree to get paid by the second highest bid. The (somewhat) surprising reason is that, this auction rule helps the bidders declare their TRUE value (private valuations) of the item; this indirectly may help the seller get a higher bid.

The common understanding is that under Vickery Auction, bidders need not worry about how others bid strategically against them, they can simply bid at their private value of the item. This is a good property in a tool for designing a simple and efficient market mechanism.

Unless said otherwise, Vickery Auction is assumed in the following discussion.

**Bidding a single impression**

Back to our simple case of bidding on a single impression. We now know that, unless we know the value of the item to us, we really do not have a basis to talk about an optimal bid. Once we know our value for the item, under Vickery Auction, we should just bid at that value, regardless of how others bid. This is amazingly simple.

It is important to understand this, so let’s check it out and make sure we are all convinced. Suppose the impression has a value of $100 to us, and the highest bid from others is x (meaning we do not know exactly how much). The claim is that bidding $100 is the best. How?

Suppose we bid lower than $100, say $50. If x is less than $50 or more than $100, the outcome will be identical to the bidding at $100. When x falls between $50 and $100, assuming x is $75, bidding at $100 will win this case and gain $25 ($100 value – $75 cost), whereas bidding at $50 at will lose the bid and miss the opportunity of this $25 gain. Thus the proof.

What if you bid higher than $100, say $130? If x is less than $100 or greater than $130, then the two bids result in the same outcome. When x is in between $100 and $130, bidding at $130 will win resulting in a net cost of ($x-$100). Thus, bidding higher than your value is not a good strategy.

*Optimal bidding is simply bidding at your valuation*

Convinced? Intuitively this is too simplistic; the real world is surely much more complex. Let’s dig into the complexity and see how they may impact our optimal bidding strategy. The main complexity starts when there are more bid requests and we have to decide which one to bid on and how much to bid.

Let’s take the next step and assume we now have two impression requests. How should we bid? A simple approach is to treat the two bids as independent and apply optimal bidding to each impression. Any problem with this?

One problem is the budget. Let’s assume our budget is $100. With two impressions valued at $75 and $100, if we bid at the (independent) optimal bids, $75 and $100 respectively, the following are three possible outcomes:

- We win both, costs are at $60 and $80
- We win both, costs are at $40 and $60
- We lose the first and win the second, cost is $85

With (1) we are over budget and the bids are impermissible. (2) is the perfect scenario. (3) is sub-optimal, with left over budget. Given that all three scenarios are possible, we really do not know what to say about our bidding strategy.

We are in this interesting situation again, with an insufficiently defined optimization problem. We need to add an extra component to setup the problem properly. What is this missing component?

We need to know which one of the above three situations will arise, for each impression. If we know the full set of bids from others, it will be suffice. However, at the time of the bidding, the set of other bids is not known, no matter how much we might want it. What we can know is the probability distribution of other bids. In fact, all we need to know is the *probability distribution of the market bid (the highest other bid, your bid not included)* – this is in fact the missing component, for us to fully specify our budget constrained optimization problem. The distribution can be impression specific; the set of distributions should provide the sufficient information about the bidding environment.

**Optimal bidding with budget constraint **

It is interesting to notice that, when we bid on a single impression with no budget constraint, we do not need to know how others bid. We do not need to know the distribution of the highest other bid; even if we know it, it changes nothing about how we bid.

With more than one impression to bid on, we need to decide how to optimally spread the budget over the set of impression bid requests. Each bid request creates an opportunity for us to *invest, *and the distribution of its market bid allows us to calculate the expected return for any given bid. We can now treat our bidding problem as an investment optimization problem, and the principle of optimal budget allocation across multiple investments dictates that, at optimal budget allocation,

*The marginal rate of returns across all investments should be equal.*

Interestingly, if we calculate the marginal rate of return (MRR) in our bidding case, it turns out that MRR is a function of the ratio of the bid and the impression value. Translating the above principle for our bidding case, we have:

*With budget constraints, optimal bid should be proportional to valuation*

This is an amazing result. It essentially implies that, the optimal bidding is not only bidding on high value impressions or low priced impressions (those with low expected market bid), but every impression. Although we haven’t discussed how exactly the bids should be, we know that there is ONE optimal proportion/ratio number to be calculated and it should be applied to the value of each impression, to get its optimal bid.

Take the two-impression bidding problem as an example, Let’s call the optimal ratio “r” and the optimal bids “b1” and “b2”. We have:

- b1 = $75*r
- b2 = $100*r
- b1*C1(b1) + b2*C2(b2) <= $100, at r <= 1; here C1 and C2 are the CDFs of the corresponding market bid distribution

The proportionality rule itself does not depend on the market bid distributions. However, the optimal proportion/ratio parameter does come from it. In practice, one can either calculate out when the distributions are (reliably) known, simulate with data, or approximate with real time iteration.

It is very interesting to notice that this “proportionality rule” is quite frequently used in practice by media teams everywhere, without necessarily knowing the optimality of it.

(Those more inclined to rigorous math, can work out the constrained optimization formulation and use the Lagrangian multiplier framework to see the result easily)

**The effect of how others bid**

Let’s review what we found so far: Vickery Auction contributed to the market mechanism design of ad exchanges, allowing bidders to optimally bid without considering how others bid. This is true for bid optimization with no budget constraint, in which the optimal bid is equal to the private valuation of an impression. With budget constraint, optimal bids are proportional to their valuations. The proportion parameter can be managed in many ways. This greatly reduces the complexity of strategic considerations – the game theory framework, as it is not relevant for building the optimal bidding algorithm.

However, this is not to say that how others bid will not impact the *outcome* of a bidder’s optimal bidding. It simple will not affect the optimal bidding strategy or algorithm itself.

In general, bidders evaluating audience and impressions similarly are competing more with each other than others. As competition increase, market bid distributions will move higher for those higher valued impressions, and it will effectively reduce the average rate of return. Still, it is worth emphasizing that under budget constraint,

*The way others bid may impact the optimal ratio parameter and affect the outcome of how you bid; however, it will not affect the proportionality rule. *

This is really quite a strong conclusion, to say the least. The proportionality rule, or “Linearity Rule”, can greatly simplify the design of the optimal bidding process. However, we have to be careful not to extrapolate the conclusion too far, to cases characterized by factors not considered here.

How general is the proportionality rule?

It is important to remember that we have assumed A LOT when we derived the proportionality rule; one or more of those assumptions may fail, including this critical one: we do not know the value of an impression.

We certainly do not know the value of impressions we bid at the bidding time, in fact, some like myself would venture to say that we do not know the values even after the bidding time (remember the attribution problem?). How does this fact affect our bidding strategy?

Case 1: We only know the relative values, but not the absolute scaled values

This is very often the case, and almost always the case, and it comes in all shapes and forms. The key point is: we rarely know that ultimate financial metrics and therefore everything we use is a proxy: click, engagement, registration, conversion, and even a transaction. The models and all the advanced techniques can’t help us fix this proxy issue, and therefore can only provide the best relative valuation.

Luckily, there isn’t much difference between knowing the absolute value and relative value; they are off by a linear scale. With relative value, we are off by a scale multiplier to get the true value, but we magically *regain* it when we estimate the optimal ratio parameter. We retain most of everything we discussed about the optimal bidding rule. The true loss from not knowing the absolute value is only this: we are no longer able to set the bid cap (the requirement that r <= 1 above) based on profitability.

Case 2: We do not have precise knowledge of the value

One thinking is that as long as we have a way to estimate the values without bias (unbiased estimation), we should be fine. As this thinking goes, the amount of our ignorance will be reflected in the variance of our estimation and the effectiveness of the outcome, but will not change the conclusion of the optimization problem, which is mathematically as tight as it can be.

An extreme example of this is when we have no idea of any valuation — all impressions have the same value to us. This is actually quite often the case in the real world, for example, when we care only about the volume of impressions, or the (lower) eCPM of buying or the (minimal) spend. In all these scenarios, the optimal bidding rule can still be applied mechanically, simply assuming all impressions have the same value. This leaves a single parameter and a single bid value to calculate, or estimate.

Case 3: Dealing with margin

This is typical when a DSP serves a client. The adjustment is simply to apply a targeted margin to the client budget, set the result to be the effective budget and we will be fine.

Case 4: Dealing with targeting exclusion

Targeting exclusion are things directly affecting the eligibility of the impression for bidding (on behalf of a campaign), in effect, setting the valuation to zero. Because of this, it will affect the estimation of the ratio parameter “r”, but not the proportionality rule.

Case 5: Fraud, Viewability and Brand Safety

The confusion here is less about how these factors influence optimal bidding and more about how these factors should be considered by the valuation models or as targeting exclusions or other constraints. Without clearing up these sources of confusion, we do not have a well-defined optimization problem, yet.

I hope you can get the sense that this proportional bidding rule as optimal bidding strategy is generally applicable to many cases.

Where’s the catch?

Building a good bidder is difficult; the actual bid optimization algorithm can also be more complex than presented here. Given the discussion above, the simplicity behind the proportionality rule is unbelievable and I am sure many will ask, where’s the catch?

It can’t be this simple; it can’t be just about a linear multiplier; there has to be a catch, right?

Yes and no.

Linear multipliers are used in real world optimal bidding processes. Non-linear adjustments have also been applied sometimes, without making explicit the why and how.

I believe the root cause behind the need for non-linear adjustment is the variance of the value prediction. The competitiveness of the market, characterized by the fact that the true value of impressions (to us) are correlated with the distribution of market price. Because of this, our prediction bias may be correlated with our win-rate to produce a bias among the impressions we win, resulting in a biased outcome from our prediction. The adjustment is intrinsically non-linear because this bias is clearly bid-dependent. One can expect a bigger adjustment is needed with larger variance.

In my opinion, the fix will be this non-linear adjustment process, run prior to the optimization process. Once the post-winner bias is fixed, the optimization rule can be intact and the proportional rule should remain. However, the non-linear adjustment details is the devil and as is often said, the devil is in the details.