... or should I say, what you need may not be what you want?
The attribution problem, particularly the macro attribution problem, traditionally asks for a way to partition a success metric among marketing efforts, in the form of percentages, so that their relative contributions can be measured. The hope is that these percentages can be used, aside from figuring out how to distribute bonuses, to guide optimal budgeting decisions. However, the promise of using attribution as an optimization methodology is flawed. Attribution is basically an additive model of the business process, in which success can be partitioned as if the efforts were mutually independent. That is problematic when the actual relationship between marketing efforts and their success metric is non-linear, due either to interaction effects (in the form of synergy or cannibalization) or to the intrinsic quantitative relationship itself.
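To make the additivity problem concrete, here is a minimal sketch in Python. The two channels, their coefficients, and the synergy term are all hypothetical numbers chosen for illustration, not estimates from any real data. It shows that once an interaction term is present, the "credit" you get by switching each channel off does not add up to the total.

```python
# Hypothetical conversion model with a synergy (interaction) term.
# All coefficients are made up for illustration only.
def conversions(search_on, display_on):
    return 100 * search_on + 50 * display_on + 60 * search_on * display_on

total = conversions(1, 1)                    # 210 with both channels on

# "Credit" measured as the conversions lost when one channel is turned off.
loss_search = total - conversions(0, 1)      # 210 - 50  = 160
loss_display = total - conversions(1, 0)     # 210 - 100 = 110

print("total conversions:", total)
print("sum of removal effects:", loss_search + loss_display)  # 270, not 210
```

The 60 conversions produced by the synergy are counted once for each channel, so any set of percentages that sums to 100% has to split them by some rule that the model itself does not supply.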
When the data-driven relationship/model is shown to be non-additive and non-linear, it may not be intuitively clear how to use it for attribution, i.e. how to come up with the percentages. On the other hand, the non-linear, non-additive model should be just what you need for operational optimization, such as budget optimization decisions. This is because true optimization follows the principle of equalizing marginal returns, rather than averages. Attribution percentages are not as necessary for optimization precisely because they are based on averages. They are still useful in the many cases where averages and marginals are highly correlated.
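The marginal-versus-average point can be sketched with a toy budget split. Everything here is an assumption for illustration: the two response curves, their shapes and coefficients, the budget of 100, and the brute-force grid search standing in for a real optimizer. The sketch finds the split that maximizes total response and then compares marginal and average returns at that split.

```python
import math

# Hypothetical response curves (assumed shapes, not fitted to any data).
def search_response(spend):
    return 40.0 * math.sqrt(spend)

def display_response(spend):
    return 30.0 * math.log(1.0 + spend)

def marginal(response, spend, h=1e-4):
    # Central finite difference: extra response per extra unit of spend.
    return (response(spend + h) - response(spend - h)) / (2.0 * h)

BUDGET = 100.0

# Brute-force search over splits of the budget in steps of 0.01.
candidates = [i / 100.0 for i in range(0, 10001)]
best_search = max(
    candidates,
    key=lambda s: search_response(s) + display_response(BUDGET - s),
)
best_display = BUDGET - best_search

marginal_search = marginal(search_response, best_search)
marginal_display = marginal(display_response, best_display)
average_search = search_response(best_search) / best_search
average_display = display_response(best_display) / best_display

print("optimal spend (search, display):", best_search, best_display)
print("marginal returns:", marginal_search, marginal_display)
print("average returns:", average_search, average_display)
```

Under these assumed curves, the marginal returns come out nearly equal at the best split while the average returns per unit of spend remain clearly different, which is why a rule built on averages would not reproduce the optimal budget.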
This is the basic idea of this post, and the reason I was asking everyone to come back and read it. Fundamentally, you need to optimize your operation, not just obtain a set of percentages for distributing credit.
The other point I want to make is that the percentages that every attribution method is trying to get at, or every attribution rule is trying to produce, only make sense under some type of causal interpretation. If the data show that one factor/effort has no real influence on success/conversion, then it is conceptually not justifiable to attribute anything to it. Again, the right interpretation of an attribution is based on the affirmation of a causal relationship. This is another reason why statistical modeling is fundamentally important.
I am not saying that the conversion model I mentioned is the typical “conversion model” we use in the DM context; it does not have to be exclusively predictive modeling. The special type of conversion model that we are building is a causal model based on empirical data. Many predictive modeling techniques do apply, but there are still some differences. For example, if the purpose is prediction, proxy variables are as valid as any other variables. They may not be automatically acceptable when building a model that requires causal interpretation.
Correct attribution rules should be based on a sound conversion model. Their implementation in web analytics tools can facilitate the reporting process for insight generation and monitoring purposes. Attribution will have a role similar to the one it has now. What I am arguing is that it should be built on a sound, data-driven conversion model, not simply on intuition. My point goes a little further, in that I am also arguing for the use of conversion models (be they linear or non-linear, additive or non-additive) rather than the attribution percentages.
In sum, a conversion model will provide what you need, which is the ability to optimize your operation, but it may not be what you wanted from attribution; those percentages that we all like to see and talk about are ultimately less critical than we thought.
Please come back and read the next post for a deep-dive example of the conversion modeling approach to attribution.
Comments?