Analytics Strategist

December 31, 2008

Attribution Models

In marketing, particularly in search engine marketing, there has been growing interest in attribution models.  It is perhaps no coincidence that the same period saw tightened budgets and increasing demand for accountability. After all, attribution is the process by which success is credited to its source(s), a highly contentious matter in marketing.

This is one of the many reasons I expect attribution modeling to attract even more attention in 2009, with SEM and multi-channel marketing at the center of it all.

There are many uses of “attribution”: in the arts and academia it refers to crediting the original authors; in performance attribution (a large area covering investment, marketing, etc.) it refers to crediting results, or partitioning them, to their sources or causes; it also has a place in psychology, where the attribution processes behind behavior are the focus of study.

Attribution in the marketing/advertising world is the process of attributing success (usually sales or other metrics) to marketing/advertising activities.  Since different activities usually result in different customer touch points, it reduces to crediting success to different touch points.  From this perspective, multi-protocol attribution, engagement mapping, marketing mix modeling, and even customized 800 numbers are all attribution approaches and techniques.
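To make "crediting success to touch points" concrete, here is a minimal sketch of two simple crediting rules. The rule names (last-touch, linear), the channel names, and the sale value are assumptions chosen for illustration; they are not a recommendation of either rule.

```python
from collections import defaultdict


def last_touch(path, value=1.0):
    """Give all credit for a conversion to the final touch point."""
    return {path[-1]: value}


def linear(path, value=1.0):
    """Split credit equally across every touch point in the path."""
    credit = defaultdict(float)
    share = value / len(path)
    for touch in path:
        credit[touch] += share  # repeated touches accumulate credit
    return dict(credit)


# Hypothetical customer journey that ends in one sale worth 90.
path = ["display", "email", "paid_search"]
print(last_touch(path, 90.0))
print(linear(path, 90.0))
```

The same sale is credited entirely to paid search under one rule and shared three ways under the other, which is exactly why the choice of attribution model is contentious.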

In follow-up posts, I will discuss in detail the other aspects of attribution modeling: why attribution modeling is important, what the different types of attribution challenges are, how to do it, and finally what the limitations of the whole attribution modeling approach are …

December 30, 2008

The annoying misuse of web analytics

Filed under: Business, Web Analytics — Tags: — Huayin Wang @ 10:04 pm

I really dislike people using Web Analytics as a synonym for the web analytics tools and software that work with web traffic data.  Software is not a discipline; wake up, people!

The confusing usage comes from the fact, based on my wild guess, that there are some people for whom there is no analytics other than Web Analytics.  Well, get educated!

For those who have trouble seeing why I feel this way, please read the following passages:

“Analytics solutions typically fall short in understanding post-impression data …”

“the common limitation of analytics is that it lacks the data insights of offline behavior …”

The misuse of the term “Web Analytics” is corrupting communication: it may make sense in some people's little corner, and only within that corner.

Web Analytics does not refer to a set of software/tools (which includes Google Analytics); it refers to a subdiscipline of data analytics.  Please stop using the label “Best Web Analytics” when you really mean just a comparison of web analytics tools.

December 18, 2008

Google’s achilles’ heel – a follow up

Filed under: Datarology, Technology — Tags: , , , , — Huayin Wang @ 9:52 pm

As a follow-up to one of my earlier posts about Google’s Achilles’ heel, I’d like to add that Google’s latest SearchWiki seems to be an interesting response to what I mentioned earlier. I know, I know, it is not quite like that 🙂

I love to count the many different ways of ranking stuff in response to a search query.  The objects being ranked can be text, links, documents, images, videos, etc.  The ranking principle is essentially a rule of relevance and/or similarity. I count four main types:

1) by content similarity; the algorithm could be PageRank, HITS, etc.  For images and videos, this can prove very difficult because it involves not only hard-core technology such as pattern recognition for images, but also large stocks of prior knowledge about object categorization, etc.

2) by similarity of user behavior, using some kind of collective intelligence or collaborative-filtering type of algorithm.  User behavior can serve as implicit voting; with the algorithms’ help, the complexity of the ranking operation can be dramatically reduced.

3) by similarity of explicit user ratings: users’ search phrases and explicit ratings (ratings/reviews on Amazon, as well as Google’s latest SearchWiki, which interestingly only affects what the user sees next time, not what anyone else sees).  Some type of social/collective intelligence algorithm has to be applied to solve the complexity issue, as well as the sparse data problem that arises when crossing search queries with user ratings.

4) of course, there is always the money logic.

If you know of more ranking logics than those posted here, I’d like to hear about them.

August 9, 2008

Privacy and Behavioral Targeting

Filed under: Advertising, Datarology, Technology, Uncategorized — Tags: , , — Huayin Wang @ 3:49 pm

the two seem on a collision course lately – this is really unfortunate!
Boneheaded privacy advocates mistake the baby for the bath water, whereas companies that use BT fail to see the golden opportunity offered by better analytics technologies.
time for “infocrypt behavioral profile” ?

July 29, 2008

Google’s achilles’ heel

Filed under: Advertising, business strategy, Datarology, Technology, Uncategorized — Tags: — Huayin Wang @ 5:22 pm

just a thought 🙂
There have been three search engine ranking principles at work: 1) by content match with the search query, 2) by user feedback (or social search) data on the query or similar queries, and 3) by bidding price. The logic used by Google AdWords is a complex combination of all three (relevancy, CTR and bid price).

For example, Amazon and Netflix represent the pure form of 2).

All three principles have their own merits and, here’s why it is important, a single pure logic may often match users’ intent better than a complicated mix.

Google’s ranking logic for AdWords has evolved over time, keeping a careful balance so far. But how far can it go? Will a dynamic logic that mixes the three in a significantly different way be a disruptive technology one day?
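To picture what "mixing the three" could look like, here is a toy sketch. The multiplicative form, the weights, and the ad data are all assumptions made up for illustration; this is not Google's actual formula.

```python
def mixed_score(relevance, ctr, bid, weights=(1.0, 1.0, 1.0)):
    """Hypothetical mix: relevance**a * ctr**b * bid**c.

    A weight of 0 switches a principle off, so (1, 0, 0) is pure
    content match, (0, 1, 0) pure user feedback, (0, 0, 1) pure bidding.
    """
    a, b, c = weights
    return (relevance ** a) * (ctr ** b) * (bid ** c)


def rank(ads, weights=(1.0, 1.0, 1.0)):
    """Return ad names ordered from highest to lowest mixed score."""
    ordered = sorted(
        ads,
        key=lambda ad: mixed_score(ad["relevance"], ad["ctr"], ad["bid"], weights),
        reverse=True,
    )
    return [ad["name"] for ad in ordered]


# Made-up ads: relevance in [0, 1], CTR as a fraction, bid in dollars.
ads = [
    {"name": "A", "relevance": 0.9, "ctr": 0.02, "bid": 1.00},
    {"name": "B", "relevance": 0.5, "ctr": 0.05, "bid": 2.00},
    {"name": "C", "relevance": 0.7, "ctr": 0.10, "bid": 0.50},
]

print(rank(ads))                     # equal mix of all three
print(rank(ads, (1.0, 0.0, 0.0)))    # pure content match
print(rank(ads, (0.0, 1.0, 0.0)))    # pure user feedback
print(rank(ads, (0.0, 0.0, 1.0)))    # pure bidding
```

Each weighting produces a different ordering of the same three ads, which is the point of the question: shifting the mix changes who wins.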

Your thoughts?

July 28, 2008

The sad reality of today’s business

Filed under: Advertising, business strategy, Datarology, Uncategorized — Tags: , — Huayin Wang @ 4:16 pm

The sad reality of today’s business has something to do with analytics, and technology in general for that matter, in a somewhat twisted way.

July 27, 2008

When data floods, analytics is Noah’s Ark.

Filed under: Business, Datarology, Uncategorized — Tags: , — Huayin Wang @ 3:48 am

get on it fast!

September 25, 2007

the skill levels in data analytics

Filed under: business strategy, Datarology — Tags: , , — Huayin Wang @ 10:43 pm

Data analytics is a collection of disparate techniques and applications covering practically every field and every industry. What holds it together as a coherent discipline is the skill set of the data analyst: the intrinsic structure, levels and connecting logic of the skill components ultimately define and delineate what data analytics currently is and will be in the future.

Data analytics is not a mature discipline. Naturally, there is widely shared confusion about what data analytics is, and particularly about what the skills and skill levels are. The lack of a common understanding has negative consequences for talent search, training and education, project management, etc.

One such misunderstanding originates from mixing up data analytics skills with subject domain knowledge. Subject domain knowledge is what is accumulated through experience, memorized information and practice related to the subject matter. The information and insights are stored in the brain and can be readily queried without relying on external data, although they may originally have come from working with data. In contrast, data analytics skills are the skills of extracting intelligent information from data, fresh on the spot. Without the explicit provision of data, data analytics results in no knowledge! Taking away the data is like taking away the fuel from data analysts. If this is a Data-Driven process, the former can be called a Grey-Cell-Driven process. Data is useful food for data analytics. Without data analytic skills, data is useless, just as gasoline is useless for a bicycle.

There are varying skill levels in data analytics. For a start, there are roughly 4 levels of data analytic skills:

  1. basic
  2. reporting
  3. professional
  4. expert

At level 1, basic analytic skills are mostly obtained from education and experience. They consist of comparing numbers (big/small, high/low, bigger/smaller), calculating percentages/fractions/ratios/indexes, reading pie charts and bar charts, and understanding two-way tables without relying on others to translate them into words. Use of Excel is optional, but in general most people at this level are able to put data into a spreadsheet and do some arithmetic calculations. No programming skill is required.

At level 2, reporting analytic skills are generally acquired through work experience. This level primarily includes data analysis skills using Excel, or analytic tools that can dump data into Excel. It includes the use of formulas, numeric and text functions, and Excel macros, plus a selection of more advanced skills such as pivot tables, VLOOKUP, regression, VBA, Solver, etc. The analytical processes of breaking down and aggregating up, trending and graphing also belong to this level. People at this level understand the concept of a data table or dataset, with records as rows and fields as columns, record subsetting and filtering, and some ways to measure the strength of the relationship between fields …

At level 3, the hallmark of professional data analytics skills is the ability not only to extract information but also to evaluate the reliability of the extracted information. In other words, it consists of the skills for extracting intelligent information, rather than just information. It also includes a much expanded set of knowledge extraction skills. At its core: sampling theory and experimental design, regressions and decision tree models, the model development process and common validation principles, basic types of statistical distributions, significance level and p-value, distribution models for the 3 basic types of fields (numeric, ordered, categorical), and proper estimation of the relationship between fields of different types. Modeling and algorithm knowledge and the use of software/tools and programming languages are intrinsic to professional data analysts.

At level 4, expert data analytics skills are generally hard to define. Like tree branches, the higher they are the more they split, both in direction and in level. The one thing that I have noticed is experts' sensitivity to and awareness of all the explicit and implicit assumptions behind the algorithms used and the general conclusions drawn. Of course, there are many narrower data analytic fields and niches; one could be an expert in one and not in others.

It is also worth mentioning that there are a few skills that are related to, but not part of, data analytics; among others, these include making analogies, generating pretty charts or animated graphs, and last but not least, the skills of selling and promoting data analytics.

Decision Theory and Data Analytics

Filed under: Datarology — Tags: , , — Huayin Wang @ 7:56 pm

Data analytics is the core technology used by businesses today. This is mainly due to the increasing availability of data and the importance of the data-driven decision-making process in business.

The basic elements of a decision making process are:

  • the set of choices or options
  • the set of outcomes, corresponding to the above options
  • a valuation of outcomes

Decision-making is about making a choice (or selecting an option) that makes sense in light of the valuation of outcomes. Without going too crazy with extra assumptions, such as the rationality of the decision agent, the above components allow us to analyze the decision-making process. The simplest decision-making case is when there is only one option (in other words, no choice).

In general, there are four types of decision making:

  1. decision making under certainty (the outcome for each choice is known)
  2. decision making under risk (each choice has more than one possible outcome, and the probabilities of those outcomes are known)
  3. decision making under uncertainty/ignorance (the possible set of outcomes is known, but not the probabilities)
  4. decision making in interactive context (game theory, gaming context)

With everything above prepared, known, and fully specified, rational decision making reduces to an optimization process, with the exception of case 4. This is not to say it is simple; in fact, many optimization processes in the real world can be exceedingly difficult.
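As a minimal sketch of how the first three cases reduce to optimization once everything is specified, consider the toy functions below. The option names and numbers are made up, and the decision rules are assumptions chosen for illustration: expected value for the risk case, and maximin (pick the best worst case) as one of several possible rules for the uncertainty case.

```python
def choose_under_certainty(values):
    """values: {option: value of its single known outcome}."""
    return max(values, key=values.get)


def choose_under_risk(lotteries):
    """lotteries: {option: [(probability, value), ...]}.

    Picks the option with the highest expected value.
    """
    def expected(option):
        return sum(p * v for p, v in lotteries[option])
    return max(lotteries, key=expected)


def choose_under_uncertainty(outcomes):
    """outcomes: {option: [value, ...]} with unknown probabilities.

    Uses the maximin rule: pick the option whose worst outcome is best.
    """
    return max(outcomes, key=lambda option: min(outcomes[option]))


# Made-up example: a safe option versus a risky one.
print(choose_under_certainty({"a": 10, "b": 30}))
print(choose_under_risk({"safe": [(1.0, 50)], "risky": [(0.5, 0), (0.5, 120)]}))
print(choose_under_uncertainty({"safe": [50], "risky": [0, 120]}))
```

The same pair of options gets a different answer depending on whether the probabilities are known, which shows how much the "type" of decision making matters before any optimization starts.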

Optimization technique is at the core of decision making; it is also the centerpiece technology of data analytics.

In my professional life as an analytics consultant, I have found this basic conceptual framework very valuable. Whenever a new business problem arises, I often start by looking for the core decision-making problem. The subsequent steps are, in turn: figuring out the set of all possible choices, the constraints (which, combined with the choices, give a feasible choice set), and the outcome measures or project objectives (from which valuations are derived).

What is so valuable about the framework is not that it ultimately gives a formal setup of the problem; more often than not, there are no clear answers to any of the above questions. Instead, it is the process of trying to clear things up that often helps uncover blind spots and missed opportunities that might otherwise be overlooked.

Much of modern Decision Theory is, above and beyond its conceptual frame, quite irrelevant to actual decision making in the real world. The things that get skimmed over, abstracted out, and cut off before it becomes a well-specified optimization problem are often the real issues for (good) decision making; and it is in dealing with these things that data analytics plays a big role. Data analytics helps better decision making by:

  • reducing risk and uncertainty associated with options using predictive modeling,
  • expanding the set of feasible options, and
  • making optimal choice possible through the use of efficient algorithms

September 24, 2007

three ways of seeing Data Analytics

Filed under: Datarology — Tags: , , — Huayin Wang @ 4:06 pm

There are three different ways of looking at and speaking about Data Analytics: the application-centric, the algorithm-centric, and the data-centric.

Data Analytics is essentially applying a process or algorithm to a set of data to draw intelligent information for a practical purpose. The three ways of looking at Data Analytics are natural reflections of its three key components: data, algorithm and application. Although the essential subject matter can be the same, the day-to-day manifestations of Data Analytics can seem bewilderingly different; nowhere is this felt so acutely as when one is working in data analytic consulting.

The client talk (or “needs” talk, as we sometimes call it) refers to the “churn” model, the mailing “response” model, etc. in a way that is naturally application-centric. The statistician talk, or the “techie” talk, is all about the algorithm: logistic regression, robust regression, support vector machine, etc. It is absolutely amazing sometimes just to listen and observe how the two camps communicate, debate, and argue, and how all this often amounts to very little of substance. I wonder how much emotion and saliva could be saved by knowing this difference.

Such differences are not limited to the words they use, but reflect contextual and directional differences. From an app-centric perspective, a mailing response model is a mailing response model; it does not matter what algorithm is used: logistic regression, neural network, decision tree, SVM, etc. To a modeler, a logistic regression is different from a neural network, even though both might be used in many different applications: churn/attrition model, win-back model, fraud prediction model, etc. When instructed to figure out the best modeling strategy, the decision processes of the different camps (the business/marketing people on one side, the statisticians and data miners on the other) work quite differently. Neither side is rightly equipped to think about this with the breadth and depth needed.

The data-centric perspective provides yet another angle. Common to the app-centric and alg-centric perspectives is the idea of a purpose, a thing to predict or find out. In contrast, the focus of the data-centric perspective is solely on data. Given a piece or a set of data, what are ALL the things that can analytically be done to it? This is a pure data analytic perspective, where the data elements are in their most abstract forms; at the same time, it is a wide-open perspective, conducive to and capable of providing high-level generative and creative strategies.

If you have a data table with all numeric fields, what can you do with it? What are all the analytic measures of the “relationship” between two numeric fields? Two character fields? A numeric and an ordered categorical field? What can you do with a numeric field and a LARGE categorical field with millions of unique values? Two LARGE categorical fields?
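As one small, partial answer to these questions, here is a sketch of two common relationship measures written in plain Python: the Pearson correlation for two numeric fields, and the correlation ratio (eta squared) for a categorical field against a numeric one. The choice of these two measures and the toy data are assumptions for illustration; there are many other measures one could list.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two numeric fields of equal length."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_ratio(categories, values):
    """Eta squared: share of the variance in `values` explained by
    which category each record belongs to (0 = none, 1 = all)."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    grand_mean = sum(values) / len(values)
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    total = sum((v - grand_mean) ** 2 for v in values)
    return between / total


# Toy data: a perfectly linear pair, and a category that fully
# separates the numeric values.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(correlation_ratio(["a", "a", "b", "b"], [1, 1, 3, 3]))
```

Both functions assume the numeric field is not constant (otherwise the denominator is zero). For a LARGE categorical field with millions of unique values, most groups would hold only a handful of records, so a measure like this becomes unreliable, which is part of what makes the question interesting.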

Data-centric is new and rarely used. It is also the most interesting and greatly needed at this stage of development.

Three attributes, three perspectives, a pair of eyes. Even with all these, a single great mind is sometimes still the most needed thing to solve challenging problems.
