Analytics Strategist

January 2, 2010

a decade in data analytics …

Filed under: misc, Web Analytics — Huayin Wang @ 10:53 pm

I was reading an article, “The Decade of Data: Seven Trends to Watch in 2010”, this morning and found it a fitting retrospective and forward-looking piece.  I have been working in data analytics for the past 15 years, so naturally I went searching for similar articles with more of a focus on analytics, but came back empty-handed 😦

I wish I could write a similar post, but the task feels too big to take on.  A systematic review with a vision of the future would require much more dedication and effort than I can afford at this point.  However, I do have a couple of thoughts and went ahead and gathered some evidence to share.  I’d love to hear your thoughts; please comment and provide your perspectives.

The above chart shows search volume indices for several data analytics related keywords over the last six years.  There are many interesting patterns.  The one that caught my eye first is the birth of Google Analytics on Nov 14, 2005.  Not only did it cause a huge spike in the search trend for “analytics”, marking the first day that “analytics” surpassed “regression”, it also became the driving force behind the growth of web analytics and the analytics discipline in general.  Today, more than half of all “analytics” searches are associated with “Google Analytics”.  Anyone who writes the history of data analytics will have to study the impact of GA seriously.

I wish I could chart the impact of SAS and SPSS on data analytics in a similar fashion, but unfortunately it is hard to separate searches for the SAS statistics software from other “SAS” searches, so I used SPSS instead.  When limited to the “software” category, SAS seems to have about twice the search volume of SPSS.

Many years ago, before Google Analytics and the “web analyst” generation, statistical analysis and modeling dominated the business applications of data analytics.  Statisticians and their predictive modeling practice sat in their ivory tower.  Since the early years of the 21st century, data mining and machine learning have become strong competing disciplines to statistics.  I remember the many heated debates between statisticians and computer scientists about statistical modeling vs data mining.  New jargon emerged, such as decision tree, neural network, association rule and sequence mining.  The spoils went to whoever had the newest, smartest, most mathematically sophisticated, efficient and powerful algorithm.

Google Analytics changed everything.  Along with data democratization came the democratization of data intelligence. Who would’ve guessed that today, for a large crowd of (web) analysts, analytics would become near-synonymous with Google Analytics, and that building dashboards and tracking and reporting the right metrics would become the holy grail of analytics?  Those statisticians may still inhabit the ivory tower of data analytics, but the world is already owned by others, the people, as democracy would dictate.

No question about it, data analytics is trending up and flourishing as never before.

Comments?  Please share your thoughts here.

February 24, 2009

attribution problem is a data analytics problem

Is there anyone out there as frustrated as I am with the many different terms and concepts around “attribution”?  For those who haven’t thought about this yet, here’s a sample of the terms related to the discussion:

attribution management, attribution protocol, attribution problem, multiple touch point attribution, online marketing attribution, multiple attribution protocol, attribution modeling, marketing mix modeling, last-touch attribution, equal-attribution, impression attribution, attribution theory, online-offline attribution, attribution rules …

In this and a few follow-up posts I will discuss a few topics that I hope will bring some clarity to this.

Let me be upfront with my main point: the attribution problem is a data analytics problem. I know that few people would argue with me on this, but I think few have taken it, and all its implications, seriously.

Since it is fundamentally a data analytics problem, we should start with data.  What is the underlying business question that requires an attribution solution? What kinds of data do we have, or need to have, to answer attribution questions?  How do we translate the business questions into data analytics questions that match the type of data we have?  What questions are not answerable given the limitations of the data or of the available analytics tools?  How rigorous is the proposed data analytics strategy: is it a heuristic, a rule of thumb, or a well-specified model? Are we overusing or underusing our data?  Are we over-designing the analytics and making it more complex than necessary? What are all the limitations and disclaimers associated with an approach?
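To make the “heuristic or rule of thumb” end of that spectrum concrete, here is a minimal Python sketch of two of the rule-based approaches named in the list of terms, last-touch and equal attribution, applied to a few made-up conversion paths. The channel names, the paths and the function names are all assumptions for illustration only. It is not a recommended model, just a way to see that the same data yields different credit under different rules.

```python
from collections import defaultdict

# Hypothetical data: each list is the ordered sequence of marketing
# touch points that preceded one conversion.
paths = [
    ["display", "search"],
    ["email", "display", "search"],
    ["email"],
]

def last_touch(conversion_paths):
    """Give each conversion's full credit to the final touch point."""
    credit = defaultdict(float)
    for path in conversion_paths:
        if path:  # skip conversions with no recorded touch points
            credit[path[-1]] += 1.0
    return dict(credit)

def equal_attribution(conversion_paths):
    """Split each conversion's credit evenly across all its touch points."""
    credit = defaultdict(float)
    for path in conversion_paths:
        if not path:
            continue
        share = 1.0 / len(path)
        for channel in path:
            credit[channel] += share
    return dict(credit)

if __name__ == "__main__":
    print("last touch:", last_touch(paths))
    print("equal:     ", equal_attribution(paths))
```

Both rules hand out the same total credit, one unit per conversion, but they distribute it very differently across channels. Neither says anything about what actually caused the conversion, which is why the choice of rule is itself a data analytics question.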

My sense is that we have not yet taken a serious look at the attribution problem from a data analytics perspective. We know the business problems, but most of us are not experts in data analytics methodology.

Any comments?

Please come back to read my next post on micro and macro attribution.

March 28, 2007

BS in data analytics

Filed under: Datarology — Huayin Wang @ 5:57 pm

I am utterly disgusted by the bull sh**ing in the business intelligence and data analytics world. This is not to say that I think BS appears less in other areas, just that my biology responds more to what happens in my world.

It is hard to compile a collection of all the BS I have seen, for I choose not to remember it.

One I saw this morning:

“XXX helps YYY achieve 100m in annual online sale.”
— how much help is there?

March 26, 2007

Trading Analytics

Filed under: Datarology, Technology — Huayin Wang @ 7:41 pm

What is the right analytics for trading?

I believe this is the wrong question.
(Of course, it makes sense in context, like everything else. But for those truly seeking an answer, it is a prerequisite that the question not be limited by context, implicit or explicit.)

What is the most important contextual factor that has been ignored? The data available.

This applies to all data analytics. In fact, one of the blind spots of the field is the linkage between the data (and the form it takes) and the analytic techniques. What has been emphasized so far is the linkage between problem and analytics.

[Data (and its form)], [problem], and [analytical technique] are the three pillars of the field.

It is not just the “problem statement” that defines the context of data analytics; it is the combination of the “problem statement” with the availability and form of the data that co-defines the analytical context.

So we have to begin with the data component before we start talking about trading analytics. Most people start off on the wrong foot when they simply assume that they can only use the same data that everyone else is using. Finding the critical data to use is hugely important in trading analytics today, more important than which analytical techniques to choose.

March 21, 2007

I made up the word: datarology

Filed under: Datarology — Huayin Wang @ 5:53 pm

My confession to all: I have a habit of making up new words. Some of them are interesting, while others feel like dried-up “luo bo” (radish). I do not know which kind datarology belongs to.

I actually worked hard to come up with this word. Having been in the business of data mining/statistical modeling/pattern recognition/AI for over a decade, I am so tired of the awkward situation I find myself in whenever people ask me what I do.

What’s my profession, really?

I also find it funny when I read job specifications. The titles really run wild: statistician, data miner, data mining researcher, predictive modeler, research scientist, pattern recognition scientist, data analyst, etc. Deep in my heart, I know exactly what they are looking for, but it is just hard to put into words. If this is not a situation needing a new word, what is?

What would you call someone who understands how to get crucial, intelligent, reliable knowledge out of data; who knows all the tricks and traps of working with often messy and unfriendly data in its many forms; and who understands how the size, dimension and bias of the data can affect what you can do with it and draw out of it?

People from statistics call them statisticians, but that covers only part of their skills. Computer scientists label them data miners, but to me “mining” is hardly the most appropriate analogy for the things they do. Their work can be as sophisticated as the most challenging science, yet they need the practical sense of a salesman. What would you call them, and what would you call this discipline?

It is a fast-growing field, much like computer programming 15 or 20 years ago: many companies are beginning to find that they need someone in a profession they never dreamed of needing. Now everyone has data, a lot of data, in fact too much data. Once the need for a database to host it is satisfied, what to do with the data becomes the question.

Ten or even five years ago, who would have thought that having too much data could be a problem? How much that has changed! Yet many still believe that one just needs to look and think to get the most out of their data, waiting to be bitten once, twice, over and over, until they realize that they need someone who has this special knowledge and skill.

What about numerology? Numerology is about numbers, while datarology is about data. There is one contrast, though: numerology is about reading much out of little, at least on the surface. Datarology, on the other hand, is about reading little out of much data. That is about as much sense as I can make of this word.

How could this possibly help anyone with anything? Well, it could if one day you see a job ad that says: A level-3 datarologist wanted!

May 11, 2006

The Coming of Age of Datarologist

Filed under: Datarology — Huayin Wang @ 3:22 pm

The name datarologist was born quietly, with no fanfare, announcement or celebration.
Before we had “datarologist”, the profession went by many names: statistician, data analyst, data miner, modeler, etc., and the art of the trade was named similarly: statistics, data mining, pattern recognition, artificial intelligence, informatics …

What they all share isn’t incidental. On the contrary, the flourishing of all these different fields is pretty much driven by this common, shared element: the skills, the knowledge, the art of the trade that is able to read relevant insights from data.

子曰: “名不正则言不顺;言不顺则事不成”
(“If names be not correct, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success.” — Analects by Confucius, Chapter 13 )

I smile when reading the job requirements on the KDD website. The titles are so varied while the skills are so similar : ) I often feel the pain of the writers. They could have just written: Open position for a level 5 datarologist!
