Analytics Strategist

September 24, 2007

three ways of seeing Data Analytics

Filed under: Datarology | Huayin Wang @ 4:06 pm

There are three different ways of looking at and speaking about Data Analytics: the application-centric, the algorithm-centric, and the data-centric.

Data Analytics is essentially the application of a process or algorithm to a set of data in order to extract useful information for a practical purpose. The three ways of looking at Data Analytics are natural reflections of its three key components: data, algorithm and application. Although the essential subject matter can be the same, the day-to-day manifestations of Data Analytics can seem bewilderingly different; nowhere is this felt so acutely as when one is working in data analytic consulting.

The client talk (or “needs” talk, as we sometimes call it) refers to the “churn” model, the mailing “response” model, etc. in a way that is naturally application-centric. The statistician talk, or the “techie” talk, is all about the algorithm: logistic regression, robust regression, support vector machine, etc. It is absolutely amazing sometimes just to listen and observe how the two camps communicate, debate, and argue, and how all this often amounts to very little of substance. I wonder how much emotion and saliva could be saved by knowing this difference.

Such differences are not limited to the words each camp uses; they reflect differences in context and direction. From an app-centric perspective, a mailing response model is a mailing response model; it does not matter which algorithm is used: logistic regression, neural network, decision tree, SVM, etc. To a modeler, a logistic regression is different from a neural network, even though both might be used in many different applications: churn/attrition model, win-back model, fraud prediction model, etc. When instructed to figure out the best modeling strategy, the decision processes of the two camps (the business/marketing people on one side, the statisticians and data miners on the other) work quite differently. Neither side is properly equipped to think about this with the breadth and depth needed.
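To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the class and function names (`LogisticScorer`, `StumpScorer`, `mailing_response_model`), the single input feature, and the 0.5 cutoff are all hypothetical. The two classes carry algorithm-centric names; the function carries an application-centric name and does not care which algorithm it is handed.

```python
from math import exp


class LogisticScorer:
    """Algorithm-centric name: one-feature logistic regression, fit by gradient descent."""

    def fit(self, xs, ys, lr=0.1, steps=2000):
        self.w, self.b = 0.0, 0.0
        n = len(xs)
        for _ in range(steps):
            grad_w = grad_b = 0.0
            for x, y in zip(xs, ys):
                err = 1 / (1 + exp(-(self.w * x + self.b))) - y
                grad_w += err * x
                grad_b += err
            self.w -= lr * grad_w / n
            self.b -= lr * grad_b / n
        return self

    def score(self, x):
        return 1 / (1 + exp(-(self.w * x + self.b)))


class StumpScorer:
    """Algorithm-centric name: a decision tree with a single split (a stump)."""

    def fit(self, xs, ys):
        best = None
        for t in sorted(set(xs)):
            left = [y for x, y in zip(xs, ys) if x <= t]
            right = [y for x, y in zip(xs, ys) if x > t]
            if not right:
                continue  # a split must leave records on both sides
            left_rate = sum(left) / len(left)
            right_rate = sum(right) / len(right)
            # Squared error when each side predicts its own response rate.
            sse = (sum((y - left_rate) ** 2 for y in left)
                   + sum((y - right_rate) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, t, left_rate, right_rate)
        _, self.t, self.left_rate, self.right_rate = best
        return self

    def score(self, x):
        return self.left_rate if x <= self.t else self.right_rate


def mailing_response_model(scorer, xs, ys):
    """Application-centric name: 'should we mail this customer?'"""
    fitted = scorer.fit(xs, ys)
    return lambda x: fitted.score(x) >= 0.5  # hypothetical cutoff


# Toy data (made up): number of past purchases and whether the customer responded.
past_purchases = [0, 1, 2, 3, 4, 5, 6, 7]
responded = [0, 0, 0, 0, 1, 1, 1, 1]

for scorer in (LogisticScorer(), StumpScorer()):
    should_mail = mailing_response_model(scorer, past_purchases, responded)
    print(type(scorer).__name__, should_mail(0), should_mail(7))
```

From the client's side only `mailing_response_model` is visible; from the modeler's side the two scorer classes are entirely different objects that could just as well sit behind a churn or fraud model.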

The data-centric perspective provides yet another angle. Common to the app-centric and alg-centric perspectives is the idea of a purpose, a thing to predict or find out. In contrast, the focus of the data-centric perspective is solely on data. Given a piece or a set of data, what are ALL the things that can analytically be done to it? This is a pure data analytic perspective where the data elements are in their most abstract forms. At the same time, it is a wide-open perspective, conducive to and capable of providing high-level generative and creative strategies.

If you have a data table with all numeric fields, what can you do with it? What are all the analytic measures of the “relationship” between two numeric fields? Between two character fields? Between a numeric and an ordered categorical field? What can you do with a numeric field and a LARGE categorical field with millions of unique values? With two LARGE categorical fields?
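One way to read these questions is as a lookup from field types to candidate measures, with no application in mind. The sketch below is a minimal Python illustration, not a catalogue: the function names are hypothetical, and picking exactly one measure per type pair (Pearson correlation, the correlation ratio, Cramér's V) is an assumption made for brevity. Many other measures would fit each slot.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Numeric vs numeric: Pearson correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_ratio(cats, nums):
    """Categorical vs numeric: eta squared, the share of variance between groups."""
    groups = defaultdict(list)
    for c, v in zip(cats, nums):
        groups[c].append(v)
    grand = mean(nums)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    total = sum((v - grand) ** 2 for v in nums)
    return between / total


def cramers_v(a, b):
    """Categorical vs categorical: Cramér's V (no bias correction)."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    chi2 = 0.0
    for i in count_a:
        for j in count_b:
            expected = count_a[i] * count_b[j] / n
            chi2 += (joint.get((i, j), 0) - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    return sqrt(chi2 / (n * k))


def is_numeric(col):
    return all(isinstance(v, (int, float)) for v in col)


def relationship(x, y):
    """Choose a measure from the field types alone."""
    nx, ny = is_numeric(x), is_numeric(y)
    if nx and ny:
        return "pearson", pearson(x, y)
    if nx != ny:
        cats, nums = (y, x) if nx else (x, y)
        return "correlation_ratio", correlation_ratio(cats, nums)
    return "cramers_v", cramers_v(x, y)


# Toy columns (made up) covering the three type pairs.
print(relationship([1, 2, 3, 4], [2, 4, 6, 8]))
print(relationship(["a", "a", "b", "b"], [1, 1, 3, 3]))
print(relationship(["x", "x", "y", "y"], ["p", "p", "q", "q"]))
```

The sketch ignores ordering in categorical fields and assumes every field takes at least two distinct values. The double loop in `cramers_v` visits every pair of categories, so it would not scale to the LARGE categorical fields mentioned above; that case calls for a different approach.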

The data-centric perspective is new and rarely used. It is also the most interesting of the three, and the one most needed at this stage of development.

Three attributes, three perspectives, a pair of eyes. Even with all these, a single great mind is sometimes still the most needed thing to solve challenging problems.
