Analytics Strategist

July 27, 2008

When data floods, analytics is Noah’s Ark.

Filed under: Business, Datarology, Uncategorized — Tags: , — Huayin Wang @ 3:48 am

get on it fast!

September 25, 2007

the skill levels in data analytics

Filed under: business strategy, Datarology — Tags: , , — Huayin Wang @ 10:43 pm

Data analytics is a collection of disparate techniques and applications covering practically every field and every industry. What holds it together as a coherent discipline is the skill set of the data analyst: the intrinsic structure, the levels, and the connecting logic of the skill components, which ultimately define and delineate what data analytics is today and what it will be in the future.

Data analytics is not a mature discipline. Naturally, there is widespread confusion about what data analytics is, and particularly about what the skills are and how their levels differ. This lack of common understanding has negative consequences for talent search, training and education, project management, and so on.

One such misunderstanding comes from mixing up data analytics skills with subject domain knowledge. Subject domain knowledge is accumulated through experience, memorized information, and practice related to the subject matter. The information and insights are stored in the brain and can be readily queried without relying on external data, although they may originally have come from working with data. By contrast, data analytics skills are the skills of extracting intelligent information from data, fresh and on the spot. Without the explicit provision of data, data analytics results in no knowledge! Taking away the data is like taking away the fuel from data analysts. If this is a Data-Driven process, the former can be called a Grey-Cell-Driven process. Data is useful food for data analytics. But without data analytic skills, data is useless, just as gasoline is useless for a bicycle.

There are varying skill levels in data analytics. For starters, there are roughly four levels of data analytic skills:

  1. basic
  2. reporting
  3. professional
  4. expert

At level 1, basic analytic skills are mostly obtained from education and experience. They consist of comparing numbers (big/small, high/low, bigger/smaller), calculating percentages, fractions, ratios, and indexes, reading pie charts and bar charts, and understanding two-way tables without relying on others to translate them into words. Use of Excel is optional, but in general most people at this level are able to put data into a spreadsheet and do some arithmetic calculations. No programming skill is required.
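To make the level 1 arithmetic concrete, here is a minimal sketch in Python. The level itself needs no programming; the code only spells out the calculations. The sales figures, the region names, and the helper names (`percentage`, `ratio`, `index`) are all made up for illustration.

```python
def percentage(part, whole):
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole


def ratio(a, b):
    """How many times larger `a` is than `b`."""
    return a / b


def index(value, base):
    """Index number: `value` relative to `base`, with the base set to 100."""
    return 100.0 * value / base


# Hypothetical two-way table: rows are regions, columns are years.
sales = {
    "North": {"2006": 120, "2007": 150},
    "South": {"2006": 80, "2007": 90},
}

total_2007 = sum(row["2007"] for row in sales.values())

north_share = percentage(sales["North"]["2007"], total_2007)          # 62.5 (% of 2007 total)
north_index = index(sales["North"]["2007"], sales["North"]["2006"])   # 125.0 (2006 = 100)
north_to_south = ratio(sales["North"]["2007"], sales["South"]["2007"])

print(north_share, north_index, round(north_to_south, 2))
```

Reading the table without help amounts to being able to produce statements like "North made up 62.5% of 2007 sales and grew to an index of 125 against 2006".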

At level 2, reporting analytic skills are generally acquired through work experience. This level primarily covers data analysis using Excel, or analytic tools that can dump data into Excel. It includes the use of formulas, numeric and text functions, and Excel macros, plus a selection of the more advanced features such as pivot tables, VLOOKUP, regression, VBA, and Solver. The analytical processes of breaking down and aggregating up, trending, and graphing also belong to this level. Analysts at this level understand the concept of a data table or dataset, with records as rows and fields as columns; record subsetting and filtering; and some ways to measure the strength of the relationship between fields …
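The level 2 operations can be sketched outside Excel as well. The following is an illustration only, written in plain Python rather than a spreadsheet: the `orders` records, the `managers` table, and the function names are hypothetical, and `pivot_sum` and `lookup` are rough analogues of a pivot table and an exact-match VLOOKUP, not reimplementations of Excel's features.

```python
from collections import defaultdict

# Hypothetical dataset: each record is a row, each key is a field (column).
orders = [
    {"region": "North", "product": "A", "amount": 100},
    {"region": "North", "product": "B", "amount": 50},
    {"region": "South", "product": "A", "amount": 70},
    {"region": "South", "product": "A", "amount": 30},
]


def filter_records(records, field, value):
    """Subset the records: keep rows where `field` equals `value`."""
    return [r for r in records if r[field] == value]


def pivot_sum(records, row_field, col_field, value_field):
    """Aggregate up: sum `value_field` for each (row, column) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        table[r[row_field]][r[col_field]] += r[value_field]
    return {row: dict(cols) for row, cols in table.items()}


def lookup(key, table, default=None):
    """Exact-match lookup of `key` in a reference table."""
    return table.get(key, default)


pivot = pivot_sum(orders, "region", "product", "amount")
south_orders = filter_records(orders, "region", "South")

# Hypothetical reference table for the lookup.
managers = {"North": "Ann", "South": "Bo"}

print(pivot)
print(len(south_orders), lookup("North", managers), lookup("West", managers, "n/a"))
```

Breaking down is the same operation in reverse: start from a total and split it by one more field.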

At level 3, the hallmark of professional data analytics skills is the ability not only to extract information but also to evaluate the reliability of the extracted information. In other words, it consists of the skills to extract intelligent information, rather than just information. It also includes a much expanded set of knowledge extraction skills. At its core are: sampling theory and experimental design; regression and decision tree models; the model development process and common validation principles; the basic types of statistical distributions; significance levels and p-values; distribution models for the three basic types of fields (numeric, ordered, categorical); and proper estimation of the relationship between fields of different types. Knowledge of modeling and algorithms, and the use of software tools and programming languages, are intrinsic to professional data analysts.
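One way to picture "evaluating the reliability of extracted information" is a p-value for an observed difference between two groups. The sketch below uses a permutation test, chosen here only because it needs nothing beyond the standard library; it is one of many possible techniques, and the measurements, the group names, and the function `permutation_p_value` are assumptions made for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    The group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements for two groups.
control = [10, 12, 11, 13, 12, 11, 10, 12]
treated = [15, 17, 16, 18, 16, 17, 15, 16]

p_different = permutation_p_value(control, treated)
p_same = permutation_p_value(control, control)

print(p_different < 0.05, p_same)
```

A level 2 report would stop at "the treated group averages higher". The level 3 question is whether a difference of that size could plausibly come from chance alone, and under which assumptions the answer holds.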

At level 4, expert data analytics skills are generally hard to define. Like tree branches, the higher they are, the more they split, both in direction and in level. The one thing I have noticed about experts is their sensitivity to, and awareness of, all the explicit and implicit assumptions behind the algorithms used and the general conclusions drawn. Of course, there are many narrower data analytic fields and niches; one can be an expert in one and not in others.

It is also worth mentioning that a few skills are related to, but not part of, data analytics. Among others, these include making analogies, generating pretty charts or animated graphs, and last but not least, the skills of selling and promoting data analytics.

April 11, 2007

How to classify your mind?

Filed under: Random Thoughts — Tags: , , — Huayin Wang @ 5:56 pm

The mind is so intangible that it is hard to describe a specific instance of it, let alone classify it. The body is different: there you can speak of gender, tall or short, fat or slim, pretty, and so on.

What about mind?

March 23, 2007

What is Data, really?

Filed under: Datarology — Tags: , , — Huayin Wang @ 8:40 pm

The common definitions found on the web all share a similar construct:

Factual information, especially information organized for analysis or used to reason or make decisions. (Webster is similar)

There are many versions that define data using the word “fact” or “factual information”. This is unacceptable, for data itself carries no assertion about its own quality; whether it is fact or not comes after the fact, as far as the definition of data is concerned.

Using “information” to define data is not proper either, for whether data is information depends on the users and how they understand the data: data is more “primitive” than “information”, not the other way around.

I like the following better, although I am not perfectly happy with it: Data is a structured form consisting of datum. I like it because it does not imply any relationship between its explicit form and the external world. It does not limit the structure to any particular kind: table, row, collection, independent observations and so on are all artificial frames, not general enough to be part of the definition. It also does not imply the presence of any external knowledge, preprocessing routines, or specialized observers.

Give me some examples, you ask. First of all, the simplest example of data is a datum, the atom as far as data is concerned. Because it is a datum, it should not have any sub-structure of its own, which is to say that it can have nothing but a name or a label. As to the form of a datum, it really does not matter, as long as it is regarded as the simplest data. A datum can have a name and a value.

Next, data can be a collection of observations (datum).

Next, a datum can have attributes which “describe” the observation. Examples are “continuous”, “discrete”, “ordered” etc. An attribute may have a name and a value as well.

What we called “data table” is just one common form of data. Other forms of data include: network data, transaction data, graph data, time series, text data.
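The definition above can be sketched as a small data structure. This is only one possible reading of it, not a canonical model: the `Datum` class, its field names, and the sample values are assumptions made for illustration. The point is that a table and a network are two different structures built from the same atoms.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Datum:
    """The atom of data: a name, an optional value, optional attributes."""
    name: str
    value: Any = None
    attributes: Dict[str, Any] = field(default_factory=dict)


# The simplest data: a single datum carrying nothing but a label.
lone = Datum("temperature")

# Data as a collection of observations, each described by an attribute.
readings: List[Datum] = [
    Datum("temperature", 21.5, {"type": "continuous"}),
    Datum("temperature", 19.0, {"type": "continuous"}),
]

# Table-like data: each row is a list of datums sharing the same field names.
table: List[List[Datum]] = [
    [Datum("city", "Oslo", {"type": "categorical"}), Datum("temperature", 3.0)],
    [Datum("city", "Cairo", {"type": "categorical"}), Datum("temperature", 28.0)],
]

# Network data: the structure is a set of links between datums, not rows.
alice = Datum("person", "Alice")
bob = Datum("person", "Bob")
carol = Datum("person", "Carol")
links: List[Tuple[Datum, Datum]] = [(alice, bob), (bob, carol)]


def field_values(rows, name):
    """Read one column out of table-like data."""
    return [d.value for row in rows for d in row if d.name == name]


print(field_values(table, "city"))
print([(a.value, b.value) for a, b in links])
```

Column-wise helpers such as `field_values` only make sense for the table form; the `links` structure has no columns to read, which hints at why table-oriented tools do not carry over directly.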

“All these are common sense”, you say. “What’s new?”

Well, all the common data analytics are analytics on “table-like” data. The analytics for other forms of data are far behind. This is a problem; it is also an opportunity.
