March 21, 2007

I made up the word: datarology

My confession to all: I have a habit of making up new words. Some of them are interesting, while others feel like dried-up “luo bo” (radish). I do not know which group datarology belongs to.

I actually worked hard to come up with this word. Having been in the business of data mining/statistical modeling/pattern recognition/AI for over a decade, I am so tired of the awkward situation I find myself in whenever people ask me what I do.

What’s my profession, really?

I also feel funny when I read job postings. The titles really run wild: statistician, data miner, data mining researcher, predictive modeler, research scientist, pattern recognition scientist, data analyst, etc. Deep in my heart, I know exactly what they are looking for, but it is just hard to put into words. If this is not a situation that needs a new word, what is?

What would you call someone who understands how to get crucial, intelligent, reliable knowledge out of data; who knows all the tricks and traps of working with often messy and unfriendly data in its many forms; someone who understands how the size, dimension, and bias of the data can affect what you can do with it and draw out of it?

People from statistics call them statisticians, but that covers only part of their skills. Computer science labels them data miners, but to me “mining” is hardly the most appropriate analogy for the things they do. Their work can be as sophisticated as the most challenging science, yet they need the practical sense of a salesman. What would you call them, and what would you call this discipline?

It is such a growing field, much like computer programming 15 or 20 years ago: many companies are beginning to find that they need someone in a profession they never dreamed they would need. Now everyone has data, a lot of data, in fact too much data. Once the need for a database to host it is satisfied, what to do with the data becomes the question.

Ten or even five years ago, who would have thought that having too much data could be a problem? How much that has changed! Yet many still believe that one just needs to look and think to get the most out of one's data, and they get bitten once, twice, over and over, until they realize that they need someone who has this special knowledge and skill.

What about numerology? Numerology is about numbers, while datarology is about data. There is one contrast though: numerology is about reading much out of little, at least on the surface. Datarology, on the other hand, is about reading a little out of a lot of data. That is about as much sense as I can make of this word.

How could this possibly help anyone with anything? Well, it could if one day you see a job ad that says: Level-3 datarologist wanted!

September 29, 2006

random thought 1

Representation is everything in Datarology.

September 11, 2006

random thought 2

Rakesh’s Data Mining Definition

An Expansive Definition of Data Mining (Rakesh Agrawal, KDD06):
Deriving value from a data collection by studying and understanding the structure of the constituent data.

This is the closest in meaning to what I have in mind for “Datarology”. I particularly like the part of the definition where he used “understanding …” instead of “analyzing …”, because I believe synthesizing is as important an activity in Datarology as analyzing.

May 11, 2006

The Coming of Age of Datarologist

The name Datarologist was born quietly, with no fanfare, announcement or celebration.
Before we had the datarologist, the profession went by many names: statistician, data analyst, data miner, modeler, etc., and the trade itself was named similarly: statistics, data mining, pattern recognition, artificial intelligence, informatics …

The part they all share isn’t incidental. On the contrary, the flourishing of all these different fields is pretty much driven by this common, shared element: the skills, the knowledge, the art of the trade that make it possible to read relevant insights from data.

子曰: “名不正则言不顺;言不顺则事不成”
(“If names be not correct, language is not in accordance with the truth of things. If language be not in accordance with the truth of things, affairs cannot be carried on to success.” — Analects by Confucius, Chapter 13 )

I smile when reading the job requirements on the KDD website. The titles are so varied while the skills are so similar :) I often feel the pain of the writers. They could have just written: Open position for a level 5 datarologist!

