Small and Exact v/s Big and Messy Data Analysis
Author: Dhruv Gulati • November 15, 2015 • Essay • 463 Words (2 Pages)
Messiness of Data: Data has traditionally been seen as a structured and organized resource. For years, marketers have spent much of their time organizing data into clear-cut segments. Sampling was seen as an effective tool for gauging the exact sentiment of the target population. Generations of business professionals spent much of their careers making their data collection more precise. However, a new science of decision making makes it commonplace to collect all available data. It cares less about exactitude and more about the quantity of data.
Small and Exact v/s Big and Messy: When Microsoft wanted to build a grammar-check tool for its flagship product, MS Word, it fed the algorithm a large amount of unstructured data: millions of words and thousands of sentences. The results were astounding: as the amount of data fed into the system increased, the effectiveness of the algorithm kept increasing too. Microsoft had initially considered using a library for the same purpose. While the level of exactitude guaranteed by a library of inputs would have been high, scaling up would have been an issue. Even in product design, clean taxonomies are being replaced by messier structures, e.g. the use of tags in social media (no clearly defined organization, but rather a flexible segmentation of the inputs received via tags).
“I am, therefore I can” – Causation “I may, therefore I will” – Correlation
Causation v/s Correlation: A correlation is a statistical association between two items or events, while causation is the dependence of one item on the other. Target, a retail giant, developed a way to predict when a woman was pregnant based on her shopping behaviour. While a traditional marketing approach would rely on the “why”, a data-centric approach concentrates only on the “what”. Target, too, took a “what”-centred approach when it used a bundle of products as a proxy to determine probabilistically whether a woman was pregnant. It could gather insights about its consumer market faster and design a personalized marketing mix for each consumer. Coupons were sent to specific segments based on the “pregnancy score” developed by its algorithm. With the help of Big Data, Target could pinpoint the sweet spot in a consumer's life stage.
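To make the idea of a proxy-based score concrete, here is a minimal Python sketch of how a bundle of purchases might be turned into a probability-like number and then into a coupon decision. It is not Target's actual model, which the essay does not describe. The product names, weights, baseline and threshold are all invented for illustration, and the logistic function is simply one common way to squash a weighted sum into the range 0 to 1.

```python
import math

# Hypothetical weights: how strongly each product is ASSUMED to correlate
# with the life stage of interest. All values are invented for illustration.
WEIGHTS = {
    "unscented lotion": 1.2,
    "vitamin supplements": 0.9,
    "cotton balls": 0.6,
    "large tote bag": 0.4,
}
BIAS = -2.0  # assumed baseline: most shoppers are not in the segment


def proxy_score(basket):
    """Return a probability-like score in (0, 1) for a shopping basket.

    Each distinct product counts once; products without a weight add nothing.
    """
    total = BIAS + sum(WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-total))  # logistic squashing


def should_send_coupon(basket, threshold=0.5):
    """Decide on the 'what' (the score) without modelling the 'why'."""
    return proxy_score(basket) >= threshold


if __name__ == "__main__":
    basket = ["unscented lotion", "vitamin supplements", "bread"]
    print(round(proxy_score(basket), 3), should_send_coupon(basket))
```

Note that the score only records that certain purchases tend to occur together with a life stage. Nothing in it claims that buying any one product causes, or is caused by, that life stage, which is exactly the correlation-over-causation trade-off described above.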
...