GIGO (garbage in/garbage out) is one of the oldest and pithiest acronyms in IT. Reportedly coined by George Fuechsel, an IBM 305 RAMAC technician/instructor in New York, the term became widely used in the early 1960s and was emblematic of the fledgling IT industry. But the concept of GIGO is just as relevant today as it was half a century ago, perhaps even more so.
Why is that the case? Because in the burgeoning world of big data, the value of analyses intimately depends on the quality of the material analyzed. Consider it this way: Access to a large, even limitless amount of building materials can enable you to build a bigger house. But if the timber is rotten or the foundation stones are flawed, the entire structure is undermined.
That being the case, it seems odd that in the fervent enthusiasm around big data the subject of data quality governance doesn’t come up more often. It’s not rocket science by any means—understanding and defining data standards, and then consistently managing and cleansing information to meet those standards are common practices in traditional data warehouse and business intelligence efforts.
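The "defining standards, then cleansing against them" practice can be illustrated with a minimal sketch. The field names, rules, and records below are hypothetical examples, not part of any particular governance tool:

```python
# A minimal sketch of rule-based data quality checking: define standards
# for each field, then separate records that meet them from those that don't.
# The fields ("email", "age"), rules, and sample records are hypothetical.

STANDARDS = {
    "email": lambda v: isinstance(v, str) and "@" in v,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def cleanse(records):
    """Split records into those meeting every standard and those rejected."""
    clean, rejected = [], []
    for rec in records:
        if all(check(rec.get(field)) for field, check in STANDARDS.items()):
            clean.append(rec)
        else:
            rejected.append(rec)
    return clean, rejected

records = [
    {"email": "ada@example.com", "age": 36},   # meets both standards
    {"email": "not-an-email", "age": 36},      # fails the email rule
    {"email": "bob@example.com", "age": -5},   # fails the age rule
]
clean, rejected = cleanse(records)
```

In a real pipeline the rejected records would typically be routed to a remediation queue rather than silently dropped, so stewards can see what failed and why.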
It may simply be that big data and the maintenance/management requirements of unstructured and semi-structured information assets are still evolving. But it also seems clear that without effective stewardship and governance, big data results and end users’ trust will suffer. Modern technologies may allow and encourage the analysis of increasingly complex, ever larger volumes of information. However, if data quality management is ignored, the resulting insights risk ending up on the garbage heap.