By: Wayne Kernochan, President, Infostructure Associates
Published: 4th April 2013
Copyright Infostructure Associates © 2013
I want to start this piece by giving the most important take-away for IT readers: they should take care that data governance does not get in the way of Big Data, rather than the other way around.
This may seem odd, given that I, among others, have been pointing out for some time that better data cleansing and the like are badly needed in enterprise data strategies in general. But data governance is not just a collection of techniques; it is a whole philosophy of how to run your data-related IT activities. Necessarily, the IT department that focuses on data governance emphasizes risk: security risk, the risk of bad data, the risk of letting parts of the business run amok in their independence and create a complicated tangle of undocumented data relationships. And that focus on risk can very easily conflict with Big Data's focus on reward: on proactively identifying new data sources and digging deeper into the relationships between the data sources one has, in order to gain competitive advantage.
While there is no clear evidence that an over-focus on data governance impedes Big Data strategies, and thereby the success of the organization, there is some suggestive data. Specifically, a recent Sloan Management Review study reported that the least successful organizations were those that focused their Big Data analytics on cutting costs and optimizing business processes, while the most successful focused on understanding their customers better and using that understanding to drive new offerings. Data governance, as a risk-focused philosophy, is also a cost-focused and internally-focused strategy. The task of carefully defining and controlling metadata seeks to cut the costs of duplicated effort and unnecessary bug fixes inherent in line-of-business, Wild-West data store proliferation. It can therefore constrain exactly the kind of use of new, externally generated data types, such as social media data, that yields the greatest Big Data success for the enterprise.
Who’s to be master?
So, if we need to take care that data governance does not interfere with Big Data efforts, and yet things like data cleansing are clearly valuable, how can we coordinate the two better? I often find it useful in these situations to model the enterprise’s data handling as a sausage factory, in which indescribable pieces of data 'meat' are ground together to produce informational 'sausage'. I like to think of it as having five steps (more or less):
Note that data governance as presently defined appears to affect only the first two steps of this process. And yet my previous studies of the sausage factory suggest that all of the steps should be targeted: improving only the first two will offer just minor improvements in a process that tends to 'lose' three-quarters of the valuable information along the way, with each successive step shedding more.
How does this apply to Big Data? The most successful users of Big Data, as noted above, actively seek out external data that is dirty and unconsolidated and yet is often more valuable than the organization’s 'tamed' data. Data governance, as the effective front end of the sausage factory, must therefore not exclude this Big Data in the name of data quality—it must find ways of making it 'good enough' that it can be fed into the following four steps. Or, as one particular database administrator told me, 'dirty' data should not just be discarded, as it can tell us about what our sausage factory is excluding that we need to know.
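To make the DBA's point concrete, here is a minimal Python sketch of a 'good enough, not perfect' intake step. All names, fields, and thresholds are illustrative assumptions, not anything from the article: dirty records are quarantined and tagged with the reason for rejection rather than silently discarded, so the organization can later see what its sausage factory is excluding.

```python
from typing import Iterable

def triage(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Split incoming records into 'good enough' and quarantined sets.

    Rather than discarding dirty rows, each rejected record is kept
    and tagged with the reasons, so the 'sausage factory' can report
    on what it is excluding.
    """
    accepted, quarantined = [], []
    for rec in records:
        problems = []
        if not rec.get("user_id"):          # illustrative rule
            problems.append("missing user_id")
        if "text" in rec and len(rec["text"]) > 10_000:  # illustrative rule
            problems.append("oversized text")
        if problems:
            quarantined.append({**rec, "_rejection_reasons": problems})
        else:
            accepted.append(rec)
    return accepted, quarantined

raw = [
    {"user_id": "u1", "text": "great product"},
    {"text": "anonymous social post"},   # dirty: no user_id
]
ok, held = triage(raw)
```

The design choice is the point: the quarantine list is itself an analyzable data set, which is what the database administrator quoted above was arguing for.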
Data governance should also not, if at all possible, interfere with the four steps following data quality assurance. Widening scope widens security risks, but the benefits outweigh them. Information delivery that involves a new data type risks creating a 'zone of ignorance' in which database governors don't know what their analysts are doing; but the answer is not to exclude the data type until that distant date when it can be properly vetted.
Much of this can be done by using a data discovery or data virtualization tool to discover new data types and incorporate them into an enterprise metadata store semi-automatically. But that is not enough; IT needs to ensure that data governance accepts that excluding Big Data is not an option, and that the aim is not pure data but the best balance of valuable Big Data and data quality.
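No specific tool is prescribed above, but the semi-automatic pattern can be sketched in a few lines of Python. Everything here is hypothetical (the `MetadataStore` class, the field names, the statuses): a rough schema is inferred from sampled records and the new data type is registered as 'pending review', governed later rather than excluded now.

```python
def infer_schema(samples: list[dict]) -> dict[str, set[str]]:
    """Infer a rough field -> observed-type-names map from sampled records."""
    schema: dict[str, set[str]] = {}
    for rec in samples:
        for field, value in rec.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema

class MetadataStore:
    """Toy catalogue: new data types enter as 'pending_review', not rejected."""
    def __init__(self) -> None:
        self.entries: dict[str, dict] = {}

    def register(self, name: str, samples: list[dict]) -> None:
        self.entries[name] = {
            "schema": infer_schema(samples),
            "status": "pending_review",   # usable now, vetted later
        }

store = MetadataStore()
store.register("tweets", [{"user": "a", "likes": 3},
                          {"user": "b", "likes": "7"}])  # dirty: mixed types
```

Note that the inferred schema records the mixed `int`/`str` types for `likes` instead of failing; that inconsistency becomes a documented fact for the governors rather than a reason for exclusion.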
In Through the Looking-Glass, Humpty Dumpty uses the word "glory" in a very odd way, and Alice objects that he should not be allowed to. "The question is," he replies, "who's to be master, you or the word?" In a similar way, users of data governance and Big Data need to understand that you, with your need for Big Data customer insights from the outside world, need to be master, not the data governance enforcer.
Posted: 5th April 2013 | By Philip Howard :
I think there are some interesting points here, but I also think this is a dangerous argument. My impression is that a lot of people treat big data as if it were the Wild West - anything goes: it is not too much governance that is the danger, it is none at all.
The other thing that slightly worries me about this article is that there seems to be an undertone of IT "doing data governance" to the organisation, which I think is the wrong emphasis: data governance should be about the collaborative support of corporate data policies, not about imposing strictures on the business because somebody in IT thinks it would be a good idea - and that applies not just to big data but generally.
Published by: electronicdawn Ltd.