By: Philip Howard, Research Director - Data Management, Bloor Research
Published: 12th August 2013
Copyright Bloor Research © 2013
This is the first in a series of articles I am planning to write about the management and governance of big data. That is, I am going to be concerned with how you ensure that big data - whatever you mean by that - is fit for purpose and usable for business purposes. Conversely, I am not concerned with whether this data is stored in Hadoop or MongoDB or in your data warehouse. And I am not really bothered about what sort of data it is, whether it is machine generated data or social media data, or video or audio, or even if it is transactional data except that different types of data may require different emphases as far as governance is concerned.
Just to clarify this: machine generated data often contains a lot of duplicated information that you would like to remove, and there may be missing data, because a sensor has failed or a network connection has broken, which you would like to address. However, the data itself is pretty reliable and it doesn't typically include sensitive data, so you don't need capabilities like data masking or data cleansing. Social media data, by contrast, may certainly hold sensitive data, and we are all aware (I hope) of how much data cleansing may be needed with respect to transactional data.
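To make the machine generated case concrete, here is a minimal sketch in Python of the two checks mentioned above: removing duplicated readings and flagging gaps where readings are missing. The record layout (sensor id, timestamp, value), the function names and the 15 minute reporting interval are all assumptions made for illustration, not a description of any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: (sensor_id, timestamp, value)

def deduplicate(readings):
    """Drop exact duplicate readings, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for reading in readings:
        if reading not in seen:
            seen.add(reading)
            unique.append(reading)
    return unique

def find_gaps(readings, expected_interval):
    """Return (sensor_id, earlier, later) wherever two consecutive readings
    from the same sensor are further apart than expected_interval."""
    by_sensor = {}
    for sensor_id, timestamp, _value in readings:
        by_sensor.setdefault(sensor_id, []).append(timestamp)
    gaps = []
    for sensor_id, stamps in by_sensor.items():
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            if later - earlier > expected_interval:
                gaps.append((sensor_id, earlier, later))
    return gaps

if __name__ == "__main__":
    start = datetime(2013, 8, 12, 9, 0)
    sample = [
        ("meter-1", start, 1.0),
        ("meter-1", start, 1.0),                          # duplicated reading
        ("meter-1", start + timedelta(minutes=15), 1.2),
        ("meter-1", start + timedelta(minutes=60), 1.5),  # readings missing before this
    ]
    cleaned = deduplicate(sample)
    print(len(cleaned), "unique readings")
    for gap in find_gaps(cleaned, timedelta(minutes=15)):
        print("gap:", gap)
```

Note that nothing here masks or cleanses the values themselves, which is the point: for this type of data the governance emphasis is on duplication and completeness.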
So, the focus for management and governance may be different for different types of big data but the fundamentals are the same. And what are those fundamentals?
I think there are three.
Firstly, you need to be able to integrate your big data systems with other relevant data that resides in your environment. To take what might seem a simple example, if you have smart meters then the data that is captured needs to be integrated into your invoicing environment; it will be analysed for capacity planning purposes; you will want to use the data in conjunction with fraud detection systems; you will need to link to service and repair management systems; and so on. It will be very rare for big data to exist in splendid isolation. Moreover, different approaches to integration will be needed in different situations and these may change over time. In other words, the integration environment needs to be very flexible.
Secondly, the data needs to be trustworthy. There are actually two aspects to this: in the first case you need to know that it is secure, especially with respect to personally identifiable information and data privacy; and in the second case, that the data is of sufficient reliability that you can trust it as the basis of your decision making.
Finally, data needs to be understood in context. For example, social media data needs to be understood within the context of CRM or brand management environments. Of course, this isn't much different from any other data that is used for analytics but that is precisely the point: big data needs to be managed and governed just as much as ordinary data does, albeit with some qualifications.
Anyway, those are the issues: I will explore each of them further in the articles in this series.
Posted: 12th August 2013 | By Richard Ordowich :
Governance of data describes the expected behaviors of the constituents who create and use data. When contemplating governance, the first things you need to describe are the principles that govern data. What are acceptable uses of the data? What are the ethics and morals governing the data? What are the behaviors of the constituents necessary for data governance, such as trustworthiness?
Without governing principles data is not governed but managed. Data inherently isn't trustworthy. It is the provenance of the data that determines its trustworthiness. Is the source of the data trustworthy? Who the source of the data is matters more than the data itself. Technology does not address governance, people do. Understanding data is a human behavior. Building consensus of meaning requires social interactions. Governance defines how those social interactions occur through defined behaviors such as requiring data literacy.
The three fundamentals for data governance are:
1. Principles for data use
2. Data Literacy
3. Collective accountability and responsibility for data
After these are defined, the data management best practices that implement data governance can be selected and deployed.
Published by: electronicdawn Ltd.