By: Clive Longbottom, Head of Research, Quocirca
Published: 8th May 2012
Copyright Quocirca © 2012
If an organisation is sitting on top of 10 databases, each of which is 100TB in size, it has a big data issue, right?
Not necessarily – it certainly has a problem in that it has a lot of data to deal with, but federating databases and applying data cleansing, master data management (MDM) and business analytics can provide a pretty decent solution to this. Big data introduces a different set of problems – ones that require a different way of thinking, which may take many outside their comfort zone.
Let’s begin by taking a simple view of information within an organisation. In the dim, dark past when I got into the ITC world, the rule of thumb was that around 20% of an organisation’s information was in electronic format, the rest on paper. Of the electronic stuff, about 80% was held within formal databases. Roll the clock forward by a couple of decades and this has essentially flipped – around 80% of an organisation’s information is now in electronic format, and only around 20% of that will be held in a formal database. The rest of the electronic stuff will be held in various file formats dotted around on file servers, personal devices and so on.
Any “big data” approach that just deals with the data held within databases is therefore only using 16% of the available information (20% of the 80% that is electronic) – not a good way to reach mission-critical decisions.
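As a quick check on that 16% figure, the arithmetic can be sketched in a few lines of Python. The two percentages are the rule-of-thumb values quoted above, not measured data, and the variable names are purely illustrative.

```python
from fractions import Fraction

# Rule-of-thumb figures from the article (assumptions, not measurements).
electronic_share = Fraction(80, 100)              # share of all information held electronically
database_share_of_electronic = Fraction(20, 100)  # share of electronic information in formal databases

# Share of ALL of the organisation's information that sits in formal databases.
database_share_of_total = electronic_share * database_share_of_electronic

print(f"{float(database_share_of_total):.0%} of all information is in formal databases")
```

Exact fractions are used here only to avoid floating-point rounding noise in such a small calculation.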
This is further complicated by how information usage has changed. Back at that earlier time, an organisation’s data assets were pretty easy to define – the data was in that database that was on that server in that data centre. Now, the organisation’s information assets have to include shared information across the value chain of customers and suppliers – and then beyond that into the information held on the internet itself and across social networking sites.
All of a sudden, the “big data” approach of federating information across those large databases that the organisation controls is looking a little measly. Even if it is assumed that those databases are large – say a total of 10 petabytes (PB), or close to 1,000 times the amount of information held in the American Library of Congress – the total size pales into insignificance against the volume of information held on the internet, where other potentially useful information can be found in semi-structured or unstructured formats. The current information volume of the internet is estimated to be around 2 zettabytes (ZB) – or 2 million PB. Adding that into the equation takes the 16% of available information that you may have thought you were acting on down to a very small fraction of a single per cent.
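To put a rough number on "a very small fraction of a single per cent", here is a minimal sketch using the article's own figures: 10 PB of databases under the organisation's control and an estimated 2 ZB on the internet. It assumes decimal units (1 ZB = 1,000,000 PB) and, for simplicity, ignores the organisation's non-database information, which is negligible at this scale.

```python
# Decimal units: 1 ZB = 10**21 bytes and 1 PB = 10**15 bytes.
PB_PER_ZB = 10**6

own_databases_pb = 10        # the article's example: 10 PB in controlled databases
internet_pb = 2 * PB_PER_ZB  # the article's estimate: 2 ZB on the internet

# Share of the combined pool that the organisation's databases represent.
fraction = own_databases_pb / (own_databases_pb + internet_pb)

print(f"{fraction:.4%}")  # prints 0.0005%
```

On these assumptions the databases account for roughly five ten-thousandths of one per cent of the information that could, in principle, be relevant.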
Sure, a lot of the available information out there on the internet is either complete dross or is not germane to the problem you are dealing with. The problem is that some of it is relevant – the views of customers being propagated through the social networks; the performance and activities of competitors; the dynamics of the markets in which you are operating, whether these are vertical or geographic. You need the tools to identify that useful stuff, and then the means to bring it into an environment where it can be analysed and reported against in a manner that allows intelligence to be gleaned from a broader set of sources – in other words, a true big data approach.
A term that is being used around big data sums it all up nicely – it is about volume, velocity and variety. The volume side is the one everyone accepts, but is also the one that vendors have latched on to and focused on. The velocity side is where the big battles seem to be playing out – how fast can one vendor provide insights against the large volume of data that is under focus?
But variety is often glossed over – and yet it is the most important. Less structured information held in documents and spreadsheets, along with information that can be gleaned from less traditional sources such as voice and video, and those internet sources alluded to earlier, is all potentially relevant. Those who can use the right technologies to bring this variety of information sources together, such that volume and velocity needs are also met, will be the outright winners in the world of true big data – those who just look at it as a problem with volumes of structured data under their direct control will face major problems.
For a bit more on this subject, see Quocirca’s argument on why “big data” should be re-termed “unbounded information”.
Published by: electronicdawn Ltd.