In the first article in this series I wrote about the various things that need to be considered for big data over and above which database you are going to host it on, specifically with respect to the governance and management of that data. In the second article I wrote about trust (can you rely on this data for decision-making?) and in the third I discussed context: essentially, the need to have metadata about your big data. In this article I want to consider a different aspect of trust, namely security.
While there may, on occasion, be issues around company-confidential information, the big security issue with big data is data privacy. Once again (and this applies to both trust and context) this is by no means unique to big data: facilities such as data masking are required in both big data and conventional data environments when dealing with personally identifiable information.
The big difference with big data is that you are often talking about what was, initially at least, publicly available information on third-party platforms over which you have no control. If someone wants to, or is idiotic enough to, post their credit card number on Twitter, that is up to them. But if you ingest that information into your organisation for analysis purposes, then you will be in breach of compliance regulations unless that data is either filtered out (that is, you don't ingest it) or masked or redacted. This means that you need software in place (probably some sort of profiling capability) to recognise such data and to treat it appropriately. In other words, if you are processing social media in any way, you will need appropriate data governance and compliance processes in place.
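To make the idea of recognising and redacting such data a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions rather than a description of any particular product: the names `CARD_PATTERN`, `luhn_valid` and `redact_card_numbers` are hypothetical, the pattern only looks for runs of 13 to 19 digits separated by spaces or hyphens, and the Luhn checksum is used simply to cut down on false positives. A real profiling tool would need to cover far more kinds of personally identifiable information than this.

```python
import re

# Assumption: a candidate card number is 13 to 19 digits, optionally
# separated by single spaces or hyphens. Real-world formats vary.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact_card_numbers(text: str, mask: str = "[REDACTED]") -> str:
    """Replace anything that looks like a valid card number with a mask."""

    def _replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        # Only mask candidates that pass the checksum; leave other
        # long digit runs (order numbers and so on) untouched.
        return mask if luhn_valid(digits) else match.group()

    return CARD_PATTERN.sub(_replace, text)


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(redact_card_numbers("My card is 4111 1111 1111 1111, call me"))
```

A filter of this kind could sit at the point of ingestion, so that the sensitive value is never stored in the first place, or it could be applied afterwards as a masking step. Which of these is appropriate is a governance decision, not a coding one.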
However, the question of data privacy leads on to another topic, which is ethics - not a topic usually discussed on IT web sites. With the sort of advanced analytic capabilities that are now around, combined with the fact that social media users are only too happy to reveal who they are, it is possible to make deductions about individuals that they might not wish to be publicly known. How should companies approach this? The simple answer is that it is not (usually) illegal to exploit this information, but companies that do so are opening themselves up to public opprobrium if that fact gets into the public domain, which would be bad for both their sales and their share price. Thus, while neatly sidestepping the ethical and moral questions this raises, I would suggest that such an approach would be counterproductive from a practical point of view. Unfortunately, while the avoidance of this sort of thing should come under the role of corporate (rather than data) governance, it is difficult to formulate appropriate rules in any specific sense. So we may deplore this sort of activity but, ultimately, we can't stop it.