There’s little doubt there’s gold to be had in those data mountains. Gartner predicts that, through 2015, organisations integrating high-value, diverse new information sources and types into a coherent information management infrastructure will outperform their industry peers financially by more than 20 per cent [1].
However, mining unstructured data is intensive, not just in terms of resources but also in terms of the processes that need to be put in place to ensure big data analysis delivers. In all the excitement, few have considered the risks involved or how best to secure data once it has been extracted.
Big data potentially exposes an organisation to many of the threats associated with traditional infrastructures but with far higher stakes. Familiar issues, from malware and data theft to the problem of privacy and compliance, become amplified by the distributed nature of big data. Many organisations don’t have the capabilities to deal with such colossal volumes in-house and are turning to the cloud for on-demand storage and access.
With such large volumes spread across distributed architectures, securing data becomes more problematic as the attack surface increases. The cloud has also had its own issues to deal with, from accessibility and data ownership to availability, with recent examples including the AWS outage and the Dropbox hack. Put the two together and it seems obvious that security should be a prime concern.
Of course, all that glitters is not gold. The value of big data lies in the information that can be extracted to inform future business strategy, and it’s that extraction process that has dominated recent thinking. The amount of useful data is expected to rise in line with overall data volumes: IDC predicts that the digital universe will reach 40 zettabytes (ZB) by 2020, a 50-fold increase on the beginning of 2010.
Needless to say, a concentrated mass of extracted data immediately becomes more appealing to attackers and is likely to attract sophisticated attacks on these ‘data banks’.
The security challenge for big data lies in providing an effective security model across the life cycle of the process without impeding the 3Vs (volume, variety and velocity) or compromising the rest of the information estate.
Collecting the data an organisation deems useful, and giving the right parts of the business access to it at the right time across the organisation’s geographic footprint, all while meeting local legislative and sector-specific requirements, must be perfectly orchestrated. Get this wrong and the organisation could see its reputation destroyed, or hand attackers or competitors access to critical data.
Technology has focused on mass-scale processing of unstructured data, with debate centring on how best to slice and dice the results. Little consideration has been given to how to integrate big data with existing processes without compromising the business.
Of course, you can’t pragmatically implement security controls if you don’t know what you have. An understanding of the value of the unstructured data has to come first, followed by some in-depth thinking about how to store and access such large data sets securely.
Business process modelling (BPM) enables organisations to look at big data end-to-end rather than in isolation. By examining current processes, an organisation can map them out, identify gaps, explore how existing data processes could be changed, and anticipate the complex issues this may cause and how to address them. Only once BPM has been initiated should business analysis (BA) be considered (BA can be part of BPM but should never precede it).
Together, BPM and BA methodologies can be used to quantify unstructured and other data so that the organisation knows exactly what it holds. There is little point in focusing security resources on data that is not valuable. Determining exactly what data is at the organisation’s disposal, including data that already exists, enables it to cut out the fat and avoid sanitising data sets that no longer have any use, such as duplicate, expired or redundant data.
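To make the pruning step more concrete, here is a minimal Python sketch, assuming each record carries a payload, an optional expiry date and a flag saying whether any mapped business process still uses it. The `Record` type, the `referenced_by_process` flag and the `prune` function are hypothetical illustrations rather than part of any BPM or BA tool, and real de-duplication would usually compare normalised content rather than raw bytes.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    record_id: str
    payload: bytes
    expires: Optional[date]       # None means no retention limit
    referenced_by_process: bool   # hypothetical flag from the BPM/BA mapping


def prune(records: List[Record], today: date) -> Tuple[List[Record], List[Record]]:
    """Split records into (keep, discard), discarding duplicate, expired or unused data."""
    kept_digests = set()
    keep: List[Record] = []
    discard: List[Record] = []
    for rec in records:
        digest = hashlib.sha256(rec.payload).hexdigest()
        is_duplicate = digest in kept_digests  # an identical copy is already kept
        is_expired = rec.expires is not None and rec.expires < today
        is_redundant = not rec.referenced_by_process
        if is_duplicate or is_expired or is_redundant:
            discard.append(rec)
        else:
            kept_digests.add(digest)
            keep.append(rec)
    return keep, discard


# Usage with made-up records
records = [
    Record("a", b"customer list", None, True),
    Record("b", b"customer list", None, True),                # duplicate of "a"
    Record("c", b"2009 campaign", date(2011, 12, 31), True),  # past its expiry date
    Record("d", b"old export", None, False),                  # no process uses it
]
keep, discard = prune(records, today=date(2013, 1, 1))
print([r.record_id for r in keep])     # ['a']
print([r.record_id for r in discard])  # ['b', 'c', 'd']
```

Only the records left in `keep` then need classifying and protecting, which is the point of cutting out the fat before security work begins.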
Having analysed the data, the organisation needs to determine the data life cycle. What will be collected, who will access it and for what purpose? Working with the relevant parts of the business makes it possible to understand how combinations of data can push sensitivity beyond what the organisation is able to protect.
Data in isolation may not initially be identified as sensitive but, once processed or combined with other data sets, it could leave an organisation handling information assets it cannot secure, or that put it at risk of prosecution in specific geographic locations.
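The aggregation risk described above can be sketched as a simple classification check. This is a minimal illustration only: the four sensitivity levels, the `AGGREGATION_RULES` table and the `CAPABILITY` threshold are hypothetical assumptions, and a real scheme would come from the organisation’s own classification policy and legal advice. The postcode, date of birth and gender rule reflects the well-known observation that these fields together can often re-identify individuals.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, List, Set, Tuple


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical rules: a combined data set containing all of these fields
# is treated as at least the given level, whatever its inputs were rated.
AGGREGATION_RULES: List[Tuple[FrozenSet[str], Level]] = [
    (frozenset({"postcode", "date_of_birth", "gender"}), Level.RESTRICTED),
    (frozenset({"customer_id", "purchase_history"}), Level.CONFIDENTIAL),
]


def combined_level(datasets: Dict[str, Tuple[Set[str], Level]]) -> Level:
    """Rate a combination of data sets: the highest input level, raised by any matching rule."""
    fields: Set[str] = set()
    level = Level.PUBLIC
    for field_names, ds_level in datasets.values():
        fields |= field_names
        level = max(level, ds_level)
    for required, rule_level in AGGREGATION_RULES:
        if required <= fields:  # every field the rule needs is present
            level = max(level, rule_level)
    return level


# Hypothetical: the highest level the organisation can currently protect
CAPABILITY = Level.CONFIDENTIAL

marketing = ({"postcode", "gender"}, Level.INTERNAL)
loyalty = ({"date_of_birth", "customer_id"}, Level.INTERNAL)

result = combined_level({"marketing": marketing, "loyalty": loyalty})
print(result.name)          # RESTRICTED
print(result > CAPABILITY)  # True: the combination exceeds what can be handled securely
```

Each data set on its own rates only INTERNAL; it is the combination that crosses the threshold, which is the case a life-cycle review is meant to catch before the data is merged.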
Naturally, compliance has to be part of this process, and the requirements will vary according to the sectors and markets the organisation operates in. Yet all too often, regulations such as the Data Protection Act (DPA) are underestimated or forgotten in the quest to deliver the business strategy.
The big data security estate will differ architecturally and operationally from today’s traditional infrastructure. There won't be a one-size-fits-all approach to security, but there will be commonalities.
It makes sense to apply or extend some existing security measures, such as access control and authentication procedures, monitoring and logging, and data protection practices.
Yet the key to securing big data effectively is flexibility, so it’s vital not to over-secure. The right amount of security needs to be applied so as not to impede the ‘velocity’ element, and it is advisable to monitor choke points in the process design to ensure this isn’t compromised.
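One way to watch for choke points is to time each stage of a pipeline, including stages that apply security controls, and flag any stage that takes a disproportionate share of the total. The sketch below assumes a pipeline is an ordered list of named Python functions; `run_pipeline`, `choke_points`, the `mask_ids` stage and the 50 per cent threshold are all hypothetical, and the example injects a fake clock so the timings are repeatable.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Stage = Tuple[str, Callable[[List[str]], List[str]]]


def run_pipeline(
    batch: List[str],
    stages: Sequence[Stage],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[str], Dict[str, float]]:
    """Run each named stage in order and record how long it took."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = clock()
        batch = fn(batch)
        timings[name] = clock() - start
    return batch, timings


def choke_points(timings: Dict[str, float], max_share: float = 0.5) -> List[str]:
    """Return stages that take more than max_share of the total pipeline time."""
    total = sum(timings.values())
    if total == 0:
        return []
    return [name for name, t in timings.items() if t / total > max_share]


def mask_ids(batch: List[str]) -> List[str]:
    # Placeholder security control: hide all but the last four characters
    return ["*" * max(len(r) - 4, 0) + r[-4:] for r in batch]


# Usage with a fake clock so the timings are repeatable
ticks = iter([0.0, 1.0, 1.0, 7.0, 7.0, 8.0])
stages: List[Stage] = [
    ("ingest", lambda b: b),
    ("mask_ids", mask_ids),
    ("index", lambda b: b),
]
output, timings = run_pipeline(["1234567890"], stages, clock=lambda: next(ticks))
print(output)                 # ['******7890']
print(timings)                # {'ingest': 1.0, 'mask_ids': 6.0, 'index': 1.0}
print(choke_points(timings))  # ['mask_ids']
```

A flagged stage is not necessarily over-secured, but it is a sensible place to check whether a lighter control would give adequate protection without slowing the flow.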
Organisations transitioning to the cloud need to consider how best to design, develop and migrate data, and how services will be managed. It then becomes relatively straightforward to bake in security at the relevant stages. An assurance and audit framework can ensure that security controls are put in place to protect data, and can demonstrate compliance with security standards such as ISO 27001 as well as with legal requirements such as the DPA.
Taken together, these elements can provide the right blend of data capture, processing, extraction and security for big data to deliver. Big data has the potential to be a gold mine of business insight, but bear in mind that there are plenty of other prospectors who will use any means necessary to obtain those nuggets of data.
[1] Gartner, Information Management in the 21st Century, Regina Casonato et al., September 2, 2011