There is now an enormous quantity of data in a wide variety of forms that is being generated very quickly. However, the term big data is as much a reflection of the limitations of the current technology as it is a statement on the quantity, speed or variety of data.
The term big data needs to be understood as data that has greater volume, variety or velocity than can be comfortably processed using the technology that you already have.
Big data comes from a number of sources, both internal and external. Many organisations have accumulated large amounts of data that they are not exploiting. There is an even larger amount of data held in publicly available sources, such as government databases and social media, as well as data that organisations would be willing to share.
In addition, the inbuilt instrumentation of smart systems generates a massive amount of as yet untapped data. To realise its potential value, big data needs to be transformed into smart information, which can then be used to improve planning, increase efficiency and create new kinds of products.
Information security challenges
The underlying information security challenges of malice, misuse and mistake apply equally to big data. Big data techniques can also be used by criminals to improve their exploits, provide insights that facilitate security breaches and aggregate data to assist with identity theft.
Big data can be misused through abuse of privilege by those with access to the data and analysis tools: curiosity may lead to unauthorised access, and information may be deliberately leaked. Mistakes can also cause problems, where corner cutting could lead to disclosure or incorrect analysis. The Cloud Security Alliance has published a report on the top ten big data security and privacy challenges.
There are three major risk areas that need to be considered:
Information life cycle: big data turns the classical information life cycle on its head. There may be no obvious owner for the data to ensure its security, and what analysis will discover may not be known at the outset. The provenance of the data may be doubtful, its ownership may be disputed, and classifying the information discovered may not be feasible until after analysis. For all of these reasons, the compliance requirements and the controls needed cannot easily be predetermined.
Data provenance: big data involves absorbing and analysing large amounts of data that may have originated outside the organisation that is using it. If you don’t control the data creation and collection process, how can you be sure of the data source and the integrity of the data? How do you know that you have the right to use the data in the way that is planned? These points are brought out very clearly in a UK report on the use of smart metering of power consumption by utility companies.
Technology unknowns: the technology that underlies the processing of big data was conceived to provide massively scalable processing rather than to enforce security controls. While this is not a new phenomenon in the IT industry, there has not yet been sufficient time for the inherent vulnerabilities and security weaknesses to become manifest.
Information stewardship for big data
Taking care to look after property that is not your own is called stewardship. Information stewardship is not a new term; it has been in use since the 1990s and covers the wide range of challenges involved in managing information as a key organisational asset. These include the management of the whole information life cycle, from ownership to deletion, as well as aspects such as business value, data architecture, information quality, compliance and security.
The basic objectives of information security for big data are the same as for normal data: to ensure its confidentiality, integrity and availability. To achieve these objectives, certain processes and security elements must be in place. There is a large overlap with normal information security management processes; however, specific attention is needed in the following areas:
Everyone is responsible
The unstructured nature of big data means that it is difficult to assign responsibility for it to a single person. Everyone in an organisation needs to understand their responsibility for the security of all of the data they create or handle.
Verification of data source
Technical mechanisms, such as digital signatures, are needed to verify the source of external data; a sketch of such a check follows.
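As a minimal sketch, assuming the data supplier signs each file with an RSA key and distributes the corresponding public key out of band (the function name and key handling here are illustrative, using the third-party Python cryptography library):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def verify_source(data: bytes, signature: bytes, public_key_pem: bytes) -> bool:
    # Load the publisher's RSA public key (assumed to be obtained out of band).
    public_key = serialization.load_pem_public_key(public_key_pem)
    try:
        # Check the signature over the raw data; any tampering raises InvalidSignature.
        public_key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False
```

Data that fails the check should be rejected before it ever reaches the analysis infrastructure.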
Systems integrity
There needs to be good control over the integrity of the systems used for analysis, including privilege management and change control. Be careful to validate conclusions: if you can’t explain why the results make sense, they probably don’t. Always build in a way to check, as sketched below; don’t let big data lead you to stupid conclusions.
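One hedged way to build in such a check is to recompute a simple aggregate independently of the analysis pipeline and flag divergence; the function and tolerance below are purely illustrative:

```python
def sanity_check(pipeline_total: float, raw_values: list[float], tolerance: float = 0.01) -> bool:
    # Recompute a simple aggregate straight from the raw records,
    # independently of the analysis pipeline.
    independent_total = sum(raw_values)
    if independent_total == 0:
        return pipeline_total == 0
    # Flag results that drift beyond the relative tolerance.
    drift = abs(pipeline_total - independent_total) / abs(independent_total)
    return drift <= tolerance

# Usage: halt before publishing results that fail the independent check.
# assert sanity_check(report_total, raw_values), "pipeline total diverges from raw data"
```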
Secure processing
Measures to secure the data within the analysis infrastructure are needed to mitigate potential vulnerabilities and to guard against leakage. These could include disk-level encryption and a high level of network isolation. Big data should be secured in transit, preferably using encryption and at least using SSL/TLS; a sketch follows. If the cloud is being used to process big data, understand how to verify that this processing is secured.
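As a minimal sketch of securing data in transit using the Python standard library (the feed URL is illustrative):

```python
import ssl
import urllib.request

# Enforce certificate verification and a modern protocol floor for data in transit.
context = ssl.create_default_context()            # verifies server certificates by default
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse obsolete SSL/TLS versions

# Any external data feed would be fetched the same way.
with urllib.request.urlopen("https://data.example.org/feed.json", context=context) as response:
    payload = response.read()
```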
Access management
Access to the analysis infrastructure, the data being analysed and the results should be subject to proper identity and access management (IAM) controls; a minimal illustration follows.
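As a sketch of the deny-by-default principle behind such controls (the roles, actions and permission sets here are purely illustrative, not a real IAM product's API):

```python
# Map each role to the actions it is permitted to perform.
PERMISSIONS = {
    "analyst":  {"read:results"},
    "engineer": {"read:results", "read:raw_data", "run:analysis"},
    "admin":    {"read:results", "read:raw_data", "run:analysis", "manage:users"},
}

def is_authorised(role: str, action: str) -> bool:
    # Deny by default: unknown roles get an empty permission set.
    return action in PERMISSIONS.get(role, set())

# Usage: an analyst may read results but not the raw data.
assert is_authorised("analyst", "read:results")
assert not is_authorised("analyst", "read:raw_data")
```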
Audit
There should be logging and monitoring of activities on the analysis infrastructure to allow proper auditing.
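For example, one structured, timestamped record per security-relevant event makes later review straightforward; this sketch uses the Python standard library, and the logger name and fields are illustrative:

```python
import datetime
import json
import logging

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("bigdata.audit")

def audit(user: str, action: str, resource: str) -> None:
    # Emit one structured record per security-relevant event
    # so activity on the analysis infrastructure can be reviewed later.
    audit_log.info(json.dumps({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "resource": resource,
    }))

# Usage:
# audit("alice", "run:analysis", "customer_churn_model")
```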
The key risk areas for big data are the changed information life cycle, the provenance of the data and the technology unknowns. The basic objectives of information security for big data are the same as for normal data; however, special attention is needed to ensure controls for these key risk areas.
Mike presented an in-depth look at this subject at the BCS IRMA (Information Risk Management and Assurance) Specialist Group meeting in London in October 2013.