Securing big data

November 2013

Big data can create business value by solving emerging business challenges. However, big data also creates security challenges that need to be considered by organisations adopting or using big data techniques and technologies, says Mike Small, FBCS CITP.

There is now an enormous quantity of data in a wide variety of forms that is being generated very quickly. However, the term big data is as much a reflection of the limitations of the current technology as it is a statement on the quantity, speed or variety of data.

The term big data needs to be understood as data that has greater volume, variety or velocity than can be comfortably processed using the technology that you already have.

Big data comes from a number of sources, both internal and external. Many organisations have accumulated large amounts of data that they are not exploiting. There is an even larger amount of data held in publicly available sources, such as government databases and social media, as well as data that organisations would be willing to share.

In addition, the inbuilt instrumentation of smart systems generates a massive amount of as yet untapped data. To realise its potential value, big data needs to be transformed into smart information [1], which can then be used to improve planning and increase efficiency as well as to create new kinds of products.

Information security challenges

The underlying information security challenges of malice, misuse and mistake apply equally to big data. Big data techniques [2] can also be used by criminals to improve their exploits, provide insights that facilitate security breaches and aggregate data to assist with identity theft.

Big data can be misused through abuse of privilege by those with access to the data and analysis tools; curiosity may lead to unauthorised access and information may be deliberately leaked. Mistakes can also cause problems where corner cutting could lead to disclosure or incorrect analysis. The Cloud Security Alliance has published a report [3] on the top ten big data security and privacy challenges.

There are three major risk areas that need to be considered:

Information life cycle: big data turns the classical information life cycle on its head. There may be no obvious owner for the data to ensure its security. What will be discovered by analysis may not be known at the beginning. The provenance of the data may be doubtful, the ownership of the data may be subject to dispute, and the classification of the information discovered may not be feasible until after analysis. For all of these reasons, the compliance requirements and controls needed cannot easily be predetermined.

Data provenance: big data involves absorbing and analysing large amounts of data that may have originated outside of the organisation that is using it. If you don’t control the data creation and collection process, how can you be sure of the data source and the integrity of the data? How do you know that you have the right to use the data in the way that is being planned? These points are brought out very clearly in a UK report on the use of smart metering of power consumption by utility companies [4].

Technology unknowns: the technology that underlies the processing of big data was conceived to provide massively scalable processing rather than to enforce security controls. While this is not a new phenomenon in the IT industry, there has not been sufficient time for the inherent vulnerabilities and security weaknesses to become manifest.

Information stewardship for big data

Taking care to look after property that is not your own is called stewardship. Information stewardship is not a new term; it has been in use since the 1990s and covers the wide range of challenges involved in managing information as a key organisational asset. These include the management of the whole information life cycle from ownership to deletion as well as aspects like business value, data architecture, information quality, compliance and security.

The basic objectives of information security for big data are the same as for normal data: to ensure its confidentiality, availability and integrity. To achieve these objectives, certain processes and security elements must be in place. There is a large overlap with the normal information security management processes; however, specific attention is needed in the following areas:

Everyone is responsible

The unstructured nature of big data means that it is difficult to assign the responsibility to a single person. Everyone in an organisation needs to understand their responsibility for the security of all of the data they create or handle.

Verification of data source

Technical mechanisms are needed to verify the source of external data used; for example, digital signatures.
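As a rough illustration of source verification, the sketch below checks that a received data feed carries a valid authentication tag before it is accepted for analysis. It uses HMAC-SHA256 from the Python standard library, which is a shared-key message authentication code rather than a true digital signature; it is swapped in here only to keep the example self-contained, and a real deployment would more likely use an asymmetric signature scheme. The key, the feed contents and the function names are all assumptions made for the example.

```python
import hashlib
import hmac


def sign_payload(key: bytes, payload: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_payload(key: bytes, payload: bytes, tag: str) -> bool:
    """Compare tags in constant time; False means altered data or the wrong key."""
    expected = sign_payload(key, payload)
    return hmac.compare_digest(expected, tag)


# Hypothetical key and feed, for illustration only.
shared_key = b"example-key-agreed-with-the-supplier"
feed = b"meter_id,reading_kwh\n1001,42.7\n"
tag = sign_payload(shared_key, feed)  # computed by the data supplier
```

The receiving side would call verify_payload(shared_key, feed, tag) and reject the feed if it returns False. Unlike a digital signature, this approach cannot prove to a third party which of the two key holders produced the data.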

Systems integrity

There needs to be good control over the integrity of the systems used for analysis, including privilege management and change control. Be careful to validate conclusions: if you can’t explain why the results make sense, they probably don’t. Always build in a way to check; don’t let big data lead you to stupid conclusions.
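One small piece of change control can be sketched as follows: record a SHA-256 digest of each analysis script when it is approved, then compare the files on disk against that record before a job runs. This is only a minimal sketch under assumed names (the manifest layout, the file name analysis.py and both functions are hypothetical), not a substitute for a proper change management process.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing or whose digest differs from the approved one."""
    changed = []
    for name, approved_digest in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != approved_digest:
            changed.append(name)
    return sorted(changed)


# Demonstration in a temporary directory with a hypothetical analysis script.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    script = root / "analysis.py"
    script.write_text("print('approved version')\n")
    manifest = {"analysis.py": sha256_of(script)}  # recorded at approval time
    before = changed_files(manifest, root)  # nothing has changed yet
    script.write_text("print('unreviewed edit')\n")
    after = changed_files(manifest, root)  # the edit is now detected
```

The manifest itself would need to be stored where the people who can edit the scripts cannot also edit the approved digests.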

Secure processing

Measures to secure the data within the analysis infrastructure are needed to mitigate potential vulnerabilities and to secure against leakage. These could include disk level encryption and a high level of network isolation. Big data should be secured in transit, preferably using encryption (at least SSL/TLS). If the cloud is being used to process the big data, understand how to verify that this is secured.
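For the in-transit part, a client that moves data between systems might build its TLS settings along the following lines. This is a minimal sketch using Python's standard ssl module; the choice of TLS 1.2 as the minimum version is an assumption made for the example, not a requirement stated in this article.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # The default context enables certificate and hostname checks.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

A connection would then be wrapped with context.wrap_socket(sock, server_hostname=host), where sock and host are supplied by the caller.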

Access management

Access to the analysis infrastructure, the data being analysed and the results should be subject to proper identity and access management (IAM) controls.
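The three things to protect (infrastructure, data and results) can be treated as separate resources with separate permissions. The sketch below shows one possible deny-by-default check; the role names and the permission table are hypothetical, and a real deployment would take them from its IAM system rather than from a hard-coded dictionary.

```python
from enum import Enum


class Resource(Enum):
    INFRASTRUCTURE = "analysis infrastructure"
    SOURCE_DATA = "data being analysed"
    RESULTS = "analysis results"


# Hypothetical roles and grants, for illustration only.
ROLE_PERMISSIONS = {
    "platform_admin": {Resource.INFRASTRUCTURE},
    "data_analyst": {Resource.SOURCE_DATA, Resource.RESULTS},
    "business_user": {Resource.RESULTS},
}


def is_allowed(role: str, resource: Resource) -> bool:
    """Deny by default: unknown roles and unlisted resources get no access."""
    return resource in ROLE_PERMISSIONS.get(role, set())
```

Here a business user can read results but not the underlying data, and an administrator of the platform does not automatically gain access to the data held on it.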

Audit

There should be logging and monitoring of activities on the analysis infrastructure to allow proper auditing.
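A minimal sketch of such logging, using Python's standard logging module, is shown below. Each activity is written as one JSON line recording who did what, to what, and when. The logger name, the field names and the in-memory handler are assumptions for the example; in practice the entries would go to a protected, append-only store.

```python
import json
import logging
from datetime import datetime, timezone


class ListHandler(logging.Handler):
    """Collects formatted records in memory; stands in for a protected log store."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


audit_log = logging.getLogger("bigdata.audit")  # hypothetical logger name
audit_log.setLevel(logging.INFO)
audit_log.propagate = False  # keep audit entries out of the general application log
audit_store = ListHandler()
audit_log.addHandler(audit_store)


def record_event(user: str, action: str, target: str) -> None:
    """Write one audit entry as a JSON line with a UTC timestamp."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "target": target,
    }
    audit_log.info(json.dumps(entry, sort_keys=True))


# Hypothetical user, action and data set names.
record_event("analyst01", "run_query", "smart_meter_readings")
```

Structured entries like these are easier to search and monitor than free-text messages, which matters when the audit trail itself becomes large.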

The key risk areas for big data are the changed information life cycle, the provenance of the big data and technology unknowns. The basic objectives of information security for big data are the same as for normal data; however, special attention is needed to ensure controls for these key risk areas.

 
Mike presented an in-depth look at this subject at the BCS IRMA (Information Risk Management and Assurance) Specialist Group meeting in London in October 2013.


Comments (1)

    Chris Munroe wrote on 18th Nov 2013

    You mention encrypting big data in transit. Where is the big data going to be travelling? And if you encrypt, how is that going to affect the performance of big data processing? If these data (petabytes) are encrypted at rest and in transit, performing complex queries will require decryption first. So what level of encryption would be applicable? All too often it is dead easy to state the obvious (one should encrypt) but then all the practicalities and challenges are not even hinted at.
