There is certainly a large amount of noise at the moment regarding big data, especially around what it can do, its challenges and how it could change the world for the better.
However, as with most new concepts, one should maintain a certain amount of suspicion around any new technology idea. This is because a) new ideas often attract a large amount of hype and therefore under-deliver; b) people cannot see anything wrong with a new idea and tend to overlook its shortfalls; and c) people often jump on the bandwagon and ‘re-badge’ other ideas as the new one, typically for commercial reasons.
This article investigates what big data is, what it can be used for and the challenges with its implementation.
What is big data?
If you were to search the internet, you would likely find hundreds, if not thousands, of different definitions of big data. However, the following three trends seem to underpin most definitions:
- There is a massive volume of data. While size and volume are often relative to circumstances, we are talking in the range of millions of data items, often with hundreds of data variables within each data item.
- The data is constantly changing, often at a rapid pace. Items are being added, updated and removed quickly.
- Finally, the data is stored in a variety of different formats. This will cover the more ‘traditional’ pre-defined structured database formats but also a wide range of unstructured formats, such as videos, audio recordings, free format text, images, social media comments, etc.
Once this data is collected, it is possible to undertake various forms of analysis. This analysis can find patterns, trends, themes and correlations between variables. It can then be used to explain historical behaviours as well as to predict and shape future behaviours. A few simple examples are listed below to illustrate this point:
- Governments obtain insights to help them with healthcare analysis.
- Meteorologists can use big data to predict and understand weather conditions.
- Political parties can utilise big data to understand voting intentions.
- Medics can try to understand the cause and spread of diseases. This will allow preventative measures to be implemented. (Very topical at the time of writing in regard to the COVID-19 pandemic.)
- Finally, big data can help with the ‘normal’ functions of a business. For example, cost/profit management, marketing / product management, improving the clients’ experience and internal process efficiencies.
In fact, big data can be used to efficiently monitor, analyse and predict trends in most areas of life.
What are the challenges and issues with using big data?
'Big data is not a silver bullet and there are challenges with implementing it successfully. A poor implementation of a big data project will cause more problems than it solves.'
Before an organisation attempts to implement or use big data, it needs (as with any change) a clear business reason that is linked to the organisation’s strategy. This will ensure senior management buy-in and a clear focus on what needs to be implemented. It would also be advisable to perform a cost / benefits analysis to understand whether the benefits outweigh the costs, stress and challenges of implementation.
Six of the main implementation challenges are detailed below:
- The term is often misunderstood and misused. While it is easy to be sceptical, it is true that some firms use the term ‘big data’ to cover a wide range of data analysis techniques because they feel the ‘more trendy’ term will generate more business for them. Therefore, the first rule of thumb for big data is to ensure that you are actually using big data.
- The sheer challenge of processing a vast amount of constantly changing data across many differing and incompatible formats. A complex (and no doubt expensive) stack of technology will be required to continually retrieve the data, interpret it, store it and then analyse it. Therefore, before an organisation embarks on, or implements, a big data project, it is important the firm fully understands the costs, overheads and complexity of this technology. This should be covered in the aforementioned cost / benefits analysis.
- Data security, and the legal rules that surround it, is a complex issue at the best of times. However, that complexity increases dramatically with big data, especially if data is gathered and processed across international boundaries. Therefore, organisations need to be aware of the rules and ensure they have policies and processes in place to comply with them.
- As in any new discipline or speciality, there is a large shortage of genuinely skilled and experienced individuals in big data. There are many people who will pass themselves off as data scientists, data miners or big data specialists - but care needs to be taken when employing people to ensure they have the skills and experience required. Therefore, it is important that firms clearly define what skills, capabilities and experience are required when trying to recruit big data ‘experts’.
- Like all data analysis or research techniques, there is the risk of inaccurate data. This could be due to a) the data sources being separate and not linked together properly (such as purchasing habits not being linked to geographical locations); b) the data being of poor quality; c) the data being gathered over a poor sample size, which means the results could be biased; and / or d) the data gathered being misunderstood by the data analysis team.
These problems are exacerbated by the size of the data, its constantly changing nature and the differing formats. Therefore, like any data analysis or research project, it is important the organisation is fully aware of any data inaccuracies so assumptions, warnings or even disclaimers can be noted against any analysis produced.
- Finally, there could also be issues when processing or analysing the data. There could be errors in the algorithms employed, the wrong variables could be measured or people may simply misinterpret the outcomes provided. Again, this will be exacerbated by the size of the data, its constantly changing nature and the differing formats. Therefore, when performing big data analysis, organisations need to analyse the data across multiple algorithms so the data is assessed through several lenses in order to obtain the most rounded view. Any material issues with the analysis should also be clearly stated.
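As a small illustration of assessing the same data through more than one lens, the sketch below compares two standard correlation measures (Pearson and Spearman) on the same pair of variables and flags the result when they disagree. It is a minimal sketch only: the function names, the sample figures and the 0.2 tolerance are assumptions made for illustration, not a recommended method or threshold.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: measures how linear the relationship is."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))


def lenses_disagree(xs, ys, tolerance=0.2):
    """True when the two measures differ by more than the (assumed) tolerance."""
    return abs(pearson(xs, ys) - spearman(xs, ys)) > tolerance


# Hypothetical figures: the last value of `visits` is an extreme outlier.
spend = [1, 2, 3, 4, 5]
visits = [1, 2, 3, 4, 100]

print("Pearson: ", round(pearson(spend, visits), 3))
print("Spearman:", round(spearman(spend, visits), 3))
print("Needs a closer look:", lenses_disagree(spend, visits))
```

When the two measures diverge, as they do when a single outlier dominates, that is a prompt to inspect the data and note the issue against the analysis rather than to report either number on its own.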
The dark side of big data
Finally, there is a dark side of big data. As mentioned earlier, big data techniques allow one to predict and change people’s behaviours. While this is not necessarily a bad thing (because it could help with disease prevention), the same techniques could be used to change people’s behaviours to serve somebody else’s interests. For example, there have been various documented cases where big data techniques have been used to change people’s voting intentions.
How will the General Data Protection Regulation (GDPR) impact big data?
GDPR is an EU regulation that took effect on 25 May 2018. Its purpose is to give individuals control over their personal data when it is used by organisations. Failure to comply could result in organisations being fined up to 4% of annual turnover or €20 million, whichever is higher. As a result, organisations have had to implement governance frameworks to comply. (It is important to note that non-personal data is out of scope.)
While the long-term impact on big data is unclear, it is safe to say there are immediate challenges. Organisations are investigating approaches to ensure they obtain the benefits of big data but comply with GDPR. For example: (a) anonymising personal data; (b) only holding personal data for the minimum period required to process it; (c) only collecting the minimum data attributes required; (d) including privacy notices to clearly state what the data is being used for; and (e) ensuring data is collected by ‘opt-in’ only.
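Several of these approaches can be sketched in a few lines of code. The example below is a minimal sketch under stated assumptions: the field names, the one-year retention period and the secret key are all hypothetical, and the keyed hash shown is pseudonymisation rather than full anonymisation (pseudonymised data is still treated as personal data under GDPR). It is an illustration, not a compliance recipe.

```python
import hashlib
import hmac
from datetime import date, timedelta

# Assumed for illustration: the only attributes the analysis actually needs.
ALLOWED_FIELDS = {"customer_id", "postcode_area", "basket_value",
                  "collected_on", "opted_in"}
# Assumed retention period; the real figure depends on the processing purpose.
RETENTION = timedelta(days=365)


def pseudonymise(identifier, secret_key):
    """Replace an identifier with a keyed SHA-256 hash (pseudonymisation only)."""
    return hmac.new(secret_key, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def prepare_record(record, secret_key, today):
    """Return a minimised, pseudonymised copy of the record, or None to discard it."""
    if not record.get("opted_in"):
        return None  # opt-in only
    if today - record["collected_on"] > RETENTION:
        return None  # held longer than the assumed retention period
    # Keep only the minimum attributes required.
    kept = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}
    kept["customer_id"] = pseudonymise(kept["customer_id"], secret_key)
    return kept


# Hypothetical record and key, for illustration only.
raw = {
    "customer_id": "C-1001",
    "name": "A. Example",
    "email": "a.example@example.com",
    "postcode_area": "SW1",
    "basket_value": 42.50,
    "collected_on": date(2024, 1, 10),
    "opted_in": True,
}
key = b"replace-with-a-managed-secret"

print(prepare_record(raw, key, today=date(2024, 6, 1)))
```

A keyed hash is used instead of a plain hash so that identifiers cannot be recovered simply by hashing a list of known customer numbers. The key itself then has to be protected and managed separately.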
What is the future of big data?
Big data has a significant future and is likely to provide great benefit to society. However, it is important not to underestimate the implementation challenges, the regulatory risks and the dark side of big data.