Big data vision

Adam Davison MBCS CITP asks whether big data means big governance.

For the average undergraduate student in the 1980s, attempting to research a topic was a time-consuming and often frustrating experience.

Some original research and data collection might be possible, but to a great extent, research consisted of visits to a library to trawl through textbooks and periodicals.

Today the situation is very different. Huge volumes of data from which useful information can be derived are readily available - in both structured and unstructured formats - and that volume is growing exponentially. The researcher has many options. They can still generate their own data, but they can also obtain original data from other sources or draw on the analysis of others.

Most powerfully of all, they can combine these approaches, which offers great potential to examine correlations and differences. In addition to all this, researchers have powerful tools and technologies to analyse this data and present the results.

In the world of work the situation is similar, with huge potential for organisations to make truly informed management decisions. The day of ‘seat of the pants’ management is generally believed to be on the way out, with future success for most organisations driven by two factors: what data you have or can obtain and how you use it. However, in all this excitement, there is an aspect that is easy to overlook: governance.

What structures and processes should organisations put in place to ensure that they can realise all these possibilities? Equally importantly, how can the minefield of potential traps waiting to ensnare the unwary be avoided? Can organisations continue to address this area in the way they always have, or, in this new world of big data, is a whole new approach to governance needed?

What is clear is that big data presents numerous challenges to the organisation, which can only be addressed by robust governance. Most of these aren’t entirely new, but the increasing emphasis on data and data modelling as the main driver of organisational decisions and competitive advantage means that getting the governance right is likely to become far more important than has been the case in the past.

Questions, questions

To start with, there is the question of the overall organisational vision for big data: who has responsibility for setting it? What projects will be carried out, and with what priority? One also has to consider practicalities - how will the management of organisational data be optimised?

Next we come to the critical question of quality. Garbage in, garbage out is an old adage and IT departments have been running data cleansing initiatives since time immemorial. But in the world of big data, is this enough? What about the role of the wider organisation, the people who really get the benefit from having good quality data?

There is also the issue that a lot of the anticipated value of big data comes not just from using the data you own, but from combining your data with external data sets. But how do you guarantee the quality of these externally derived data sets and who takes responsibility for the consequences of decisions made based on poor quality, externally derived data?

Although garbage in more or less guarantees garbage out, the opposite is not necessarily true. There are two elements involved in turning a data asset into something useful to the organisation: good quality data and good quality models to analyse that data. As was clearly demonstrated in the banking crisis, however, predictive models rarely give perfect results.

How, therefore, can organisations ensure that the results of modelling are properly tested against historic data and then re-tested and analysed against real results, so that the models and the data sets required to feed them can be refined and improved?

Above all, how can organisations ensure that the results of analysis are treated with an appropriate degree of scepticism when used as a basis for decision-making?

Confirmation bias

Also, when considering how such models are used, the psychological phenomenon of confirmation bias needs to be considered: the human tendency to look for or favour the results that are expected or desired. Inevitably, analysis of data will sometimes give results that are counterintuitive or just not what was looked for, leading to the age-old temptation to dismiss the results or massage the figures. What policies and processes are needed to ensure that this doesn’t happen?

Another important governance issue is how to protect this valuable data. The information security threat is constantly evolving and, as big data becomes the critical driving force for many organisations, the risk of having their data asset compromised or corrupted becomes acute. Great clarity on who is responsible for managing this issue and how it is managed will be critical.

So, when starting to consider all these issues, the most fundamental question is: where should responsibility for these issues lie? Generally speaking, four options tend to present themselves:

The CIO as the person responsible for managing the data asset;
The person or people who get the benefit from the data asset;
A neutral third party;
A mixture of the above.

As things stand, in many organisations, the CIO is the default answer. After all, the ‘I’ in CIO stands for information, so surely this should be a core responsibility? This approach does have some justification.

CIOs are often the only people who have an overall understanding of what data, in total, the organisation owns and what it is used for. Also, the CIO tends to have practical responsibility for many of the issues listed above such as IT security (not quite the same as information security, however) and data cleansing (not quite the same as data quality).

However, the CIO typically has responsibility for managing the data. Is it therefore appropriate that he/she should also own the governance framework under which this data is managed? Furthermore, CIOs tend to have a wide range of responsibilities, so their ability to give sufficient focus to data/information governance could be limited. Finally, CIOs may not be ideally positioned when it comes to influencing behaviours across the organisation as a whole.

Responsibility with the user?

For many, having overall responsibility for data governance resting with the users, the people who gain benefit from the data, is an appealing concept. They are, after all, the people who have most to lose if good governance isn’t applied. Again, however, there are downsides to this.

Only in a relatively small organisation will it be practical for the user side to be represented by a single individual. More frequently, one runs the risk of ending up with a sort of governance by committee, with a range of stakeholders each with their own viewpoints. In this scenario, the chances of a consistent and appropriate governance model being created, and of such a model being successfully applied, are very limited.

Faced with these issues, some organisations have chosen to take a third way and create the post of chief data officer (CDO): someone who has overall responsibility for organisational data but who (usually) sits outside both IT and the end-user communities.

This approach is in many ways attractive. It means that overall governance responsibility rests with someone who is able to focus entirely on the issues related to data (not the case with either the CIO or the user community) and who can take an entirely neutral viewpoint when setting rules on how such data is managed and used.

However, issues again emerge. The CDO concept can be undermined by the question of whether the CDO has the organisational authority to make their decisions binding, particularly as CEOs, already under pressure from multiple directions for increased senior level representation, will naturally be reluctant to create yet another C-level role.

Finally, there is the hybrid approach: for example, sharing governance responsibility between the CIO and the users, or putting a CDO in place to report to the CIO or a senior user figure such as a COO. It is certainly true that all significant stakeholder groups will need to be involved at some level in ensuring good governance around data. However, this again brings in the issues around governance by committee and unclear overall responsibilities.

Any of the above models could work, but ultimately, which of them works best is likely to depend heavily on the nature of the organisation. In an organisation with a very strong cooperative culture, the hybrid approach might be the one to choose.

Last but not least, giving this important responsibility to an individual with the right experience and personality can be seen as being at least as important as their job title. Give the job to the right person and the chances are it will get done; give the job to the wrong person and the chances are it won’t. What remains true in all cases, however, is that this issue will become more and more important, and addressing it successfully is going to be vital for all organisations.