Information systems have always extracted data from the everyday world, converting it to an electronic format that can be stored. This creates a valuable asset that can be processed and analysed to produce useful outputs. There is a cost to building such systems, but sometimes an even larger cost lies in getting the data into the new system in the first place - usually by people typing at a keyboard.
There are also lots of electronic files (often spreadsheets) scattered across desktops and servers in large organisations such as the NHS, the value of which would be enormously enhanced if they were combined to provide a coherent and consistent picture of the state of the organisation at a point in time. For the NHS, this means ‘big data’ processing - the NHS deals with a million patients in some way or another every 36 hours.
The NHS - like many other large organisations - employs among its 1.35 million staff some whose job is to take this raw data and convert it into a uniform and coherent format. This is the starting point for the detailed analysis used to improve managerial and clinical decision-making.
While this work requires skill and expertise, it is no surprise that much of it is very repetitive and, as such, ripe for automation. NHS England, with its technical partner PA Consulting, took up this challenge, and the resulting project was the subject of the presentation to PROMSG in June.
A striking aspect of the project was the use of Scrum. The project was completed in five months through a series of two- to three-week sprints, with useful functionality delivered within the first four weeks.
The project required a team with a range of skills, including data analysts familiar with the underlying data structures and developers with expertise in the various technologies needed at the different levels of data processing. The need for intense communication between the two roles made this a textbook situation for an agile approach.
Individual ETL scripts (built with Talend and run on Amazon Web Services) had to be written to convert to a common format source data that arrived via channels and media as varied as web services, FTP and even email attachments. The standardised data was then stored in the cloud using Google BigQuery, and front-end access to the data was provided using Qlik’s QlikView.
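As an illustration only - not the project’s actual code - the sketch below shows the kind of normalisation step such an ETL script performs: pulling a raw CSV extract over one of the channels mentioned (FTP) and mapping source-specific column names onto a common schema. The field names, the FTP host and the target schema are all hypothetical assumptions.

```python
# A minimal sketch of one ETL normalisation step. All field names,
# hosts and schemas here are illustrative assumptions.
import csv
import io
from ftplib import FTP

COMMON_FIELDS = ["trust_code", "period", "indicator", "value"]

def fetch_via_ftp(host: str, path: str) -> str:
    """Download a raw CSV extract over FTP (one of several channels)."""
    buffer = io.BytesIO()
    with FTP(host) as ftp:
        ftp.login()  # anonymous login; a real feed would authenticate
        ftp.retrbinary(f"RETR {path}", buffer.write)
    return buffer.getvalue().decode("utf-8")

def normalise(raw_csv: str, field_map: dict) -> list:
    """Map source-specific column names onto the common schema."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({common: record[src] for common, src in field_map.items()})
    return rows

# Example: one source calls the trust code "OrgCode" and the month "Month".
field_map = {"trust_code": "OrgCode", "period": "Month",
             "indicator": "Measure", "value": "Value"}
raw = "OrgCode,Month,Measure,Value\nRX1,2014-04,RTT_18WK,0.92\n"
print(normalise(raw, field_map))
```

In practice one such script would exist per source, each with its own fetch channel and field map, all converging on the one common schema before loading.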
Sense could be made of a potentially chaotic situation by focusing on the standard NHS performance indicators. These are well-defined and essentially imposed from the centre. For each indicator, the current clerical processes could be identified as the sources of the information needed to calculate its value; these processes were then the basis for the automation effort. The indicators became the basic units of work for the project, and planning was to a large extent a matter of prioritising the indicators to be automated.
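To make the ‘indicator as unit of work’ idea concrete, here is a minimal, hypothetical sketch of what automating one indicator might look like once the data is in the common format: the proportion of completed pathways with a wait of 18 weeks or less. The record fields and the exact definition are assumptions for illustration, not the official NHS indicator specification.

```python
# A hedged illustration of computing one indicator value from rows
# already in the common format. Field names and the definition of
# "within 18 weeks" are assumptions, not NHS definitions.
def rtt_within_18_weeks(records: list) -> float:
    """Share of completed pathways with a wait of 18 weeks or less."""
    completed = [r for r in records if r["status"] == "completed"]
    if not completed:
        return 0.0
    within = sum(1 for r in completed if int(r["wait_weeks"]) <= 18)
    return within / len(completed)

sample = [
    {"status": "completed", "wait_weeks": "12"},
    {"status": "completed", "wait_weeks": "26"},
    {"status": "open", "wait_weeks": "4"},
]
print(rtt_within_18_weeks(sample))  # 0.5
```

Each automated indicator then amounts to a small, testable pipeline of this shape - fetch, normalise, compute - which is what made the indicators natural units for sprint planning.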
The project was undeniably a success. An interesting question is whether the approach could have been profitably used on the Universal Credit system, which also involves merging data from different sources. One difference between the two examples is that while the NHS systems were complex, the underlying clerical processes and data were well established. A project where this is not the case might be dealing with another layer of difficulty.