Gerry Reilly FBCS, Technologist in Residence and Susheel Varma FBCS, Director of Engineering at HDR UK, share the build process for the new HDR UK Innovation Gateway and explain why open source matters.

Preparing for the launch of the National Health Service, the then Minister of Health Aneurin Bevan wrote in The Lancet on 3 July 1948: ‘...Quite the most ambitious adventure in the care of national health that any country has seen.’

Over 70 years on, we are now on the cusp of transforming the lives of patients and the public through the ethical and safe use of health data for research. In this article, we will discuss how Health Data Research UK (HDR UK) has been at the forefront of driving that transformation, particularly how the Health Data Research Innovation Gateway is providing unprecedented insight into the UK’s rich landscape of health data assets, which can support innovative research in academia, industry and the NHS.

Health Data Research UK

HDR UK was established in 2018 to be the national institute for data science in health. Its mission is to make transformative improvements to the health of patients and the public through data science research and innovation.

By undertaking research at scale, across a population of up to 66 million people, we can engage the health data research community to deliver on an unrivalled opportunity to use data with the highest ethical standards, to drive pioneering breakthroughs in medical research. This unleashes the potential to improve how we can prevent, detect, diagnose and treat diseases such as cancer, heart disease and asthma.

HDR UK brings together our unique nationwide health data assets with specialists across academia, industry and healthcare, to unlock knowledge and deliver new insights from molecule to public health. We aim to achieve this by delivering a three-component health data research infrastructure: the UK Health Data Research Alliance, the Health Data Research Hubs and the Health Data Research Innovation Gateway.

The digital innovation hubs programme

In the 2017 Life Sciences Sector Deal, the government committed to delivering on the Life Sciences Industrial Strategy’s recommendation to create Digital Innovation Hubs (now known as Health Data Research Hubs) to support the use of data for research purposes, operating with trusted and secure governance. HDR UK was appointed to deliver the programme.

Designing the programme

Rather than immediately embarking on the selection of the Hubs, we decided to go on the road with a design and dialogue phase to hear the views of researchers, data custodians and the public, and to identify what it would take to deliver on the vision of the Life Sciences Industrial Strategy.

Between September 2018 and April 2019, we ran events throughout the four nations of the UK including London, Cardiff, Glasgow, Southampton, Exeter, Manchester, Newcastle, Nottingham and Belfast, engaging with potential users and refining the concepts for the programme.

To test out some of the concepts, we also initiated an open selection for Sprint Exemplar projects, eventually selecting ten projects which were successfully delivered.

This design and dialogue phase led us to the three-component delivery model:

  • UK Health Data Research Alliance, convening the leading data custodians to share and develop best practice across the UK.
  • Health Data Research Hubs, to deliver access to curated datasets with deep expertise.
  • Health Data Research Innovation Gateway, to provide a common portal for discovery, access request management, collaboration and analysis of datasets.

We will now focus on the development and delivery of the Gateway.

Development principles

From the start, we decided that this was not an academic research project and must be delivered as a production service platform.

We decided that development should take place in the open, taking requirements from a wide range of users and testing these openly with early drivers (early builds of the service). As this work is publicly funded, we also ensured that all development outputs, including designs, prototypes and code, were shared openly and made available for reuse.

Our approach is encapsulated in our development principles:

  • Patient, NHS and public-centred. Infrastructure development is focused on delivering benefits to patients, the NHS and the wider public, building confidence and trust in how data is accessed and used in research and innovation. All development will actively engage patients, the NHS and the public throughout.
  • User-design led. The experience for our user community is paramount. All development will follow best design practices, actively engage the user communities and take place in the open to enable continuous user feedback. The design will be delivered as ‘design as code’ to demonstrate working capability.
  • Agile development. Adopt agile development processes and tools to deliver exemplars quickly and build on that experience. At each stage, test our approach based on how well it supports our research and innovation user community.
  • Open first. Support open standards throughout our delivery. Use open source where possible and share HDR UK developed technology assets openly under the MIT licence, using the HDR UK GitHub repository.
  • Modular and extensible. Requirements and tools will evolve. All capability will be integrated through open and documented APIs.
  • Cloud-first. Deploy to public cloud by default, falling back to on-premises private cloud only where compute or storage requirements make this significantly more cost-effective.
  • Reuse. Work with open source and our academic and commercial partners to enable reuse whenever possible. New development should be the exception, not the default.
  • Build in the ‘ities’ from day one. Architect for interoperability, scalability, reliability, availability and security throughout the development. This is a production environment and not a research project.

From concept to minimum viable product

Following the comprehensive input from the design and dialogue phase, we embarked on agile testing of the core concept of the Gateway by building a minimum viable product (MVP). An analogous platform, Health Data Finder, had previously been built by PA Consulting with funding from the National Institute for Health Research (NIHR). The experience from that work also informed the objectives and design of the MVP.

In keeping with HDR UK’s open philosophy, we sought development partners through an open procurement process, with HDR UK, external stakeholders from the NHS and the public all involved in the final selection. The selected partners were IBM, to develop the front-end portal; the University of Oxford, to provide a metadata catalogue; and MetadataWorks, to provide expertise and support for onboarding of metadata. The Gateway holds metadata about datasets, with the datasets themselves remaining with the data custodians. Work commenced at the start of October 2019.

HDR UK and MetadataWorks led activity to define a draft specification for the metadata. The IBM and Oxford teams started rapidly driving the MVP as a public cloud offering, with all new code development shared on GitHub. The MVP was publicly available with support from the members of the UK Health Data Research Alliance. By the end of the MVP project in February 2020, we had curated summary metadata for over 400 datasets.

Building out an MVP proved that the fundamental concept of a portal to support discovery and access request management was valid. With support from the UK’s data custodians, it would be possible to onboard sufficient content to provide a useful platform. However, this was only an MVP; it showed that so much more was both possible and required if we were to start really changing the world of health data research. Anticipating this, we commenced the second phase of procurement to identify a technology partner to work with us to co-develop the Gateway into a production platform.

MVP to reality

We adopted an open procurement approach (which attracted a lot of attention) and then whittled the responses down to a shortlist of three teams, led by IBM, EPAM and PA Consulting. These teams were then set a tight eight-week rapid development task (four two-week sprints) to build a key component of the Gateway. As far as possible, these were treated as real agile projects with the expected ceremonies, user workshops and even changing requirements!

The final selection was made by a panel that included not just HDR UK, but also external representatives and patient and public participants. The selection was based both on what was created and on the collaborative approach of each team. In the end, PA Consulting was selected to co-develop the Gateway with HDR UK over the next two years, across four six-month milestones running to April 2022.

In late April 2020, work started on the first six-month milestone. During this milestone, the team worked to two-week sprints, with a public release of the Gateway every other sprint. This allowed for open development and regular stakeholder feedback.

The experiences from the MVP and the rapid development task caused us to refocus our ambitions significantly. We realised that previous related work, such as the NIHR-sponsored Health Data Finder, had focused on datasets alone and so missed the insight that comes from cataloguing related health research tools, projects, people and, more recently, educational courses together with these datasets. So, the focus shifted to providing a discovery experience that would work for a broad set of health data research entities and not just datasets.

The approach

The approach was design-led, with regular workshops, twice-weekly design sessions and user testing, ensuring that the user experience was central to the Gateway from the start. For the first milestone, the team focused on three capability areas:

  1. Search. Ensuring that datasets, tools, etc., were discoverable and that their inter-relationships were clear. This led to the development of collections, a new concept for grouping entities of any type.
  2. Improved data access request management. This was identified as the most significant pain point for the stakeholders; therefore, with the members of the UK Health Data Research Alliance, we worked to provide a more harmonised and efficient approach to data access request management. Access control remains with the data custodians, to ensure that access is only provided for legitimate research purposes. We are now well into process improvement and more will come in the next six months.
  3. Collaboration. We wanted the Gateway to become the place for the UK health data community to go for collaboration. We built in the ability to collaborate through an embedded forum to allow users to add comments and reviews around datasets and to self-contribute information around new tools, projects, etc.
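
The idea of collections that group entities of any type can be pictured with a small sketch. This is an illustration only, written in Python under stated assumptions: the class names, fields and the simple title-matching search are hypothetical and are not taken from the Gateway's published code or data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntityType(Enum):
    """Kinds of catalogued entity named in this article (hypothetical enum)."""
    DATASET = "dataset"
    TOOL = "tool"
    PROJECT = "project"
    PERSON = "person"
    COURSE = "course"


@dataclass(frozen=True)
class Entity:
    """A catalogue entry; only metadata is held, never the data itself."""
    entity_id: str
    entity_type: EntityType
    title: str


@dataclass
class Collection:
    """A named group that may mix entities of any type."""
    name: str
    members: list[Entity] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        # Ignore duplicates so an entity appears at most once per collection.
        if entity not in self.members:
            self.members.append(entity)

    def by_type(self, entity_type: EntityType) -> list[Entity]:
        # Return the members of one kind, preserving insertion order.
        return [e for e in self.members if e.entity_type is entity_type]


def search(entities: list[Entity], term: str) -> list[Entity]:
    """Case-insensitive substring match on titles (a stand-in for real search)."""
    needle = term.lower()
    return [e for e in entities if needle in e.title.lower()]
```

A collection built this way can hold, say, an example dataset alongside an example tool, and a single search then returns both, which is the kind of cross-entity discovery described above.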

By the end of 2020, we had produced a high-performance Gateway, one that is already supporting active research that makes a difference to health outcomes in the UK.

COVID-19 impact and HDR UK’s response

Just as we started to kick off the final selection activity and really get going with the main development, the COVID-19 pandemic was upon us.

As the national institute for health data research, we had to pivot our priorities and massively accelerate our efforts. HDR UK became central to the UK’s response to COVID-19, establishing a trustworthy, national approach to using health data, drawing on the full capabilities of the UK health data research community. This helped with understanding the virus, clinical trials for treatments (including dexamethasone), symptom trackers, risk calculators and impacts on vulnerable groups, including cancer patients.

The Gateway development pivoted to support this work, with the rapid onboarding of relevant datasets, tools and, importantly, COVID-19 research projects. This allowed public visibility of the research projects that were being proposed and prioritised. The Gateway also became a key part of the (initially weekly, now fortnightly) COVID-19 reports submitted by HDR UK to SAGE (the Scientific Advisory Group for Emergencies).

As with many other organisations, the pandemic also forced a transformation in our ways of working. We reshaped our design approach and agile delivery with PA Consulting to work remotely. This worked far better than any of us could have predicted.

Were there occasions when it would have been better to have been in a room? Yes, for certain, but with planning, consistent cadence and great collaboration, the team delivered, despite being remote and needing to accelerate to meet the demands of the first pandemic of the digital age.

Lessons learnt so far

We have learnt many things over the last two years; three things stand out:

  1. Consult widely. The design and dialogue phase, sprint exemplar projects, MVP, active involvement of our stakeholders and, importantly, our public advisory board, radically changed the shape of what we have delivered.
  2. Code talks. By delivering rapidly and making early drivers publicly available, we have been able to get early feedback and engagement across the health data research community.
  3. Agile. If 2020 showed us anything, it was that we need to be ready for the unexpected. Starting with agile and open at the heart of our approach allowed us not only to respond to the challenges of COVID-19, but to pivot our work to support the research efforts that were moved to focus on the pandemic. This has helped us further refine the direction for future development.

Most importantly, we have shown that being open matters, not just in outputs but also in approach:

  • Open communication from our design and dialogue phase changed the shape of the programme and this has continued to evolve.
  • Our partners were selected through open procurement, with both external and patient and public participation in selection.
  • Prioritisation of public, patient and practitioners’ involvement and engagement throughout the process from the initial design and dialogue phase to delivery. We believe this is essential to building and retaining public trust in the use of data for research.
  • Openness on use of health data. The metadata for the datasets, papers, projects, etc., on the Gateway are public.
  • Open development. The Gateway has been developed openly, with regular publicly accessible drivers, open sourcing of all newly developed components and monthly public webinars to discuss what we are doing and solicit feedback.

So, what’s next?

As of April 2021, the Gateway catalogues over 640 datasets (75 of which focus on COVID-19 and are therefore supporting the UK’s researchers in responding to the pandemic) with over 1,000 registered users who have made over 250 dataset access requests.

This is just the start. The development has just completed its second planned milestone. In the next stage, as well as continuing to enhance the user experience, we will be extending the capability to support federated access to Trusted Research Environments and metadata catalogues, cohort discovery and much more.

Come and explore the Gateway at www.healthdatagateway.org, share your ideas and help make the UK the very best place to do ethical research on health data.

A timeline of the Health Data Research Innovation Gateway project.
