33rd John Perry Prize

Professor Julia Hippisley-Cox: the Open Pseudonymiser tool

Professor Julia Hippisley-Cox has won the John Perry prize for her work developing the Open Pseudonymiser tool.

The Nottingham University Professor of clinical epidemiology and general practice created the tool to enable pseudonymisation at source and secure linkage of different datasets without re-identification. It is free and open source and used by GP system suppliers EMIS and TPP as well as the Health and Social Care Information Centre and the Office for National Statistics.

The prize was set up by John Perry with profits from his OXMIS GP coding system to recognise outstanding contributions to primary care computing. Entries are assessed by a panel of judges from the BCS Primary Health Care Specialist Group, which presented the award at its annual general meeting last Thursday night.

Professor Hippisley-Cox told EHI Primary Care that she hoped the prize would raise the profile of Open Pseudonymiser and its potential uses.

“As an academic I’ve written lots of papers, but sometimes you do things and think ‘gosh, this is one of the most important things I have done’ and this is it,” she said of the development of the tool.

“I feel hugely privileged to have got the award.”

Professor Hippisley-Cox is a cofounder of QResearch, a not-for-profit partnership between Nottingham University and EMIS, which has 750 GP practices regularly contributing information to its database. The Open Pseudonymiser tool was developed because she wanted to link the GP data in QResearch with data held in other systems, such as Hospital Episode Statistics. Funding from QResearch’s fund for public benefit projects was used to develop it and the tool was released in September 2011.

Open Pseudonymiser works by taking the NHS number and using a password to replace it with an identifier for each patient that is unique, but has no real-world meaning so cannot be reverse engineered. The same password key can be applied to a range of data sources, allowing them to be linked without the need for the flow of confidential patient information. The tool rounds the date of birth to the year of birth and strips out any other identifying data such as postcode.
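The approach described above can be illustrated with a short Python sketch. This is not the Open Pseudonymiser code itself: the use of HMAC-SHA256 as the keyed transformation, the field names (`nhs_number`, `date_of_birth`, `postcode`, `name`) and the helper names are all assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical field names; a real extract would define its own schema.
IDENTIFYING_FIELDS = {"nhs_number", "date_of_birth", "postcode", "name"}


def pseudonymise(nhs_number: str, secret_key: bytes) -> str:
    """Replace an NHS number with a keyed digest (assumed here: HMAC-SHA256)."""
    normalised = nhs_number.replace(" ", "").replace("-", "")
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record holding only the pseudonym, the year of
    birth and the non-identifying fields."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    cleaned["pseudo_id"] = pseudonymise(record["nhs_number"], secret_key)
    # Assumes dates are ISO strings (YYYY-MM-DD); only the year is kept.
    cleaned["year_of_birth"] = int(record["date_of_birth"][:4])
    return cleaned


def link(left: list, right: list) -> list:
    """Join two already-pseudonymised datasets on the shared pseudonym."""
    by_id = {row["pseudo_id"]: row for row in left}
    return [{**by_id[row["pseudo_id"]], **row} for row in right if row["pseudo_id"] in by_id]


# Each data holder applies the same key at source, before anything leaves its system.
key = b"shared-secret-agreed-out-of-band"
gp = [strip_record({"nhs_number": "943 476 5919", "date_of_birth": "1970-05-17",
                    "postcode": "NG7 2RD", "diagnosis": "asthma"}, key)]
hospital = [strip_record({"nhs_number": "9434765919", "date_of_birth": "1970-05-17",
                          "postcode": "NG7 2RD", "admissions": 2}, key)]
linked = link(gp, hospital)
```

In this sketch the two records are matched purely on the derived identifier, so neither the NHS number, the full date of birth nor the postcode has to be shared between the data holders.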

Professor Hippisley-Cox said QResearch has already used the tool to link 15 years of primary care data with hospital statistics, the cancer registry and mortality statistics. She has also produced a paper looking at the use of a valid NHS Number in these datasets and found all were more than 98% complete. “The question then becomes, if you have a unique identifier that’s so complete across these major data sources, what’s the justification for using anything apart from the NHS number to link datasets together?” she asked.
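As a rough illustration of what measuring completeness of a valid NHS Number could involve, the sketch below applies the standard Modulus 11 check digit test to each value and reports the proportion that pass. The article does not say how validity was defined in the paper, so treating it as the check digit test is an assumption, and the function names are hypothetical.

```python
def is_valid_nhs_number(value: str) -> bool:
    """Modulus 11 check: 10 digits, with the last acting as a check digit."""
    digits = (value or "").replace(" ", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    # Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    if check == 10:  # no valid check digit exists for this prefix
        return False
    return check == int(digits[9])


def completeness(values) -> float:
    """Fraction of records carrying a valid NHS number (0.0 for an empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(is_valid_nhs_number(v) for v in values) / len(values)
```

For example, `completeness(["9434765919", "9434765918", "", "943 476 5919"])` counts the first and last values as valid and the other two as invalid.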

She believes a potential use is with the government’s new care.data programme, which involves extracting a large monthly data set from GP practices and linking it with HES and datasets from other areas such as community and social care. Current plans are for the GP data to be extracted in identifiable form and pseudonymised within the ‘safe haven’ of the Health and Social Care Information Centre. However, this is concerning many GPs, who have responsibilities as data controllers to protect the patient information they hold.

Professor Hippisley-Cox argues that pseudonymising at source would solve these problems, as the Data Protection Act does not apply if the data being extracted is not identifiable.

This approach would also allow historical GP data to be extracted, creating a much more complete and useful dataset than the current extraction which will only go back to April 2013, she said.
