Mike Hall MBCS CITP explains how Nataša Pržulj’s BCS Needham Lecture on data mining has led to a potentially ground-breaking research project that could transform prostate cancer diagnosis and treatment. He wants to get more people involved, from donating to spreading the word.

BCS events are not usually life-changing, but the Roger Needham lecture at the Royal Society on 19 November last year proved to be just that.

I was quietly sipping coffee before the lecture when Geoff McMullen - Past President of BCS and a Data Mining Project initiator - walked in. As my prostate cancer had recently been diagnosed, ‘How are you?’ led to more than the usual platitudes. To my surprise, Geoff revealed that he had been suffering for a little longer and had taken the time to find out a great deal about the disease. We agreed that something ought to be done to improve diagnosis and treatment.

Prostate cancer is common. Of the one in eight men who get the disease, one in three currently die from it, and the increase in life expectancy means more men will suffer. The lecture that followed, given by Nataša Pržulj of Imperial College on mining biological networks, blew us away. If you haven’t watched it, it’s well worth finding the time (see link in footnote). For me, it is one of the very best lectures I’ve ever heard.

The principles of data mining are easy enough to understand, but getting to grips with the detail may require more than one viewing! There’s a huge amount of depth in the lecture and I’ve felt the need to watch it several times.

Although I spent most of my career working for Shell in IT, I started in medical engineering (see Hall MJ 1969, ‘The Control of Powered Artificial Limbs’, PhD thesis, University College London) and the research bug has remained with me. For nearly 50 years I’d been waiting for a project where computers could make a significant contribution to medicine.

A little reflection after the lecture, and a few emails with Geoff and another long-standing friend, Tony Axon - Past President of United European Gastroenterology and also a Data Mining Project initiator - led us to agree to find out whether Nataša’s techniques could be applied to prostate cancer and whether she would be interested in taking on the project.

With this background, early in 2015 Geoff, Tony and I met with Nataša at Imperial College to discuss a possible prostate cancer project. We were happy with what we found and Nataša was enthusiastic about doing the research. Nataša developed a proper research proposal over the next couple of months and in parallel we consulted various senior computer and medical research professionals to ask if they thought the project was worthwhile.

Those ‘due diligence’ investigations provided strong support both for the potential value of the research and the competence of Nataša’s Computational Network Biology Unit. Based on the research proposal, we agreed to see if we could raise the £450,000 needed to support two three-year post-doctoral research posts and we started to develop a project website.

To be credible, it seemed essential to be part of a charity. Feelers put out to some of the main cancer charities were not particularly encouraging, but by chance we were put in contact with a smaller one. The Prostate Project charity was established in 1998, initially supporting research and facilities at the Royal Surrey County Hospital and at Surrey University with the strapline ‘giving men a better chance of beating prostate cancer’.

It turned out that their horizons were expanding slightly and, after a period of discussion, we were delighted when they agreed to incorporate the data mining project. One exceptionally positive consideration is that The Prostate Project’s administrative overheads are less than four per cent, substantially lower than those of the vast majority of other cancer charities.

As project initiators, Geoff, Tony and I are confident this ground-breaking data mining research project has the potential to transform prostate cancer diagnosis and treatment. Benefits are likely to include improved disease classification and much better targeting of drugs.

We are also determined that the research results will move quickly into clinical practice. Since prostate cancer generally develops slowly, research results are likely to help many of those who already have the disease.

What is more, the research is generic, so the techniques developed can be applied to other cancers and diseases. We’d like to start prostate cancer data mining very soon, but first we need to raise at least £180,000 of the £450,000 target.

Please consider making a donation so Nataša’s BCS Roger Needham lecture results in a tangible contribution to society. And please tell your friends about the project, particularly those with prostate cancer and anyone who is interested in how computers can make a real difference to public health.

For more information and to donate see: https://theprostatecancerproject.net

And finally, something that is easy to forget but absolutely crucial: if you are a man, and especially if you are over 60, insist on an annual PSA test, which is your right. Although not perfect, the test will significantly improve the chances of prostate cancer being detected early, giving you a much better chance of successful treatment. I wish someone had told me that!

See ‘Mining Biological Networks’, BCS Roger Needham lecture at The Royal Society.

Mining Biological Networks

A huge amount of medical research and clinical data, including prostate cancer data, has been recorded in a wide variety of independent databases (genetic data, disease progression, protein-protein interaction, drug trial...).

The data in each database can be considered as a network of relationships, and any pair of networks can be linked if there are data points common to both. The data is incomplete and often noisy, with most useful correlations impossible to see by simple inspection.
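The idea of linking two networks through their common data points can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the protein and drug names are invented, and the functions `nodes`, `common_points` and `link` are hypothetical helpers, not part of the project's actual software.

```python
# Hypothetical toy data: every name and relationship below is invented.
protein_interactions = {("P1", "P2"), ("P2", "P3"), ("P3", "P4")}
drug_targets = {("DrugA", "P2"), ("DrugB", "P4"), ("DrugC", "P9")}


def nodes(edges):
    """Return the set of all data points (nodes) that appear in an edge set."""
    return {n for edge in edges for n in edge}


def common_points(net_a, net_b):
    """Return the data points present in both networks."""
    return nodes(net_a) & nodes(net_b)


def link(net_a, net_b):
    """Merge two networks, but only if they share at least one data point."""
    shared = common_points(net_a, net_b)
    if not shared:
        raise ValueError("networks share no data points and cannot be linked")
    return net_a | net_b, shared


combined, shared = link(protein_interactions, drug_targets)
print(sorted(shared))   # the proteins that appear in both networks
print(len(combined))    # number of relationships in the linked network
```

In this toy case the two networks meet at the proteins that are both drug targets and interaction partners, so a question such as 'which drugs act near protein P3?' can only be asked once the networks are joined.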

The challenge is how to mine these molecular networks to answer fundamental questions, including gaining new insight into ageing and disease, and improving therapeutics. Just as computational approaches for analysing genetic sequence data have revolutionised biological understanding, the expectation is that analyses of biological networks will have similar ground-breaking impacts.

However, dealing with network data is non-trivial, since many methods for analysing large networks fall into the category of computationally intractable problems. This problem is now being addressed, with data mining extracting new biological knowledge from the wiring patterns of large molecular network data, linking network wiring with biological function and translating the information hidden in the wiring patterns into everyday language.
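To give a feel for what a 'wiring pattern' might mean in practice, here is a deliberately crude Python sketch. It is a stand-in under stated assumptions, not the method used in the research: each node is described only by how many neighbours it has and how many triangles pass through it, the function names `adjacency` and `wiring_signature` are hypothetical, and the toy network is invented.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Build an undirected neighbour map from a list of (a, b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def wiring_signature(edges):
    """Map each node to (degree, number of triangles through that node)."""
    adj = adjacency(edges)
    signature = {}
    for node, neighbours in adj.items():
        # A triangle exists when two neighbours of the node are also linked.
        triangles = sum(
            1 for u, v in combinations(sorted(neighbours), 2) if v in adj[u]
        )
        signature[node] = (len(neighbours), triangles)
    return signature


# Hypothetical toy network: two triangles joined by the link C-D.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
signature = wiring_signature(toy_edges)

# Group nodes that are wired into the network in the same way.
groups = defaultdict(list)
for node, sig in sorted(signature.items()):
    groups[sig].append(node)
print(dict(groups))
```

Nodes C and D end up in the same group because they are wired identically, even though they are not in the same triangle. The hope behind this kind of analysis is that similar wiring hints at similar biological function, although real methods use far richer descriptions of the wiring than this.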

This revolutionary computer technology is a golden key enabling us to access vital information lying hidden within large and disparate international databases.

A substantial number of high quality, internet accessible data sources are publicly available (eg protein-protein interactions, gene co-expression networks, drug-target interactions, drug-drug interactions).

Clinical data also offers a variety of related patient-specific data. Single data types have been analysed over many years to give practical results, but concurrent analysis - data mining - of all networked biological data can accurately uncover many more statistically significant correlations, including multiple diseases common to specific genome traits.

A final stage of verification of the computational results is to mine the literature for evidence that the associations exposed have been noticed in clinical practice, or to obtain biological validation. Initial results are really exciting and this line of research is only at its beginning.

This description of data mining is adapted from the ‘mining biological networks’ page on the project website and much of that page was written by Nataša Pržulj.