Open source is good for AI, but is AI good for open source?

Terence Eden CITP MBCS, MIET, Chair of the BCS Open Source Specialist Group, takes Martin Cooper MBCS on a tour of the open source world and explains why writing software is both an art and a science.

Summing up his take on open source’s spirit, Terence Eden says: ‘So, when I publish a piece of open source code, I say at the top of it: “this code is copyright Terence Eden” and it is licensed to you under, maybe, the MIT License. It says, effectively, that you’ve got to tell people that I wrote this code. This is the trade-off we make: you can have this code but, you’ve got to tell people that I wrote it and, when you share it, you’ve got to share it under the same licence so that [the code] stays open… We’re using copyright against itself to make things more collaborative and more friendly.’

Terence is chair of the BCS Open Source Specialist Group and sits on OpenUK’s board. Alongside a long career in software, it’s fair to say he’s passionate about the open source movement.

The future is bright, the future is open

Done right, he explains, open source is a great way of helping people launch IT careers. Elsewhere, it is enabling IT itself to contribute to solving many of the world’s grand scale challenges. Open source is also deeply incorporated into the weft and weave of defining technologies like AI, cloud and the internet itself. It helps encourage efficiency, discourages duplication of effort and, it can be argued, helps forward discovery across the sciences.

But, open source can be, and is, done badly. Code is taken with no acknowledgement of authorship given. Projects based on open source code don’t share their work and instead keep it closed.

So, why open source? ‘Well,’ Terence says, ‘I’ve been fiddling around with computers since I got a BBC Micro when I was tiny. I’ve worked for huge multinational corporations, tiny little start-ups and I now run my own business. I tend to spend a lot of my time speaking about open source, open technology and open data.’

Meeting Linux for the first time

‘Thinking back,’ he says, ‘at university, I don’t think I ever heard the words open source. We were told not to share our assignments… Don’t copy code you found on the internet. Of course, we did! It’s something I wish we’d been taught at university. But, it was probably only when I was reading PC Format and they said: there’s this thing called Linux.’

Installing the operating system broke his parents’ computer, but it ignited a spark. ‘Ever since then, I’ve had the idea that people should be in control of their tools. I resent, when I try and do something on a Windows PC or on a Mac, that I can’t do it. Somebody, somewhere has told me “no”. What got me interested in open source was the idea that I could change what the computer was doing and make it work better for me.’

Among the many projects he’s worked on, the most high profile is probably the NHS COVID-19 app. The app was part of the UK’s response to the COVID pandemic and would alert users if it detected that they’d been in close contact with another user who’d tested positive for the virus.

‘I was the head of open technology,’ he explains. ‘So looking at open source for the whole of the NHS. When the pandemic hit, one of the questions was “do we need to make it open?” And, part of my role was to make sure that app became open source. The day the app went into the general public’s hands, the source code was published on GitHub. It was a huge team effort.’

But, why make the app open source in the first place? The first answer Terence offers is: public money, public code. The public’s tax pounds pay for services that build products, so why shouldn’t the public see the code?

‘There’s also an openness and transparency angle,’ he continues. ‘If you remember lockdown and all the controls. If the government suddenly said “we’re going to make you install an app”, people would be upset, frustrated, nervous and understandably cautious.

‘Government can say “don’t worry, it’ll be fine”. But, the only way you can really address the scepticism is to say: “here’s the source code, look through it. There’s no GPS, we’re not selling your data.”’

The limits of openness

The benefits to the public of such a move, Terence admits, are limited, as reviewing code is a specialist job. But the app’s open source approach allowed the IT community and privacy experts to make their assessments. It also allowed professionals to contribute to the project in a very important way: spotting bugs. Bugs, he says, inevitably creep into code and by making projects open source you’re inviting a huge number of eyeballs to look at your work, spot errors and report them.

‘It was never a case of “should we do it?”’ he says. ‘It was always the default assumption that, where reasonable, everything should be published as open source. By reasonable, I mean – we don’t publish our API keys and firewall configurations on GitHub.’

Open source’s natural transparency does, however, have its limits when it comes to AI. He says: ‘Even if [we] open sourced every AI algorithm, you still need a few million quid’s worth of computers to train the model and [expensive] storage to run everything. So, just having the source isn’t always enough. But it’s getting there...’

Having access to open source code is, then, helpful to the public and it’s also clearly immensely helpful to software developers. Latterly, however, another species of beneficiary has become apparent: AIs.

Chief among these is, of course, ChatGPT. OpenAI’s chatbot has taken the world by storm thanks to its seemingly effortless and plausible writing. These abilities are learned from, and founded on, a huge corpus of expert human writing that the AI has ingested. ChatGPT can only write a sonnet because it’s been fed Shakespeare. It can only produce academic papers because it’s taken in shelves of theses. And it can only ‘write a binary clock program in Python’ because it has absorbed billions of lines of openly available Python code.
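
For readers wondering what such a request involves, here is a rough sketch of a binary clock in Python. It is not ChatGPT's output, nor code from Terence; the function names to_bits and binary_clock and the three-row layout are assumptions made purely for illustration.

```python
from datetime import datetime, time


def to_bits(value: int, width: int) -> str:
    """Return value as a zero-padded binary string of the given width."""
    return format(value, f"0{width}b")


def binary_clock(t: time) -> str:
    """Render hours, minutes and seconds as three rows of binary digits."""
    rows = [
        ("H", t.hour, 5),    # hours 0-23 fit in 5 bits
        ("M", t.minute, 6),  # minutes 0-59 fit in 6 bits
        ("S", t.second, 6),  # seconds 0-59 fit in 6 bits
    ]
    return "\n".join(
        f"{label} {to_bits(value, width)}" for label, value, width in rows
    )


if __name__ == "__main__":
    # Print the current local time as a binary clock.
    print(binary_clock(datetime.now().time()))
```

Each row shows one unit of time in binary, so 13:37:05 would appear as 01101, 100101 and 000101. Getting this far is the easy part; whether the result is trustworthy and maintainable is a separate question.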

What it produces might sound convincing to the poetry reader’s ear, its essays may pass unmarked by a professor’s red pen and its coded efforts may even work. But, in a world increasingly dependent on software, just working isn’t enough. Software needs to do, and have, so much more. It needs to be trustworthy, predictable, explainable and reliable.

AI meets open source

‘I absolutely foresee a time where you would be able to point an AI at an open source repository and say “make me this but, make something subtly or even majorly different”’, says Terence. ‘But, there are two important issues in this. Firstly, the copyright issue. If you’ve got a piece of [AI] software, which is trained on a billion lines of other people’s code, it’s going to spit out somebody else’s code and it might do that without attribution. And, one of the things we’re very keen on in open source is correct attribution.’

This lack of attribution, he explains, exposes organisations to layers of risk. Firstly, it’s unhealthy and unsafe to use code whose provenance is murky – you won’t know how reliable the code is. It’s also dangerous because someone might sue for infringement of copyright.

‘Your excuse that you got an AI to write it?’ Terence asks. ‘I don’t know if that’d stand up in court…’

Secondly, Terence says: ‘Like many tools, [AIs] will be useful to programmers and they will help get more people into the art and science of programming.’

But, he worries this might come at a price: ‘Think about Photoshop, or any of the online painting programs that you can get… Just because you have this tool, it doesn’t mean you’re the next Leonardo da Vinci or Tracey Emin. You still need some talent. You still need to understand what you’re doing and why you’re doing it. Having great tools makes it easier, but it doesn’t get you to the point of being able to do brilliant things.’

Maybe that’s a better definition of the open source spirit: doing brilliant things. Just don’t forget the attribution.