What do you envisage the new Institute achieving?
The key feature is to bring into one place, under one banner, the study of the web as a complete system. As we change the web and as we evolve new standards and new technologies, these have, as we've seen throughout the web's history, unintended consequences.
The Institute has to be in a position to develop a way to anticipate what might happen, but also to understand what has happened and not be completely taken by surprise by these things. Why was it, for example, that the blogosphere emerged and blogs became absolutely the dominant idea? What was it about the emergence of Twitter that ensured its success, or Facebook, or any of these large social media sites?
The web has these extraordinary characteristics of very simple processes linked to very complicated large-scale structures. The Institute will also focus on the next generation of technologies, in particular linked data and semantic technologies: the work Tim and I have been doing for the last ten years or so on new standards for getting below the document level on the web, so that the information held inside documents, inside spreadsheets or inside databases themselves can be directly interlinked. We want to understand what the consequences of that new approach look like, for example for government.
We're now heading up stage two of the data.gov.uk effort, and that's about putting more non-personal data online and moving to these standards for representing the data. People will then be able to integrate and mash up data in quite new ways, so we think there will be a whole nucleus of services after that, and we would like the Web Institute to be capable of understanding the impact, both socially and economically, of that opportunity.
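As a minimal sketch of what this kind of mash-up could look like, here is a toy example assuming two hypothetical non-personal datasets keyed by a shared local-authority code (the dataset names, codes and figures are illustrative, not real data.gov.uk feeds):

```python
# Two hypothetical open datasets, keyed by a shared local-authority code.
schools = {
    "E09000001": {"schools": 5, "avg_class_size": 24},
    "E09000002": {"schools": 61, "avg_class_size": 28},
}
crime = {
    "E09000001": {"offences_per_1000": 132.0},
    "E09000002": {"offences_per_1000": 89.5},
}

def mash_up(*datasets):
    """Join datasets on their shared identifier, merging the records."""
    merged = {}
    for ds in datasets:
        for code, record in ds.items():
            merged.setdefault(code, {}).update(record)
    return merged

combined = mash_up(schools, crime)
```

The point of the sketch is that once datasets share canonical identifiers, integrating them becomes a simple join rather than a bespoke project for each pair of sources.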
What about the inherent security risk of having all this interlinking data?
The work to date on data.gov.uk has absolutely been about non-personal public data. So schools, transport, trade boards, environment data, crime data - there’s a huge amount to be derived from that - but it’s certainly the case that as we develop this new technology some people will want to use it for identity management or interaction.
If we looked back from before the age of the web, we would have been amazed at the amount of information aggregation that's now possible, and we have seen it as a real threat to privacy. I think as each technology allows easier access it will present further challenges. Do we change our attitudes, do we maintain privacy, do we hang on to it, what do we do?
My view is that we shouldn't let privacy go as an important construct. There is nothing that we can't encrypt, secure or hide behind firewalls if we want to maintain it securely, but the trouble is you then tend to lose opportunities, so we are required to think hard about how we'll maintain and enhance privacy. Society at large must be happy sharing the data, and that conversation could be quite difficult if you want to use data for the greater good, and perhaps for people's own benefit, as when sharing medical data.
The press release says the Institute will help place the UK at the cutting edge of research on the semantic web. Is there an implied message there that the UK is lagging behind the rest of the world in these areas?
No, the view is precisely that at this point we are rather well positioned. We have both the academic pedigree and some of the initial appetite, both in companies and in government, to take these things and adopt them further. So I think the view is rather more to extend and enhance what is, I think, a world-class effort in this area. In fact the PM's view in doing this was to produce excellence, not to catch up.
Obviously web 2.0 is a term that has been done to death, and now there's talk about web 3.0 and so on. How would you personally define web 3.0?
These integers sometimes mean something. Insofar as web 2.0 meant a more interactive, write-as-well-as-read web, that's meaningful. I think web 3.0 has meant a lot of different things to different people. For some people it has meant the web becoming intelligent: AI on the web.
I think what I actually mean by web 3.0 is almost what we might think of as a kind of AI, augmented intelligence, where you're putting people into contact with one another at a very large scale, so you get products like Wikipedia or large-scale collective problem-solving. I think linked data is part of that emerging landscape, though perhaps it's really web 2.5, because it's the base condition, the initial condition, the foundation you need for a lot of the other smart stuff. And it's smart in itself.
The web worked the first time round because people put pages up and they got linked to, and very often you were entirely surprised that a particular piece of content got reused or adopted or accessed in a particular way. So the authors of the content can find themselves in a space where their creation is having all these unexpected uses.
There are conventional uses, as you already see at direct.gov.uk, but if you put the data out there, you as the government, or you as a company, a university, a learned society or a professional body, are not best placed to know how other people might use it in ways you never imagined. Whether it's data about schools or data about anti-social behaviour orders, the point is that if you make the data available, people will find ingenious and creative ways of linking it and repurposing it.
So my view of web 3.0 is that, as we are able to do more and more smart things together with machines on the web, almost as if we were seeing lots and lots of augmented new capabilities, extended capabilities, one of the key requirements is to have high-quality data and content available: a kind of knowledge base that is going to drive these applications.
Do you foresee the Institute for Web Science working with existing bodies like BCS, and if so how do you think they could go about it?
We're looking for a critical mass where companies, research institutions and the government itself seek to collaborate and contribute, so I think that's absolutely the right way to think about this. I think actually one of the characteristics of something like BCS is that an awful lot of it is about trying to open up information and data and interlink it in ways that provide new kinds of services to the membership, and traditionally our content management systems and document-based systems tend to make that access difficult and non-universal.
Do you see the term semantic web as becoming common knowledge or do you see it as just being something that underpins the net as a whole?
Yes, it's an interesting question, but when do these terms, which some people find hard to grasp or can't yet see the point of, become commonplace? If you had tried to explain to people before the web existed what the impact of linking pages together with hyperlinks would be, they would have just looked at you and said, 'that's just hypertext, what's that going to do?' But do it at a global scale and you get these new capabilities.
Tim's original proposal for the web had a lot of the ideas of the semantic web in it, about linking information together at a very fine-grained level, and that is what we call the web of linked data. A web of linked data: finer-grained, more atomic.
In this form of the web I put in a web address and I don't get a full page back; I get a specific piece of data, say the current membership fee for the BCS, which would have its own URI, its own web address. You then only have to change the fee in one place and it is updated everywhere, because that's the canonical reference point for it.
That capability will get embedded through time; it will become commonplace and I suspect people won't notice it's there. It will become easier and will remove the information friction that exists in the system at the moment. In many cases you will get a more precise response to the questions and queries you formulate. It will make all sorts of information integration easier.
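A toy illustration of that canonical-reference idea, using an in-memory dictionary to stand in for the web of linked data (the URI and fee value here are made up for the example):

```python
# A tiny in-memory stand-in for the web of linked data:
# each URI names one fine-grained piece of data, not a whole page.
store = {
    "http://example.org/bcs/membership-fee": "GBP 40",
}

def dereference(uri):
    """Resolve a URI to the specific value it names."""
    return store[uri]

fee_uri = "http://example.org/bcs/membership-fee"

# Two independent consumers both cite the canonical URI
# rather than holding their own copy of the value.
def invoice_line():
    return f"Annual membership: {dereference(fee_uri)}"

def faq_line():
    return f"The current fee is {dereference(fee_uri)}."

# A single update at the canonical reference point...
store[fee_uri] = "GBP 42"
# ...is now reflected by every consumer that cites the URI.
```

Because neither consumer copies the value, updating the one canonical record is enough for both of them to show the new fee.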
The semantic web that people have described in the wild, in some of the more fanciful literature, is all about very, very clever software agents who can anticipate issues and do clever stuff with the material out there and represent it to you.
We already have examples where software agents can help: buy and sell, recommend, do all sorts of things for you on the web. But the key idea for me at this moment is that if we can build the infrastructure of linked information content, then we'll see a whole new range of services. Whether it's called the semantic web or web 2.5 or web 3.0, it's just this constant process of evolution, and the reason we want an Institute of Web Science is that the web is not a static object: it's evolving and changing, leading to phenomena that change faster than our ability even to take notice.
How do you see the adoption of smartphones and mobile internet driving a change in the internet?
Absolutely. The internet of course is the protocol layer that supports the web, but what's interesting is that the mobile web is now a set of standards widely available on the new generation of smartphones, so an awful lot of people are going to experience this information fabric through mobile devices. What has happened quite clearly is that phone providers, who a few years ago wanted to develop their own proprietary standards for integrating information, have been driven to open standards, and that is where the power is. The web is an open standard, the linked data web will be an open standard, and these devices will piggyback off that.
Obviously there's been a lot of talk lately about broadband in the UK. Do you see the current broadband landscape in the UK as up to the ideals of the Web Institute?
The business of the digital divide, or digital exclusion, is something that different people have different views about. I think the real issue with having powerful provision is adoption and how you put it in place. The Institute isn't in a sense going to solve that problem; it's something that governments or large parts of the private sector have to take in hand, not something the Institute of Web Science can fix.
I think we can point to things, and there are interesting pieces of work that try to estimate what the costs of exclusion are. What Martha Lane Fox is doing with the government is a good example of trying to get to grips with that, so I'd rather leave that one and see how it goes. I think we should be bold about the standards, bold about adopting this new technology, and bold about the infrastructure as well.
Obviously you have done a lot of work on the government website, and a lot of this is about linking together data that the government holds. What do you think the role of the semantic web should be for the public sector itself?
I think it will be pretty significant, because one of the key characteristics you see in the public sector is that there are huge opportunities for removing barriers to information and making it easier to access.
Many departments have a duty and a requirement to publish their content, often in very different forms for very different consumers. One of the things that will be interesting is to imagine a world in which they produce the content in these new standards and it's for the consumer of the content to take that very accessible, open-standard data and present it as they wish, so it could take a huge burden off the publication side alone.
It should also help in the way data and information can be interlinked between departments. It is pretty clear that one department of state can be in desperate need of the information held by another to help the economy work better, whether it's schools and transport or crime and neighbourhood deprivation. So I think at a time when there will be an awful lot of concentration on getting efficiencies out of the system, that's one area people need to pay attention to.
The other aspect of this, if you look at the experience of data.gov.uk, is that this entire site, with its more than 3,000 datasets at this point, was built using open source software and very agile project management techniques, in around five person-years of work. Not to diminish it, but I think the point is that monolithic, over-specified, centralised IT systems in government have not served us well in the past.
Do you think the new Web Institute could make the data.gov.uk site even better?
Our engagement with the government was on trying to open up data providing a point of access and the reason the Prime Minister was keen to support this Institute was to better exploit and understand how to gain benefits from exactly that sort of effort, amongst others.
What’s going to be the first step?
Well, actually, at the moment it's to get our heads round the fact that we've been running pretty full tilt since we were appointed government information advisors. We now have to work out the structure and, if you like, a bit more of the agenda that we're going to put together, and how and with whom we want to take this forward. It's a work in progress at this point.
Mini-case studies for early semantic web projects
Mapping Clusters of UK Technology Excellence
With the support of Talis, a UK company that develops Semantic Web applications, the Research Councils UK, the Technology Strategy Board and the Intellectual Property Office collaborated to develop linked datasets in four key technology areas: regenerative medicine, plastic electronics, RFID and advanced composite materials. These were in turn linked to Google Maps.
Ensuring that the datasets share common elements and vocabulary enables any company or potential inward investor to identify where the clusters of expertise lie in these important emerging technologies, the companies and organisations involved, the projects they are working on, and how much public money has gone into them. It also enables UK science parks to market their sites on the basis of the strength of the clusters on and around the science park in question.
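To sketch why a shared vocabulary matters here, the following toy example groups project records by a common field to surface clusters (all organisation names, regions and funding figures are invented for illustration, not taken from the real datasets):

```python
# Hypothetical linked records using a shared vocabulary: every record
# describes a project with the same field names, whoever published it.
records = [
    {"org": "Acme Composites Ltd", "area": "advanced composites",
     "region": "South West", "public_funding_gbp": 250_000},
    {"org": "PlastiLogic", "area": "plastic electronics",
     "region": "North East", "public_funding_gbp": 400_000},
    {"org": "FibreWorks", "area": "advanced composites",
     "region": "South West", "public_funding_gbp": 180_000},
]

def clusters_by(field):
    """Group records by a shared-vocabulary field and total the funding."""
    out = {}
    for r in records:
        entry = out.setdefault(r[field], {"orgs": [], "funding": 0})
        entry["orgs"].append(r["org"])
        entry["funding"] += r["public_funding_gbp"]
    return out

by_area = clusters_by("area")
```

Because every publisher uses the same field names, the same one-line grouping works whether you cluster by technology area, by region, or by any other shared property; without the common vocabulary, each dataset would need its own translation step first.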
The organisations have learned from this exercise and are rolling it out across all technologies. It can also be extended to include measurement research, research programmes funded by government departments, and support given to relevant firms by RDAs. The upshot will be a comprehensive picture of research and technology excellence in the UK, its inputs and outputs, and its evolving relationships, updated on a regular basis.
There has already been pilot work between the Universities of Southampton and Oxford (Prof Shadbolt and Prof Sir Michael Brady) in the area of multi-disciplinary cancer treatment. The clinical terminology SNOMED is also used within the NHS and could provide an opportunity for much more extensive patient record linkage using linked data technologies.
Inter-Departmental Data Sharing (Smarter Government)
Semantic approaches could also improve the efficiency of handling cases which straddle two departments, e.g. Health and Social Security. A large-scale pilot is already under way in the area of assisted living (in Cornwall, Kent and the Borough of Newham), supported by the Technology Strategy Board's Assisted Living Innovation Platform, which would provide a significant platform for developing the use of semantic approaches in a large and growing area of public service delivery.
Department for Business, Innovation & Skills
The Department for Business, Innovation and Skills (BIS) is building a dynamic and competitive UK economy by: creating the conditions for business success; promoting innovation, enterprise and science; and giving everyone the skills and opportunities to succeed. To achieve this it will foster world-class universities and promote an open global economy. BIS - Investing in our future.