Isn't it semantic?

With the major WWW2006 conference just around the corner, BCS managing editor Brian Runciman interviews the inventor of the Web, Sir Tim Berners-Lee.

This interview also appears in the ebook Leaders in Computing.

Looking back on 15 years or so of development of the Web is there anything you would do differently given the chance?
I would have skipped on the double slash - there's no need for it. Also I would have put the domain name in the reverse order - in order of size so, for example, the BCS address would read: http:uk/org/bcs/members. This would mean the BCS could have one server for the whole site or have one specific to members and the URL wouldn't have to be different.
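As an illustration of the reordering described above, here is a minimal Python sketch. The function name `to_big_endian` and both input addresses are assumptions made for the example, not real BCS URLs. It shows that a host-based address and a path-based address would collapse to the same identifier under this scheme.

```python
from urllib.parse import urlsplit


def to_big_endian(url):
    """Rewrite a URL so the domain labels run from largest to smallest."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")      # e.g. ["members", "bcs", "org", "uk"]
    reordered = "/".join(reversed(labels))  # e.g. "uk/org/bcs/members"
    path = parts.path.strip("/")
    tail = "/" + path if path else ""
    # No double slash after the scheme, as suggested in the answer above.
    return parts.scheme + ":" + reordered + tail


# Hypothetical layouts: a dedicated members server and a single site-wide server.
print(to_big_endian("http://members.bcs.org.uk/"))
print(to_big_endian("http://bcs.org.uk/members"))
```

Both calls yield the same string, which is the point of the proposal: the address no longer reveals how the site is split across servers.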

What subsequent Web developments by others have impressed you most?
The Google algorithm was a significant development. I don't want to name too many, but in general I like the fact that I've had thank you emails from people whose lives have been saved by information on a medical website or who have found the love of their life on a dating website, which is great. The important thing is the diversity available.

If there was one 'quick win' that could improve any aspect of the Web right now (other than the semantic approach, more of which later) what would it be?
Something that looks easy - browser security. Most browsers have certificates set up and secure connections, but the browser view only shows a padlock - it doesn't tell you who owns the certificate. Just having the browser tell you this makes perfect sense - it's a little thing but 'duh'. We have a working group now at W3C looking at that kind of thing because it's got a lot to do with user interactions with systems.

What are the biggest issues the Web needs to face now and in the near future?
There are three main areas. Firstly, security - phishing is mostly done via email, but it involves HTML so W3C will address this issue. Secondly, the Mobile Web Initiative is important - information must be made seamlessly available on any device. And finally, Web services, where re-engineering and enterprise software are key issues.

How does the ICANN ownership issue affect the Web?
The Domain Name System (DNS) is the Achilles heel of the Web. The important thing is that it's managed responsibly. I don't think we've gained anything from the .biz or .info domains - only that a few companies have benefited financially. The challenge at the moment is to manage this in an open way - not too much bureaucracy, not subject to political or commercial pressures. The US should demonstrate that it is prepared to share control with the world in general. It must be politically clean and efficient, but there's no simple solution.

I saw your interview with Mark Lawson this year - he seemed to be trying to lay the blame for the negative aspects of Web content at your door - how do you feel about the media's approach to the Web?
That was atypical. People actually emailed to apologise on his behalf, but I think he was just trying to make the interview more interesting. In any case, we shouldn’t build a technology to colour, or grey out, what people say. The media in general is balanced, although there are a lot of issues to be addressed that the media rightly pick up on. For example, intellectual property is an important legal and cultural issue. Society as a whole has complex issues to face here: private ownership versus open source and so on.

Will software patents have a negative effect on the development of the Web?
There are major issues here. Eolas, a spin-off from the University of California, claimed ownership of a unique combination of embedding objects into a Web page so that it happens automatically through use of compiled code. This is an example of a random patent of a combination of existing technologies.

The claim was based on the fact that it's done by compiled code, but any good software engineer will tell you that a compiler and an interpreter are interchangeable. Programming is always about reassembling existing stuff - novel ideas are rare. Even with the development of the Web, hypertext was already there and so on.

A bright idea is OK, but getting people to adopt common standards is impeded by patents. At W3C we have a working group that had to stop work on a project for 18 months (a lot in Web years!) to answer a patent issue. This affected the livelihood of people in companies that were doing really good work.

Often, the cases we've seen involving patents on Web technologies have been spurious at best. There are many cases in which a patent's novelty is extremely unclear, but the legal costs of discussing it would be prohibitive.

Patents are often used by large companies who can afford the legal fees, or by one-man-bands who have nothing to lose and are hoping for a pay-off from a larger company. They are often defensive against other patents. Because of this W3C now has a patent policy.

To be fair, most larger companies have now had a serious change of understanding and see that for the market to grow Web infrastructure must be royalty free.

BCS is pursuing professionalism in IT - what are your thoughts on this?
When it comes to professionalism as a general topic there are important things to talk about and it makes sense to talk about being professional in IT. This may be a biased point of view, but standards are vital so that IT professionals can provide systems that last.

Customers need to be given control of their own data - not being tied into a certain manufacturer so that when there are problems they are always obliged to go back to them. IT professionals have a responsibility to understand the use of standards and the importance of making Web applications that work with any kind of device. They need to take the view that data is a precious thing and will last longer than the systems themselves.

Another important area of professionalism is accessibility awareness. Everyone should be accommodated, especially when around 20 per cent of the population have special requirements. In fact, Microsoft said recently that nearly 50 per cent of people need to make some sort of adjustment to their system to interact with it. Having turned 50, I'm very aware of receiving email with very small fonts - people don't want to use their spectacles to look at a Web page!

Project failure is a big subject in the UK and you've been involved in a massive ongoing IT project - what have you learned from it that could benefit our members?
This is a huge area - an answer would fill several books. But I think IT projects are about supporting social systems - about communications between people and machines. They tend to fail due to cultural issues. For example, moving control of data from someone with 20 years' experience of working with it to someone else can lead to problems; as a company you lose with this approach. The original idea of the Web was about supporting the way people already work socially, but this doesn't happen with a lot of IT projects.

The view we are taking with the Semantic Web is interesting here. In the past scientists have been trained to do things top down. In the business world projects are often the boss's vision made flesh. Even software engineering is about taking an idea and breaking it into smaller pieces to work on - but the software project is itself part of something larger. To make this better we need Web-like approaches - I'm not talking about HTML here but, rather, an interconnected approach.

The Semantic Web approach can be visualised as rigid platelets of information loosely sewn together at the edges - rich in local knowledge, but capable of linking to things in the outside world. That approach would benefit the social aspects of projects.

Ian Horrocks spoke to the BCS on ontologies, the application of which would clearly see a true Semantic Web, but how can we apply these principles to the billions of existing Web pages?
Don't. Web pages are designed for people. For the Semantic Web we need to look at existing databases and the data in them. To make this information useful semantically requires a sequence of events:

Do a model of what's in the database - which would give you an ontology you could work out on the back of an envelope. Write it in RDF Schema or OWL (the Web Ontology Language).
Find out who else has already got equivalent terms in an ontology. For those things use their terms instead.
Write down how your database connects to those things.

Using this information you can set up a Web server that serves the data as Resource Description Framework (RDF). A larger database could support queries.

To make all this really useful it's important that all important things - such as customers and products - have URIs (Uniform Resource Identifiers) - for example, http://example.com/products.rdf#hairdryers - so invoices, shipping notes, product specifications and so on can refer to them.

These would all be virtual RDF files - the server would generate them on the fly and it would all be available on the Semantic Web. Then an individual could compare products directly by their specifications, weight and delivery charges, price and so on, in a way that HTML won't allow.
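To make the idea of generating RDF on the fly concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the product rows, the `http://example.com/vocab#` terms and the choice of N-Triples as the output format are not from the interview. Only the `products.rdf#` URI pattern comes from the text, and `rdfs:label` stands in for a term borrowed from someone else's vocabulary.

```python
# Hypothetical product rows, standing in for a table in an existing database.
PRODUCTS = [
    {"id": "hairdryers", "label": "Hair dryer", "weight_kg": "0.6", "price_gbp": "19.99"},
    {"id": "kettles", "label": "Kettle", "weight_kg": "1.1", "price_gbp": "24.50"},
]

BASE = "http://example.com/products.rdf#"  # URI pattern taken from the text
VOCAB = "http://example.com/vocab#"        # assumed in-house ontology terms
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # reused shared term
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def row_to_triples(row):
    """Turn one database row into N-Triples statements about one URI."""
    subject = "<" + BASE + row["id"] + ">"
    label = escape(row["label"])
    weight = row["weight_kg"]
    price = row["price_gbp"]
    return [
        f'{subject} <{RDFS_LABEL}> "{label}" .',
        f'{subject} <{VOCAB}weightKg> "{weight}"^^<{XSD_DECIMAL}> .',
        f'{subject} <{VOCAB}priceGbp> "{price}"^^<{XSD_DECIMAL}> .',
    ]


def render(rows):
    """Build the 'virtual file' a server could return for each request."""
    return "\n".join(line for row in rows for line in row_to_triples(row))


print(render(PRODUCTS))
```

Because each product is named by a URI, an invoice or a price-comparison tool elsewhere could refer to the same identifier without knowing how the underlying database is laid out.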

In your book you mention the aim of making the Web operate more like the human brain in making unusual and richer connections between data - doesn't the Web perform that function better in a way now because of the tangential returns you get from searches? Wouldn't the ontological approach make the Web less like the human brain?
Well, the Semantic Web is about data. The Web of human ideas is served by the hypertext Web but the Semantic Web helps with machine analysis. Take the current concerns over bird flu. Is it only around agricultural areas? Suppose we have shared terms (URIs) for latitude and longitude and time.

That would allow so many forms of public and private data to be correlated. We could also combine, say, medical data with socio-economic data from the World Bank - land use and so on. This could correlate bird flu information and export it to a spreadsheet and lead to serious analysis. So, where HTML provides information in a way easy for humans to read, the Semantic Web will enable much better analysis of it.
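The kind of correlation described here can be sketched in a few lines of Python. The two datasets, their field names and every value in them are invented for illustration. The only idea taken from the text is that both sources use the same terms for latitude, longitude and time, so they can be joined and exported to a spreadsheet format.

```python
import csv
import io

# Invented observations that share the same keys for place and time.
flu_reports = [
    {"lat": 52.0, "lon": 0.5, "year": 2006, "cases": 3},
    {"lat": 48.9, "lon": 2.3, "year": 2006, "cases": 1},
]
land_use = [
    {"lat": 52.0, "lon": 0.5, "year": 2006, "land_use": "agricultural"},
    {"lat": 51.5, "lon": -0.1, "year": 2006, "land_use": "urban"},
]


def join_on_place_and_time(left, right):
    """Merge rows from two sources that agree on latitude, longitude and year."""
    index = {(r["lat"], r["lon"], r["year"]): r for r in right}
    joined = []
    for row in left:
        key = (row["lat"], row["lon"], row["year"])
        if key in index:
            joined.append({**row, **index[key]})
    return joined


def to_csv(rows):
    """Serialise the joined rows as CSV text that a spreadsheet can open."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(join_on_place_and_time(flu_reports, land_use)))
```

Real coordinates would rarely match exactly, so a practical version would need to round or grid the positions before joining; the sketch skips that step.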

Brian Eno gave a speech a few years ago on generative music - he said that he likes the economy of it, that just from a few simple rules complex and fascinating things can arise. How good an analogy for the rise of the Web from a relatively simple approach - hypertext and links - is this?
This is where Web engineering, physics, Web science and philosophical engineering meet. Physics was actually called experimental philosophy at Oxford. The Web is now philosophical engineering. Physics and the Web are both about the relationship between the small and the large.

In physics, to take the behaviour of gases as an example, you visualize them as billiard balls, model the rules they follow and then transpose that to a larger scale to account for the effects of temperature and pressure - so physicists analyze systems. Web scientists, however, can create the systems.

So we could say we want the Web to reflect a vision of the world where everything is done democratically, where we have an informed electorate and accountable officials. To do that we get computers to talk with each other in such a way as to promote that ideal.

GUIs are a big influence on an individual's Web experiences - what improvements do you think we need there because, with the best will in the world, a beginner still cannot go online and work a browser/editor intuitively?
There is always work to do on interfaces. I've been playing with Ajax technologies to explore that space because it can be a lot better. I like to use a really big screen - lots of pixels. Even when I do things on paper I like to use an A0 sized piece of paper and felt tipped pens (Sir Tim shows me notes from a recent meeting - huge piece of paper, lots of colours). The internet is a technology to help you get hold of a lot of data, so you should be able to see it.

At the moment a lot of company knowledge is held on spreadsheets and PowerPoint slides, because companies need to see summaries. But the data has lost its semantics, so it's not usable. On the Web, people make extensive use of the favourites menu or the history, but there's still a long way to go in collaborative data. Blogs are editable in a limited way, and HTML is too complex, but at least blogs allow people to be a little creative.

Why is a Web year 2.6 months?
It's about the pace of change. It seems a very specific figure! I don't know who made up that number. It was an expression of the acceleration people felt during the early 1990s. Compared even to the development of the phone or TV, the Web developed very quickly. I think it is now coming to the end of its adolescence, maturing after a phase of testing its boundaries. Even phishing and spam have been part of its education.

In the past you'd have restrictions, like finding books saying that you needed to design Websites that fit an 800 x 600 pixel screen. Now that people understand standards and business more they know there's always another browser round the corner and the view of the Web and its technology is maturing accordingly.

What do you hope WWW2006 will achieve?
In a lot of ways Web development is at a multi-way crossroads. There are a huge number of developments that are potentially world changing and there's a lot of excitement. The conference will bring everyone together and get a handle on which are important and where we should be focusing our energy in the next five years. I recently had a meeting at the BCS, with academics bearded and otherwise, to discuss the future of what we call Web science (the science and engineering of Web-based things that have a bottom up Web-like structure).

Every few weeks it seems a new phenomenon breaks across the Web - blogs, wikis, opinion networks, new forms of syndication, new genres of information and so on. Standards like CSS (Cascading Style Sheets), the document object model and so on are surfacing as things that some people call Web 2.0 - but really it's a use of existing technology.

Mash-ups are called Web 2.0, and are limited data integrations - taking a piece of display technology like a map application and doing a handcrafted data integration. I've yet to see a mash-up that takes any generic Semantic Web data and maps it - the fact that everyone has their own mash-up shows the need for Semantic Web standards.

What I've always wanted to do is take an arbitrary thing, a data file, and if it's got something that can be mapped, drop it into a map and see what occurs without programming. We've seen the development of Web technologies but, in the demos we've seen, people are using semantic technology for a specific application - for life sciences or a geo-spatial project. But at the conference I think we'll see the emergence of data as a Web.

Going back to the idea of anything I would change, I remember Dan Connolly - who really understood SGML - asking me if HTML was SGML. I really wanted the SGML people on board so I said yes. I should have said no - and we would have developed XML a lot sooner. A seminal discussion about changing SGML to XML actually happened during a hypertext conference, in a pub in Edinburgh. This May there'll be discussions in pubs in Edinburgh again, where we can talk about what we'll need to change now - it'll be a blast.

Web timeline

(in regular years, please multiply by 2.6 for Web years)

1930s Vannevar Bush conceives the 'memex' - a machine to store his communications that could be consulted with great speed. He inspired:
1960s Douglas Engelbart, whose team came up with the concept of hypertext, and Ted Nelson, who invented the word and published the concept.
1980 Tim Berners-Lee writes the Enquire program - to help researchers at CERN update and share information
1989 Tim Berners-Lee writes the first proposal at CERN for a system that combines hypertext functionality with Internet foundations
1990 Tim Berners-Lee writes WorldWideWeb, the first Web browser and editor, along with the first versions of Hypertext Transfer Protocol (HTTP) and Hypertext Markup Language (HTML)
1991 First Website appears at http://info.cern.ch/ (now defunct)
1993 CERN announce that the Web is free for anyone to use
1994 W3C founded
2004 Tim Berners-Lee is knighted for services to the global development of the internet

Sir Tim Berners-Lee, a Distinguished Chartered member of BCS, is the director of the World Wide Web Consortium, senior researcher at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory, and professor of computer science at Southampton ECS.