Mix and mash

The musical term mash-up refers to the unauthorised combination of vocals from one song with the backing music of another - usually from a completely different genre. But now the term has taken on a whole new meaning which could revolutionise the way we use the internet.

Web users are improving their experience of the net by combining services and content from several sites to increase convenience.

Most mash-ups happen without the prior knowledge of the existing website owners. For example, a programmer took textual data from Yahoo! Traffic and map data from Google to create a website that provides traffic maps.
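A mash-up of this kind can be sketched in a few lines of Python. This is only an illustration: the two data sets below are invented stand-ins for the traffic feed and the map service, and nothing here reflects the real Yahoo! or Google interfaces.

```python
# Hypothetical stand-ins for the two sources. A real mash-up would fetch
# this data over the network in whatever format each site provides.
TRAFFIC_REPORTS = [
    {"road": "A12", "report": "Lane closed near junction 28"},
    {"road": "M25", "report": "Slow traffic, 20 minute delay"},
    {"road": "B1024", "report": "Road works"},
]

ROAD_COORDINATES = {
    "A12": (51.60, 0.28),
    "M25": (51.68, -0.05),
}


def build_traffic_map(reports, coordinates):
    """Attach a map position to each textual traffic report.

    Reports for roads the map source does not know about are skipped.
    """
    markers = []
    for item in reports:
        position = coordinates.get(item["road"])
        if position is None:
            continue
        latitude, longitude = position
        markers.append(
            {
                "road": item["road"],
                "latitude": latitude,
                "longitude": longitude,
                "label": item["report"],
            }
        )
    return markers


if __name__ == "__main__":
    for marker in build_traffic_map(TRAFFIC_REPORTS, ROAD_COORDINATES):
        print(marker)
```

The joining step is trivial; the effort in a real mash-up goes into obtaining each site's data and working out how the two sets relate.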

Software engineers and programmers have produced numerous mash-ups, but creating one takes the right skills and knowledge, as well as time and energy.

Composite web services

As part of its role in a European collaborative project, BT has been working on providing software tools which could see this process become automated within the next three to five years.

The collaboration is in line with BT's Open Innovation strategy which encourages innovation through open learning and sharing of knowledge with partners. The company was one of 17 participants in the recently completed DIP semantic web services project.

Dr John Davies, next generation web research manager at BT Research, explains: 'Imagine you're booking a holiday on the internet. You search a number of sites to book the trip you want at a good price, then you go to a site that sells books to buy a travel guide, and finally you search several sites to book the best priced taxi to the airport'.

'Each of these sites is a web service - it's a program run over the net that finds information for you. But with composite web services one program could search the internet for you until it matches the requirements you've entered. It would search lots of lower level web services on your behalf.'
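The idea Davies describes can be sketched as follows. This is a simplified illustration, not the DIP project's software: the four lower-level services are invented functions that return fixed offers, where real ones would be called over the network.

```python
# Hypothetical lower-level services. Each takes a request and returns a
# list of offers; the providers and prices are invented for illustration.
def flights_a(request):
    return [{"provider": "flights_a", "item": "flight", "price": 180.0}]


def flights_b(request):
    return [{"provider": "flights_b", "item": "flight", "price": 150.0}]


def taxis_a(request):
    return [{"provider": "taxis_a", "item": "taxi", "price": 40.0}]


def taxis_b(request):
    return [{"provider": "taxis_b", "item": "taxi", "price": 55.0}]


def cheapest(services, request, max_price):
    """Query every service and keep the cheapest offer within budget."""
    offers = [offer for service in services for offer in service(request)]
    affordable = [offer for offer in offers if offer["price"] <= max_price]
    if not affordable:
        return None
    return min(affordable, key=lambda offer: offer["price"])


def book_trip(request):
    """Composite service: one call fans out to the lower-level services."""
    return {
        "flight": cheapest(
            [flights_a, flights_b], request, request["max_flight_price"]
        ),
        "taxi": cheapest(
            [taxis_a, taxis_b], request, request["max_taxi_price"]
        ),
    }


if __name__ == "__main__":
    print(book_trip({"max_flight_price": 200.0, "max_taxi_price": 50.0}))
```

The user states the requirements once and the composite service does the comparison shopping. If no offer fits the budget, the corresponding entry is None.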

For this to become possible, web-based information needs to be described in a language that computers can process - semantic web language.

Machine-processable web language

Says Davies: 'Semantic web technology is all about describing web-based information in a way that's machine-processable. Today's language is HTML which allows us to describe to a human how a web page should look - it's basic formatting to make pages look more attractive to the user.

'Semantic web technology will still do that, but will also allow the same information to be processed by a computer.'
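As a rough illustration of what machine-processable means, the sketch below stores facts as subject, predicate and object triples, the basic shape used by semantic web data models such as RDF. The hotels, identifiers and prices are invented, and plain Python tuples stand in for a real semantic web language.

```python
# Each fact is a (subject, predicate, object) triple. All identifiers
# and values here are invented for illustration.
FACTS = [
    ("hotel:seaview", "type", "Hotel"),
    ("hotel:seaview", "locatedIn", "city:brighton"),
    ("hotel:seaview", "pricePerNight", 85),
    ("hotel:grand", "type", "Hotel"),
    ("hotel:grand", "locatedIn", "city:brighton"),
    ("hotel:grand", "pricePerNight", 140),
]


def objects_of(facts, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def hotels_under(facts, city, max_price):
    """Find hotels in a city whose nightly price is within max_price."""
    hotels = [s for s, p, o in facts if p == "type" and o == "Hotel"]
    result = []
    for hotel in hotels:
        in_city = city in objects_of(facts, hotel, "locatedIn")
        prices = objects_of(facts, hotel, "pricePerNight")
        if in_city and prices and min(prices) <= max_price:
            result.append(hotel)
    return result


if __name__ == "__main__":
    print(hotels_under(FACTS, "city:brighton", 100))
```

A program can answer the question directly from the facts, which it could not do reliably from a page formatted only for human readers.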

The semantic web is an idea of World Wide Web inventor Sir Tim Berners-Lee, who designed the web with the goal that it would be an information space useful not only for human-to-human communication, but also so that machines would be able to participate and help.

His vision is of a world where instead of people laboriously trawling through information on the web and negotiating with each other directly to carry out routine tasks such as scheduling appointments, finding documents and locating services, the web itself can do the hard work for them.

Context

Although search engines index much of the web's content, they have little ability to select the information a user really wants or needs. A semantic web would feature the use of context-understanding programs that can selectively find what users want.

Says Davies: 'The semantic web can also yield improvements in knowledge management and the delivery of information. At the moment you enter a few words into a search engine and are presented with a list of thousands of documents. What the user wants is information - not a list of documents.'

Digest

The search software is capable of automatically analysing and tagging documents. It extracts names - such as BT and Blair - and can identify one as referring to a large company providing communications solutions and the other as the UK Prime Minister.
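A much-simplified sketch of this kind of tagging is shown below. It assumes a tiny hand-made knowledge base and uses a plain name lookup in place of whatever language analysis the actual search software performs.

```python
import re

# Tiny hypothetical knowledge base: name -> description of the entity.
KNOWLEDGE_BASE = {
    "BT": "company providing communications solutions",
    "Blair": "UK Prime Minister",
}


def tag_entities(text, knowledge_base):
    """Return (name, description, start_offset) for each known name in text.

    Names are matched as whole words, so "BT" does not match inside "BTX".
    """
    tags = []
    for name, description in knowledge_base.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            tags.append((name, description, match.start()))
    # Report tags in the order they appear in the text.
    return sorted(tags, key=lambda tag: tag[2])


if __name__ == "__main__":
    for tag in tag_entities("Blair met BT executives.", KNOWLEDGE_BASE):
        print(tag)
```

A lookup like this cannot tell two entities with the same name apart; deciding which one a document means is the harder problem.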

Rather than viewing a document as a bag of words - as with today's search engines - the search engine adds structure to unstructured data. This makes it easier and quicker for people to find critical information. For business people this could increase efficiency and allow them to make more informed decisions.

When a search is carried out and certain people, organisations or other entities occur frequently in the returned documents, the user will be given a summary of information relating to those entities.

The search engine can do this because it incorporates a large knowledge base, an 'ontology' in the terminology of the semantic web. The knowledge base comprises more than 200,000 entities which are gathered semi-automatically from a range of high quality public data sources.

The information in the knowledge base includes around 36,000 locations, 140,000 companies and other organisations, the world's pre-eminent politicians, businesspeople, technologists and so on.

Furthermore, the search engine will analyse documents related to a given query and return a digest of information comprising key points from the documents, rather than a long list of documents for the user to click on, read and extract information from. Finally, a list of returned documents might then be clustered into different topic areas.

Davies again: 'Imagine you search a topic and BT comes up in a lot of the documents. The software notices this and can recognise this and highlight it as a key player in the topic you searched on.

'It could then use information stored in the knowledge base to provide a summary of BT, including data such as its headquarters, the previous year's turnover, its chief executive, how many employees it has and the current share price. This would all be done automatically without the user having to request this or make any effort to search for this extra information.'
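The behaviour Davies describes could be approximated along these lines. The sketch assumes an invented two-entry knowledge base and a simple rule, that an entity counts as a key player if it appears in at least half of the returned documents; the article does not say how the real software decides.

```python
from collections import Counter

# Hypothetical knowledge base entries, kept deliberately small.
KNOWLEDGE_BASE = {
    "BT": {"kind": "company", "headquarters": "London"},
    "Blair": {"kind": "person", "role": "UK Prime Minister"},
}


def key_entities(documents, knowledge_base, min_share=0.5):
    """Return knowledge base summaries for frequently mentioned entities.

    An entity qualifies if it appears in at least min_share of the
    documents. Each document counts once, however often it names it.
    """
    counts = Counter()
    for text in documents:
        words = set(text.replace(",", " ").replace(".", " ").split())
        for name in knowledge_base:
            if name in words:
                counts[name] += 1
    threshold = min_share * len(documents)
    return {
        name: knowledge_base[name]
        for name, count in counts.items()
        if count >= threshold
    }


if __name__ == "__main__":
    results = [
        "BT launches a broadband service.",
        "Analysts say BT leads the market.",
        "Blair visits a school.",
    ]
    print(key_entities(results, KNOWLEDGE_BASE))
```

The user never asks for the summary; it is attached because the entity keeps turning up in the results.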

By giving computers the means to analyse language, semantic knowledge technologies enable the dissemination of information anywhere, at any time in the form most appropriate for the device the user has.

For example, a user on the move could receive a 200-character summary of a crucial document via a text message while another user with a laptop could be sent a multimedia document. The software automatically adjusts the information sent to suit the user.
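The device-dependent delivery can be sketched as below. Note the assumptions: the device names are invented, and simple truncation at a word boundary stands in for the real summarisation, which would pick out key points rather than cut the text short.

```python
def fit_to_device(text, device, sms_limit=200):
    """Return the full text for a laptop, or a shortened text for a phone."""
    if device == "laptop":
        return text
    if device == "phone":
        if len(text) <= sms_limit:
            return text
        # Cut at the last whole word that fits, leaving room for "...".
        cut = text[: sms_limit - 3].rsplit(" ", 1)[0]
        return cut + "..."
    raise ValueError(f"unknown device: {device}")


if __name__ == "__main__":
    document = "word " * 100
    print(len(fit_to_device(document, "phone")))
    print(len(fit_to_device(document, "laptop")))
```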

BT is also investigating the application of semantic technology to the integration of heterogeneous information sources and to specialist healthcare applications.

Further reading

www.sekt-project.com
www.semanticweb.org

May 2007