Justin Richards talked to Hyun Chul Lee and Mark Harper, from the US, at the WWW2006 conference about their web crawler, which could help revolutionize the way we search online.
Hyun Chul Lee is a final-year student at the University of Toronto, and Mark Harper works in strategic relations at Genieknows, a 'pay-per-click' ad network that is sponsoring Chul Lee's thesis.
Chul's thesis is primarily concerned with 'Local Search', a more specific type of web search that Genieknows is keen to incorporate into its own systems. Both were at the prestigious conference to present their paper 'Geographically Focused Collaborative Crawling'.
BCS: What benefits could your research have for the average person logging on to the web to do a search?
Chul: Many search engines claim to offer local search, but what they provide is really more like a Yellow Pages-type directory. This is the next step in local searching: a true local search, based on actual web pages rather than Yellow Pages-type listings.
With our technology we can collect web pages and provide that information to the user. Before this technology the average user wasn't able to access or collect geographically sensitive web pages; with this technology, they can.
BCS: Can you define 'geographically sensitive'?
Chul: Well, that's what this type of technology is all about. We have a very special method for analysing the semantics of web pages, using their full content, and we can improve on that using extended anchor text and link structures.
We can use this technology to determine whether or not a page is geographically sensitive, and if it isn't, we can get rid of it.
BCS: You talk about 'Collaborative Crawlers' in your paper; can you explain what you mean by that?
Chul: Collaborative Crawling is used everywhere by big companies like Yahoo, Google and Microsoft. You don't have just one crawler node but huge numbers of nodes moving around the web and collecting vast amounts of data.
So by combining the capacity of many crawlers, one can increase the speed, accuracy and volume of data collection. Our technology builds on this approach: we can collect huge amounts of geographically sensitive data, 40–50 million pages per day, so in a few months we can cover the entire web.
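The interview does not say how the crawler nodes divide up the work. One common scheme for collaborative crawling, sketched here in Python as an assumption rather than as the Genieknows design, hashes each URL's host name so that every site belongs to exactly one node and no two nodes fetch the same page. The function names `assign_node` and `partition` are hypothetical, and the geographic partitioning described in the paper is not reproduced.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its host name.

    Hashing the host (not the full URL) keeps every page of a site on
    the same node, so nodes never duplicate each other's fetches.
    """
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one work queue per crawler node."""
    queues = {n: [] for n in range(num_nodes)}
    for url in urls:
        queues[assign_node(url, num_nodes)].append(url)
    return queues


if __name__ == "__main__":
    seeds = [
        "http://www.example.com/",
        "http://www.example.com/about",
        "http://news.example.org/toronto",
    ]
    for node, queue in partition(seeds, 4).items():
        print(node, queue)
```

Because the assignment depends only on the host, any node that discovers a link can work out locally which node owns it and forward it there, without a central coordinator.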
BCS: Would you say that this new 'format' would be a major facilitator for the semantic web 2.0?
Chul: That's right.
BCS: So how does this new system differ from what Google or Yahoo do?
Chul: For Google to do a 'Local' search they have to 'crawl' the entire web, which is very hard, and then, after they have collected it, they have to filter it to pick out the pages that are geographically sensitive.
That takes a lot of infrastructure, and the amount of filtering involved can compromise the quality of the search. With our technology we can just go to specific locations, specific parts of the web, grab the required data and ignore the rest.
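To illustrate the contrast between crawling everything and then filtering versus going straight to the relevant parts of the web, here is a minimal Python sketch of a focused crawler over a small in-memory 'web'. The place-name keyword test in `is_geo_relevant` is a hypothetical stand-in for the page analysis Chul describes; `TOY_WEB` and `focused_crawl` are likewise illustrative names, and no real pages are fetched.

```python
from collections import deque

# Toy web: URL -> (page text, outgoing links). Stands in for real fetching.
TOY_WEB = {
    "seed": ("Toronto restaurants and events", ["a", "b"]),
    "a": ("Best coffee in Toronto's Annex", ["c"]),
    "b": ("Celebrity gossip roundup", ["d"]),
    "c": ("Toronto public library hours", []),
    "d": ("More Toronto news", []),
}


def is_geo_relevant(text, place_names):
    """Hypothetical relevance test: does the page mention a target place?"""
    lowered = text.lower()
    return any(name.lower() in lowered for name in place_names)


def focused_crawl(seed, place_names, web, max_pages=100):
    """Breadth-first crawl that only keeps, and only expands, relevant pages."""
    frontier = deque([seed])
    seen = {seed}
    kept = []
    while frontier and len(kept) < max_pages:
        url = frontier.popleft()
        if url not in web:
            continue
        text, links = web[url]
        if not is_geo_relevant(text, place_names):
            continue  # prune: irrelevant pages are dropped and their links ignored
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return kept


print(focused_crawl("seed", ["Toronto"], TOY_WEB))
```

Running it prints `['seed', 'a', 'c']`. Page `d` mentions Toronto but is never reached, because the only link to it sits on an irrelevant page. Hard pruning like this is what keeps a focused crawl cheap, and it is also its main risk, which is why many focused crawlers score and prioritise links rather than discarding them outright.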
BCS: How much further do you think the web can develop? When do you think the WWW will reach its full potential?
Chul: A lot of people think that 'Local' search is one of the directions Web 2.0 has to take. There are a lot of small communities on the web, and standard search engines can't keep track of them all, so local engines can reach these communities, which are part of the future of the web.
Communities which share information about health, education, industry or whatever need suitable engines to keep track of what is happening within their specialized fields and this is one of the main directions the web is moving in. The web will facilitate increased specialization from these communities.
BCS: What next for your research? What stage have you reached?
Chul: We want to make our search engine personalized. Not only do we want to perform searches for people but also keep track of their search history so we can refine future search results based on that knowledge.
BCS: So this will also be useful for companies to observe market changes and changing preferences?
Mark: I think it's a two-way process. Users are demanding it; they want a more personalized approach to their searches. It was interesting that in the keynote speech this morning the chairman of Motorola talked a lot about personalization of content, and that ties in with a lot of the research we are doing on Local Search.
And that's a realization our R&D staff came to quite a while ago. Hence, the next stage for us will be localization on mobile devices. This will be key to the future development of the web.
BCS: Will it always be the case that search engines will have to keep developing to keep up with consumer demand?
Mark: I think consumers are well aware that there are other search engines they can go to for further information. Vertical search is probably the way to go after local. We are trying to establish ourselves as a niche search engine.
BCS: How much room is there in the market place for more search engines?
Mark: I think there's a window, which is shrinking. We are trying to position ourselves quickly as a technological company that can produce niche search engines.
Chul: When we have presented this paper, a lot of people have been sceptical, asking 'How can you beat the other companies?', but we can. Years ago no one thought you could beat AltaVista, which had the money, the resources, everything.
Mark: But then Yahoo bought AltaVista and then all of a sudden there was the fundamental platform for the Yahoo search engine.
BCS: What do you think will be the next big thing within the IT arena?
Mark: In terms of local search, I think that's an immediate thing: a user on the street with a Pocket PC or BlackBerry device looking for products and services locally.
BCS: Like a mobile Yellow Pages?
Chul: But more than Yellow Pages. I think these search engines can provide more than just a name and address. The user will want more information on the business.
Mark: They'll want to see reviews on the business; they'll want to see auxiliary information that wouldn't necessarily be in a Yellow Pages environment.
Chul: The Yellow Pages are too commercial, in the sense that they are driven solely by businesses paying to be listed; but users performing local searches are not necessarily interested in commercial information. For example, I might say I'm interested in the history of Edinburgh, which isn't commercial.
That sort of information might not be obtainable from the Yellow Pages, as it is a very broad term, but it is up on the web. A true local search has to bring back the required data, so the user can then go on to discover the commercial aspects if they want to.