Web filtering - are you sure you know what you are paying for?

Investment in a web filter doesn't necessarily mean organisations and their users are fully protected and getting value for money. Eamonn Doyle outlines the reality of the web filtering marketplace.

The internet is a daily tool for millions of people across the world, but its sheer magnitude and extremely fast growth make its actual size difficult to quantify. A popular way to get a basic estimate is to look at the number of registered domains, of which there are currently around 129 million¹.

Employers have historically been most concerned with blocking staff from deliberately or inadvertently reaching offensive content such as pornography, violence and racism. The bad publicity and legal implications for the organisation have always been sufficient motivation for employers to reach for their chequebooks in order to protect themselves and their staff. Today, however, employee productivity and its impact on organisational efficiency have also become a major concern. The continuous expansion of the internet, allied to changing trends in how it is used, such as Web 2.0, means that traditional mechanisms for web filtering are simply not enough.

It is, therefore, entirely prudent to examine how web filtering suppliers can meet these growing and changing demands - after all, it is what customers are paying for, isn't it?

There are a number of key suppliers in the web filtering marketplace, most of whom rely on a URL database to ensure inappropriate web content doesn't reach users. So what's wrong with that?

The first alarm bell to ring for customers should be the recognition by major web filtering firms that their technology simply can't keep up with the growth of the internet. According to the Websense Annual Report (2006), 'Our databases and database technologies may not be able to keep pace with the growth in the number of URLs.' This, therefore, points to a fundamental issue in this type of approach to web filtering.

So what is the 'technology' they use? The usual means of web filtering is to build a URL database using human reviewers. URLs are 'harvested' through various means and then reviewed. The issue, however, is one of volume. Whilst major vendors employ many personnel to review URLs for inclusion in lists, they have yet to learn the lesson that car manufacturers, for example, learned decades ago: 'handmade' does not offer the consistent quality and volume that user demand dictates.

If an individual reviewer can properly review and categorise, say, 400-800 websites per day, then a team of, say, 80 staff should be able to grow their URL database by 32,000-64,000 sites per day. That sounds impressive, but with millions of new sites being created each day, web filtering that relies on a URL database is clearly inadequate.
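The arithmetic behind that estimate can be sketched in a few lines of Python. The reviewer numbers are the ones quoted above; the figure of one million new sites per day is purely an illustrative assumption standing in for 'millions', and the function names are hypothetical.

```python
def daily_review_capacity(reviewers, low_per_reviewer, high_per_reviewer):
    """Return the (low, high) number of sites a review team can categorise per day."""
    return reviewers * low_per_reviewer, reviewers * high_per_reviewer


def coverage_fraction(sites_reviewed_per_day, new_sites_per_day):
    """Return the share of newly created sites that the team can review each day."""
    return sites_reviewed_per_day / new_sites_per_day


# Figures from the article: 80 reviewers, each handling 400-800 sites per day.
low, high = daily_review_capacity(80, 400, 800)
print(f"Team capacity: {low:,}-{high:,} sites per day")

# ASSUMPTION: one million new sites per day, chosen only for illustration.
assumed_new_sites_per_day = 1_000_000
share = coverage_fraction(high, assumed_new_sites_per_day)
print(f"Best-case share of new sites reviewed: {share:.1%}")
```

Even at the top of the quoted range, and with the low end of 'millions' as the assumed growth rate, the team reviews well under a tenth of each day's new sites.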

Compound that issue with the need to review the established database to ensure the URLs are still active and it becomes clear that some suppliers are using the equivalent of a water pistol to put out a fire.

With competition fierce in this industry, it could be argued this situation is aggravated by software authors not working too hard to clean up their current database for fear of slipping places in the 'database size wars'. A look across various vendor websites shows all sorts of database sizes quoted. The inference is clear - they argue that whoever has the biggest database must be the best at web filtering.

The reality is that, in comparison to the size of the web itself, the databases on offer in the marketplace are tiny, leaving most of the web uncategorised. Put simply, organisations are paying for web filtering that can cover only a tiny percentage of the internet. Their employees therefore have open access to any type of website missed by the URL database - an unacceptable situation and hardly value for money.

Whilst there is a place for traditional approaches to web filtering, it is clear that if you are going to manage a very dynamic data source, which in addition is subject to constantly changing patterns of usage and trends, then you require truly dynamic web filtering.

A number of firms are claiming some form of 'dynamic' web filtering on top of a database. Many rely on 'keywords', whereby a site is blocked if it contains unacceptable keywords. This technology is a step forward from databases in that offensive sites at least have the potential to be blocked, because pornographic or violent websites tend to use distinctive language and slang.

However, this technology falls down in two significant ways. Firstly, websites that don't use distinctive language don't get blocked. For an organisation needing to control staff surfing the internet in core working hours, that means general web surfing to inoffensive sites (e.g. shopping, travel, sport, news) will not be stopped by keyword scanning.

Secondly, keyword scanning is hardly 'dynamic' when the measurements it uses are largely pre-set, and maintaining the list of keywords for effective filtering is complex and time-consuming.
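Both weaknesses show up in even a minimal sketch of keyword scanning. The Python below is an illustration only, not any vendor's implementation; the keyword list, the function name and the sample page text are all hypothetical.

```python
import re

# ASSUMPTION: a small, hand-maintained keyword list of the kind described above.
BLOCKED_KEYWORDS = {"xxx", "gore"}


def is_blocked(page_text, keywords=BLOCKED_KEYWORDS):
    """Block a page if any whole word in its text matches a listed keyword."""
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    return not words.isdisjoint(keywords)


# A page using distinctive language is caught by the pre-set list...
print(is_blocked("Free XXX videos, updated daily"))

# ...but an inoffensive shopping or travel page passes straight through,
# so time spent on general surfing is not controlled at all.
print(is_blocked("Cheap flights and hotel deals for your summer holiday"))
```

The filter is only as good as the pre-set list: every new slang term or category means another manual update to `BLOCKED_KEYWORDS`.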

The only way forward is true dynamic scanning using technology to review and categorise a URL at the point of request. It shouldn't matter if the website is new, amended or has become popular all of a sudden. Organisations need their filtering policies to be fully enforced or they simply are not getting the value from their investment in their web filtering solution.

Everyone agrees that you need to have acceptable use policies and web filtering in place. Effective web filtering is possible, but a URL database approach is just not enough to protect your organisation or staff, whatever vendor marketing hype says. Customers need to become more aware of what they are actually paying for and what limitations to effective web filtering remain despite their investment. Ignoring these limitations could cost your organisation dearly in open surfing for all staff, perhaps for years to come.

Eamonn Doyle is managing director of Bloxx.