PDF spam - a step ahead of image spam

Over time, spammers have changed their spamming tactics in their bid to gain access to people's mailboxes. The latest tactic is to use the common PDF file format to send image spam. Brian Azzopardi from GFI Software reports.

By using PDF attachments to send images instead of embedding them in the body of the email message, spammers have taken the cat-and-mouse game with anti-spam software developers to a new level.

Introduction

At one point or another, we have all received emails that promise business deals worth millions of pounds, that try to sell products to improve our appearance or that try to convince us why it's worth investing our money in a particular company.
Dealing with spam (unsolicited email that is not targeted at specific individuals) is a problem that all email users share. Research shows that up to 75.6 per cent of email messages are spam, with the average somewhere around the 55 per cent mark. Some email services have reported even higher figures, with averages of over 90 per cent.

For individual users, spam is annoying and a waste of time, and it often contains spyware, malware and even pornography. However, spam also entails a cost for companies whose employees have to process the spam email that they receive every day.

The evolution of spam

Until a few months ago, spam was the domain of text- or HTML-based emails. For anonymous delivery, these messages traditionally relied on abusing open SMTP relays. When open SMTP relays became less common, spammers switched to proxy servers, dial-up services and, more recently, hijacked computers. Spammers designed personalised template emails to deliver their messages and then made use of bulk mailing software for distribution.

To block spam, email service providers and companies often relied on keyword detection, drawing up a list of keywords that commonly appeared in spam email. This list would often include keywords such as 'Viagra' or 'bank'. However, this method often blocked genuine email, and adding more keywords simply resulted in more false positives, that is, more legitimate email being blocked. Spammers became smarter too, and they sidestepped keyword blocking by replacing keywords such as 'Viagra' with variants such as 'v1agra'.
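
Both weaknesses of keyword matching can be seen in a minimal Python sketch. The keyword list and the function name keyword_hits are hypothetical and chosen only for illustration; the filters of the time were more elaborate than this.

```python
import re

# Hypothetical keyword list for illustration; real lists were far longer.
SPAM_KEYWORDS = {"viagra", "bank"}


def keyword_hits(text, keywords=SPAM_KEYWORDS):
    """Return the blocked keywords that appear as whole words in the text."""
    # Lower-case the text and split it into runs of letters and digits.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return words & set(keywords)


# A spam message that uses a listed keyword is caught.
print(keyword_hits("Cheap Viagra today"))
# A genuine business email is caught as well (a false positive).
print(keyword_hits("Please confirm the bank transfer"))
# A trivially obfuscated spelling produces no hit at all.
print(keyword_hits("Cheap v1agra today"))
```

The first two calls each report a hit, although only the first message is spam, while the third call reports nothing. Adding 'v1agra' to the list would only invite the next variation.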

Another approach to blocking spam is to use blacklists of IP addresses belonging to known spammers or compromised hosts. However, these lists have to be updated constantly because spammers have learnt to counteract them by rapidly changing the origin of their spam.
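
As a rough illustration of an IP blacklist check, the following Python sketch uses the standard ipaddress module. The list entries are hypothetical and come from address ranges reserved for documentation, and is_blacklisted is an illustrative name rather than part of any real product.

```python
import ipaddress

# Hypothetical blacklist entries, taken from documentation-only ranges.
BLACKLIST = [
    ipaddress.ip_network("192.0.2.0/24"),     # a whole network of spam hosts
    ipaddress.ip_network("198.51.100.7/32"),  # a single compromised host
]


def is_blacklisted(sender_ip, blacklist=BLACKLIST):
    """Return True if the connecting IP address falls inside any listed network."""
    address = ipaddress.ip_address(sender_ip)
    return any(address in network for network in blacklist)


print(is_blacklisted("192.0.2.55"))    # inside a listed network
print(is_blacklisted("198.51.100.7"))  # a listed single host
print(is_blacklisted("203.0.113.9"))   # a new origin that is not yet listed
```

The last call shows why the lists need constant updating: mail from an origin that has not yet been listed passes the check unchallenged.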

New trends: dynamic zombie botnets

Botnets can be defined as networks of compromised computers that are controlled by a single master. The number of nodes (also known as zombies) in a botnet can run into millions, and these machines exploit various software vulnerabilities to gain full access to further hosts and add them to the existing array of zombies.

Computer hackers had long been using botnets to launch DoS (denial of service) attacks and distributed network hacking attacks. Computer criminals had also been using botnets for money-making schemes, such as stealing credit card information and scamming pay-per-click advertising companies.

Seeing huge potential in botnets, spammers started financing hackers in order to make use of their zombie machines. Hackers were able to offer services such as botnet rental for a few minutes or hours, as well as collections of email recipients (spam lists).

The anti-virus industry noticed correlations between the spam industry and botnets. Not only were malware writers allowing spammers to make use of their creations, but they were also writing malicious code specifically to suit spammers' needs. An unholy alliance had been created.

Image spam

By early 2006, most anti-spam vendors had added Bayesian filtering to their arsenal of spam blocking methods. The fight between spam and anti-spam looked like it was taking a positive turn. However, by the end of 2006, the nature of spam had totally shifted. Whereas spam had been mainly text-based, it now became far more graphical in nature.
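
For readers unfamiliar with Bayesian filtering, the following Python sketch shows the general idea using a naive Bayes classifier with add-one smoothing. It is an illustration under stated assumptions rather than any vendor's implementation: the class name, the four training messages and the whitespace tokeniser are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split a message into lower-case words (a deliberately crude tokeniser)."""
    return text.lower().split()


class NaiveBayesFilter:
    """Toy naive Bayes spam filter; needs at least one message of each class."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Start from the prior: how common this class was in training.
            score = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                if word not in vocabulary:
                    continue  # words never seen in training carry no evidence
                # Add-one smoothing avoids zero probabilities.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Convert the two log scores into a probability of spam.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical training data for illustration only.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap stock now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("please review the attached report", "ham")

print(spam_filter.spam_probability("buy cheap pills"))
print(spam_filter.spam_probability("monday meeting report"))
```

With this training data, the first message scores well above 0.5 and the second well below it. Because the filter learns from whole messages rather than a fixed list, it copes better with new wording, but it still depends entirely on there being text to analyse.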

Spammers began making use of images to bypass text-based content filtering, simply by not embedding any text content. By making use of image spam, spammers were attacking the defences of most anti-spam solutions: while the images displayed text messages to end users, the anti-spam software saw only pixels.

Some email anti-spam solutions decided to go with optical character recognition (OCR) to turn the images into text that the software could then use. However, spammers took their images to the next level.

In an approach usually applied to CAPTCHA (an anti-spam measure used on web forums), they started fuzzing images (adding noise and distortions) to make it even harder for a machine to recognise the text. Although it is possible for a machine to read this text, the process is very CPU-intensive - especially when it is handling large numbers of images every few seconds.

PDF spam: the latest trend

Although spammers registered considerable success with image spam, the anti-spam software industry had not lost the battle and quickly came out with new counter-measures to stop image spam.

Realising that filters had a problem with images, anti-spam vendors decided to hit spammers at source - that is, where the email originated. This new approach had an immediate positive result: it considerably decreased the effectiveness of image spam and gave email users back some control over their mailboxes.

As in every cat-and-mouse game, spammers had to respond, and in mid-2007 they came up with a new technique that is not only ingenious but even more problematic than image spam. Instead of embedding the image within the email itself, they 'repackaged' it within an attachment using one of the most common file formats in use today - a PDF file.

This move is clever for a number of reasons:

Email users 'expect' spam to be an image or text within the body of the email, not an attachment.
Since most businesses today transfer documents using the PDF format, email users will have to check each PDF document; otherwise they risk losing important documentation.
With most anti-spam software products on the market geared towards filtering the email itself and not attachments, spam has a longer shelf-life within a network.
An attachment that is a PDF file has greater credibility in an email, thus making social engineering attacks much easier.
The ability to send large PDF files could result in a single spam attack causing huge bottlenecks on a company's email server and reducing the quality and amount of bandwidth available.
By sending PDF attachments, spammers can also resort to phishing by attaching supposedly authentic documents from a bank or service provider.

This new tactic could prove to be more effective than traditional image spam and its prevalence could be considerable until countermeasures are applied.
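
A first step towards such countermeasures is for the filter to look at attachments as well as the message body. The following Python sketch, which uses only the standard email package, lists the PDF attachments in a raw message. The function name find_pdf_attachments, the sample message and the placeholder bytes are hypothetical, and a real filter would go on to analyse the contents of the PDF rather than merely detect it.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder bytes that only imitate the start of a PDF file.
PLACEHOLDER_PDF = b"%PDF-1.4\n% placeholder, not a real document\n"


def find_pdf_attachments(raw_message):
    """Return (filename, size in bytes) for each PDF attachment in a raw email."""
    message = message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in message.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        filename = part.get_filename() or ""
        # Check the declared type, the file name and the file signature,
        # since any one of them on its own can be misleading.
        if (part.get_content_type() == "application/pdf"
                or filename.lower().endswith(".pdf")
                or payload.startswith(b"%PDF-")):
            found.append((filename, len(payload)))
    return found


# Build a hypothetical message with one PDF attachment.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample["From"] = "sender@example.com"
sample["To"] = "user@example.com"
sample.set_content("Please see the attached document.")
sample.add_attachment(PLACEHOLDER_PDF, maintype="application",
                      subtype="pdf", filename="report.pdf")

print(find_pdf_attachments(sample.as_bytes()))
```

Reporting the size alongside the file name also gives an administrator a simple way to spot the unusually large attachments that could clog a mail server.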

Conclusion

Spam continues to be a headache for administrators and end-users because spammers are constantly trying to stay one step ahead of anti-spam software vendors. Using keyword detection methods alone will not solve the problem because new spamming techniques have overcome that hurdle.

The solution lies in a product that deploys as many anti-spam techniques as possible, including Bayesian filtering and PDF filtering, yet returns the lowest level of false positives possible. Moreover, the package should be easy to install and manage without adding unnecessary administrative burdens, and it should handle spam efficiently with minimal end-user intervention.