Ransomware recovery

Bridget Kenyon, Head of Information Security at UCL, and James McCafferty, Director of IT Service Delivery at UCL, share remediation techniques, processes and best practice for dealing with a ransomware attack, so that the bad guys don’t catch you out again.

Just a disclaimer up front: the advice and experiences described here are true to life, but should be adapted to your organisational context. We hope that you will find something here which speaks to you and helps you survive a ransomware incident, should you be so unlucky as to experience one.

Let’s start with the punchline: impact. From personal experience, the most expensive part of a ransomware attack is the time spent in recovery.

Why does recovery take so long?

For the purposes of this article, let’s assume that ransomware operates as follows: the user clicks on a link, or opens an email attachment, which takes them to a website. The site auto-downloads malware to the user’s PC, which identifies file shares and locally stored data. It then goes on a half-hour spree of encrypting files, exports the encryption key to a remote ‘command and control’ server, and finally pops up an onscreen notification telling the user what it has done and demanding a bitcoin to decrypt the files.

The actual damage to the data is typically not that severe: data can easily be recovered from backups, and if you back up daily you should lose at most about a day of data. But restoring files is only a small part of the recovery process. The affected environment must first be isolated to prevent further damage.

Then the source of the infection must be identified, usually by examining log files and the ownership of files such as ‘howdecrypt’, which the malware creates while running as the infected user. Every location to which the malware may have copied itself has to be identified and checked for damage and infection. Only then can everything be cleaned up and reassembled.
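As a rough illustration of the ownership check, the Python sketch below walks a suspect share, finds files whose names contain a marker such as ‘howdecrypt’, and counts them per owning account. It assumes the share is mounted on a POSIX host where file ownership maps to local accounts through the standard `pwd` module; on a Windows file server you would read the NTFS owner instead. The function names and the default marker are illustrative assumptions, not part of any standard tool.

```python
import os
import pwd
from collections import Counter
from typing import Iterator, Tuple


def owner_name(uid: int) -> str:
    """Map a numeric UID to a user name, falling back to the raw UID."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # UID not in the local account database


def find_marker_files(root: str, marker: str = "howdecrypt") -> Iterator[Tuple[str, str]]:
    """Yield (path, owner) for every file under root whose name contains marker.

    Assumes a POSIX mount; the default marker is an illustrative example.
    """
    marker = marker.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if marker in name.lower():  # case-insensitive name match
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                yield path, owner_name(st.st_uid)


def tally_owners(root: str, marker: str = "howdecrypt") -> Counter:
    """Count marker files per owning account."""
    return Counter(owner for _path, owner in find_marker_files(root, marker))
```

For example, `tally_owners("/mnt/suspect_share").most_common(5)` (the path is hypothetical) would list the accounts that own the most marker files. The top entry is a lead rather than proof, and should be confirmed against the log files.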

Even that isn’t the end of it. After the ransomware has done its work of encrypting systems and has shut down, quite often it is up to a week before an individual realises that there is something badly wrong with their files. During that time, people will have continued to save files to the affected storage location, which may well be a network share.

The whole share is suspect, so all of it has to be restored from backup. However, any files created after the damage was done then have to be ‘unpicked’ from the suspect share, cleaned, and added to the replacement share.
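Producing the list of candidate files to unpick can be sketched in a similar way. The Python below assumes you know when the backup you restored from was taken (the hypothetical `cutoff` parameter) and that file modification times on the suspect share can be trusted. It lists every file changed after that point, relative to the share root, so that each path can be mapped onto the replacement share.

```python
import os
from datetime import datetime
from typing import Iterator


def files_changed_since(root: str, cutoff: datetime) -> Iterator[str]:
    """Yield paths (relative to root) of files modified after cutoff.

    cutoff is assumed to be the time of the restored backup; it must be
    timezone-aware so it compares unambiguously with file mtimes.
    """
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware")
    cutoff_ts = cutoff.timestamp()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # unreadable or vanished: skipped here, so check such paths separately
            if mtime > cutoff_ts:
                yield os.path.relpath(path, root)
```

Files encrypted during the spree will also appear, because encryption rewrites them, so the output is a review list rather than a copy list: each entry still has to be checked and scanned before it goes onto the replacement share.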

In recovery situations, it is all too easy to succumb to the ‘get it back online quickly’ mindset, especially under pressure from the wider user community. Avoid speeding things up in ways that undermine the recovery, such as bringing file systems back online before scanning is complete. Allocating sufficient resources to the recovery process is key, as is overseeing the scanning to ensure that every hiding place for the malware is checked, including user profiles, shares to which the infected user may have write access, local PCs and email stores. Being able to run scans unattended is also a big help.

How can you reduce the impact of a ransomware incident?

Before the incident

Assume that you will have a ransomware incident
Get agreement that you can override day-to-day work for a sufficiently severe incident, and define ‘sufficiently severe’
Agree the parameters of your response: would there ever be any circumstances in which you would choose to pay a ransom? Or will you always recover locally, or accept the loss?
Capture what each team has to do in a template process, circulate it to them for review and feedback before an incident, and back it with a per-incident checklist that has a column for recording who did each step and when
Work with the teams who will be responsible for remediation so they know the context and the expectation. A little prep can ensure that people will react really quickly and will go the extra mile to get the right outcome.

During an incident

Own the ransomware response checklist and actively manage it across the relevant teams
Ensure that each team has a named contact responsible for reporting progress, so that, for example, you do not have to hassle the person actually doing the virus scanning to find out whether it’s going OK
Check, check and check again. Do not assume that people understand what you need them to do, or are planning to do it
Coordinate regular updates from the various teams involved to keep senior management and users informed
Build in extra time at the end of each day so that you have room for any slippage; for example, set end-of-day deadlines at 3:30pm, not 4:45pm
Users who were gulled into clicking on the wrong thing may be somewhat defensive. State explicitly that the priority is to bring the incident under control and resolve it quickly, and that to do this you need really frank and honest information from them.

After the incident

Preserve your completed ransomware response checklist after the incident and use it for lessons learned activities
Do not use what you learned as ammunition to attack people, unless it’s disciplinary-level misconduct, in which case it’s an HR matter. If it gets really nasty after an incident, guess what will happen during the next one? Yep, the users and admins will do their best to hide everything and will be much less likely to help you
Agree clearly defined and justified actions, with dates and accountable parties, track whether these actions are happening, and report to management.

What about prevention?

Reduce the size of your shared environments, and minimise the number of people who can write to them. Least privilege is your mantra here
Try RPZ (DNS Response Policy Zones), which lets your recursive resolvers refuse to resolve known-malicious domains. It worked wonders for us
Get a really good spam filter
Scan all incoming files for malware. NB: sandboxing may work, but remember that some malware detects that it is running in a sandbox and stays inactive
Create some ‘honeyfiles’ (decoy files that should never change) in key areas, monitor them, and raise an alert if they ever change
Patch and patch again: OS, applications and network gear
Isolate things which are fragile or hold sensitive information
Train users to recognise phishing and spam emails (e.g. with test phishing exercises). Reward them for reporting correctly
Encourage IT staff to look for weird stuff. Automated tools are handy, but an expert sysadmin should also be able to notice odd little things which are signs of deeper issues. They should also be encouraged to report issues and rewarded for identifying little incremental improvements
Strongly encourage people to report suspected incidents, even if they are at fault
Keep management in the loop, e.g. by reporting to them on financial losses from ransomware recovery. Use this to justify appropriate security measures: remember, these are THEIR security measures. They manage your organisation. They should be informed about their risk
Educate people. There is no such thing as perfect security. There is no ransomware-immune person or system. But there is acceptably low risk. That’s what management aims for in other risk management contexts, so they should be able to apply that logic here.
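The honeyfile idea from the prevention list can be sketched as a simple integrity check: record a SHA-256 digest of each decoy file once, then periodically recompute and compare. This Python version assumes polling (for example from a scheduled job) and simply returns alert strings for your monitoring to pick up; the function names are hypothetical. Because ransomware may delete or rename originals rather than rewrite them in place, a missing file is reported as well as a modified one.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def take_baseline(honeyfiles: list[Path]) -> dict[Path, str]:
    """Record the known-good digest of each honeyfile."""
    return {p: sha256_of(p) for p in honeyfiles}


def check_honeyfiles(baseline: dict[Path, str]) -> list[str]:
    """Return alert messages for honeyfiles that changed, vanished or became unreadable."""
    alerts = []
    for path, expected in baseline.items():
        try:
            actual = sha256_of(path)
        except FileNotFoundError:
            alerts.append(f"MISSING: {path}")  # deleted or renamed
            continue
        except OSError:
            alerts.append(f"UNREADABLE: {path}")  # e.g. permissions changed
            continue
        if actual != expected:
            alerts.append(f"MODIFIED: {path}")  # contents rewritten
    return alerts
```

Polling only catches a change at the next check, so the interval should be short compared with the half-hour encryption spree assumed earlier; any non-empty result from `check_honeyfiles` should be treated as a suspected incident.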

The bottom line

In summary, if you are pragmatic and have an enlightened but cynical approach to ransomware, you can manage your risk, survive infection, and even use it to improve security.