When Hurricane Ike struck the Houston data centre of one of MindTree's clients, the value of having and implementing a business continuity plan (BCP) and disaster recovery (DR) programme was acutely recognised. Thanks to a sound plan, the company was able to recover from the disaster within 48 hours with minimal disruption to business.
This was possible because of several months of planning and implementation, which enabled critical business operations to be moved to a disaster recovery site within 24 hours.
- Natural: floods, hurricanes, earthquakes, wildfires, epidemics, tsunamis
- Human: terrorism, war, vandalism and riots, burglary, data theft, fraud, accidents
- Technical: virus attacks, power failure, HVAC failure, network failure, building problems, hardware failure
- Proximity: nuclear reactors, railway tracks, airports, electrical stations, military bases
The process that identifies these threats and provides a framework for building organisational resilience, with the capability for an effective response that safeguards business operations, key stakeholders, reputation and brand image, is called business continuity management (BCM).
Generally, most enterprises need to be back in business with minimum downtime. Although there is no generic 'one size fits all' BCM and disaster recovery plan, there are some useful guidelines available. For example, the British Standards Institution (BSI) has an independent standard for BCP, BS 25999-1, and each enterprise can use its guidelines to develop its own customised BCP.
Our recommendation is that a well-defined BCM programme has the following essential components:
The structured programme to secure an organisation’s business operations starts with a clearly articulated vision. We believe that this vision should come from the CEO and that the initiatives should be driven from the top. The vision needs to be adapted to all the departments and be incorporated as a part of corporate governance.
The next stage is to define a well-articulated strategy for recovery in which the essential functions and timelines are identified.
The strategy should clearly focus on the recovery of business operations, brand image and reputation and should include:
- A BCP budget, which should be formalised and approved by senior management.
- Identification of disaster declaration authorities who will be responsible for implementing the continuity strategies in the event of a disaster.
- Identification of incident management systems or a process for monitoring a disaster or business interruption, recovering from it and stabilising operations afterwards.
- A periodic review of the plan, which should be benchmarked against industry standard practices and other similar organisations’ best practices.
Organisation-wide awareness
One of the main challenges of BCM is to combat a lack of interest. BCM is often treated as an initiative of either IS or the security department, yet it is important to create awareness among all the employees and partners of an organisation. A training plan should be developed and the training should be conducted at regular intervals.
Identification of information assets
Information resides everywhere in an organisation: on paper, in computers, in storage racks, on tapes stored in remote locations and even in employees' heads. All these sources of information are vulnerable to external and internal threats, and the damage could be significant. These information assets need to be identified along with their location, and their criticality needs to be documented.
Two important characteristics of risks are:
- probability of occurrence of risk (low, medium and high);
- severity of the risk (low, medium and high).
It is important to develop a risk table by:
- listing all the risks;
- categorising the risks;
- analysing the probability and severity;
- sorting through the risks and identifying those to be managed.
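The steps above can be sketched in a few lines of Python. This is only an illustration, not a prescribed method: the numeric scores for low, medium and high, the rule of multiplying probability by severity, the threshold and the sample ratings for power failure and burglary are all assumptions made for the example.

```python
from dataclasses import dataclass

# Ordinal scores for the three levels named in the text (the numbers are an assumption).
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Risk:
    name: str
    category: str     # e.g. "natural", "human", "technical", "proximity"
    probability: str  # "low", "medium" or "high"
    severity: str     # "low", "medium" or "high"

    @property
    def score(self) -> int:
        # Hypothetical scoring rule: probability score multiplied by severity score.
        return LEVELS[self.probability] * LEVELS[self.severity]


def build_risk_table(risks, threshold=3):
    """Sort risks by score, highest first, and keep those at or above the threshold."""
    ranked = sorted(risks, key=lambda r: r.score, reverse=True)
    return [r for r in ranked if r.score >= threshold]


# Sample entries; the ratings are illustrative only.
risks = [
    Risk("earthquake", "natural", "low", "high"),
    Risk("virus attack", "technical", "high", "low"),
    Risk("power failure", "technical", "medium", "high"),
    Risk("burglary", "human", "low", "low"),
]

for r in build_risk_table(risks):
    print(f"{r.name:15} {r.category:10} {r.probability:7} {r.severity:7} {r.score}")
```

With these sample ratings, power failure ranks first and burglary falls below the threshold, so it is left out of the list of risks to be managed. A real risk table would use whatever scoring scheme the organisation has agreed.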
Risk analysis needs to be undertaken to cover the impact of the risk. For example, an earthquake of magnitude 8.0 on the Richter scale is of low probability in London but of high impact to your information assets. However, a virus attack can be high probability but low impact if all security measures are taken to prevent an attack.
This impact analysis should also cover potential financial and brand damage. It is also important to identify any other key business processes and critical dependencies.
Once the impacts are analysed, a mitigation strategy should be developed for each category of risk.
Risk mitigation involves:
- analysis of threats most likely to occur;
- identifying threats which make most impact;
- minimising service disruptions and financial loss;
- having a contingency plan for mitigating risks.
For example, the risk mitigation strategy for hardware failure of a mission-critical server is to have spares on site so that downtime is minimised.
Business continuity plan
The business continuity plan should be built around the recovery time that is right for your organisation. For example, if it is acceptable for the recovery time of your business to be measured in days, then you may opt for offsite tape storage. However, if the necessary recovery time is just a few hours, then a hot standby system at a disaster recovery site may be needed.
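As a rough sketch of that decision, the function below maps a recovery time objective to one of the two options mentioned. The one-day cut-off and the function name are assumptions for illustration; a real plan would set its own thresholds and would probably consider more options.

```python
from datetime import timedelta


def suggest_recovery_strategy(rto: timedelta) -> str:
    """Map a recovery time objective to one of the two options named in the text."""
    # Assumed cut-off: anything under one day is treated as "a few hours".
    if rto < timedelta(days=1):
        return "hot standby at DR site"
    return "offsite tape storage"


print(suggest_recovery_strategy(timedelta(hours=4)))
print(suggest_recovery_strategy(timedelta(days=3)))
```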
BCP should include the following aspects:
- identify process specific recovery time objectives;
- identify the minimum capacity requirement to run the business operations at an acceptable level;
- identify critical information resources;
- identify procedures for acquiring critical resources in the event of a disaster;
- identify contact information and procedures for disaster authorities;
- identify and keep ready a disaster recovery site and develop procedures for relocating there;
- define standard procedures for response, recovery and restoration;
- define emergency response procedures;
- identify ER team members with contact information;
- document crisis communication procedures and train staff in them.
DR site strategy and implementation
If the primary site of business suffers a major disaster, the business processes may have to be relocated to an alternate site. This may include people, machinery and IT assets, and the DR site has to be carefully selected so that the same disaster does not affect it too.
For example, if the probability of a forest fire spreading across the entire location is very high, then the disaster recovery site should be located several hundred miles away from the primary site.
It is also important to identify minimum capacity operations at the disaster recovery site so an acceptable level of business continues until the primary site becomes functional again.
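One simple check when shortlisting DR sites is the straight-line distance from the primary site. The sketch below uses the standard haversine formula for this. The coordinates are approximate and purely illustrative, and the function names and the 300-mile default are assumptions; distance alone does not show that two sites are free of shared risks.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(a, b):
    """Great-circle distance between two (latitude, longitude) points, in miles."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def is_far_enough(primary, dr_site, min_miles=300):
    """Return True if the DR site is at least min_miles from the primary site.

    min_miles is a hypothetical threshold; pick one that fits the threats identified.
    """
    return distance_miles(primary, dr_site) >= min_miles


# Approximate city-centre coordinates, used only as sample input.
houston = (29.76, -95.37)
dallas = (32.78, -96.80)

print(round(distance_miles(houston, dallas)), is_far_enough(houston, dallas))
```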
Disaster recovery drills
Disaster recovery drills need to be drawn up and run at regular intervals in order to ensure your preparedness. These should cover all aspects of a business, from sales to operations and from HR to IT.
Drills often cover only certain aspects of the business, so it is worthwhile creating disaster simulation models to test recovery in areas where an actual drill cannot be carried out.
The drills should involve all critical business units, departments and functions and the roles and responsibilities for BCP testing should be assigned in advance.
Audit and continuous improvement
A post-test review and analysis needs to be carried out, and the BCM procedures need to be periodically audited to ensure compliance with company standards. Specific timelines need to be defined to update the BCM based on the change management process of the organisation.
While it is commonly accepted that BCM is a necessity for nearly every enterprise, implementation often faces several challenges. These include the perceptions that BCM has no ROI, that it does not generate revenue and that it can be replaced by insurance, as well as the mindset that 'a disaster will not happen to us'.
This last attitude is certainly a dangerous one, as we have seen evidence of many disasters over the last few years. Our conclusion is that BCM should be part of your IT strategy, that you should keep it lean, and that you should have at least the minimum capacity requirement in place to run your business operations should disaster strike.