Appreciating the invisible

Drake Pruitt, CEO of Bocada, looks at applying service principles to data protection.

Backup and data protection is often regarded in the same way as insurance - companies know that they need it, often investing significant sums in it, but can't see its value until they are forced to rely on it. Enterprise data protection often operates unseen amid more eye-catching IT projects and services.

Data protection doesn't grab the headlines within the business - until there's a problem. What's more, companies throw money at data protection with no visibility into its success or return on investment.

This can prove difficult for staff members with responsibility for data protection, who fulfil a vital role that is often undervalued.

Yet it is a service just like any other – it should operate and be measured against specific levels of performance, which will in turn allow the business to appreciate how the financial investment in data protection resources has been managed to meet performance objectives.

To meet this challenge, proactive IT departments are beginning to use service level agreements (SLAs) to demonstrate and prove the value of their backup and data protection services.

In-house data protection teams are discovering that implementing SLAs is an effective way not only to prove the value of their services, but also to drive down costs by matching service cost to data value.

The trouble is that many organizations are less than adept at drawing up an SLA for internal operations. A poorly drafted SLA carries risk: it can leave a business ill-informed about the true costs and outcomes of data protection and, even worse, can prioritise the wrong elements of the data protection service.

On the other hand, a well-crafted SLA can assist the IT team and the data owners as well as the entire business. The service provider or IT organization can define customer expectations and levels of service.

For contracted IT service providers, the SLA can also stipulate penalties and/or payment for inadequate service, or bonuses when goals are exceeded. But whether it is an internal agreement or a third-party contract, a data protection SLA delineates specific expectations of service.

With an SLA, the client or data owner knows what services will be delivered, and which guarantees of performance are backed by specific penalties.

The CIO, who can set specific standards and implement practices for data protection squarely focused on business objectives, can also demonstrate to business managers the capabilities and limitations of IT systems in meeting those goals.

The executive/management team gains an understanding of the true risk of lost data to the enterprise, and the cost of lowering that risk.

The key to SLAs - making data protection measurable

To create an SLA and then prove its requirements are being met, there must be some way to objectively measure and verify performance.

It is possible to quantify data protection activities manually by reviewing log files and individual backup application reports, interpreting the results and coming up with some measure of backup successes and failures.

Some IT departments go this route, but the SLAs are necessarily limited in scope and require inordinate amounts of staff time to verify.

In addition, especially for third-party contractors, there is some scepticism about the results, given that the organization responsible for meeting the SLA is also compiling results and creating reports.

An automated, objective reporting application is the key to developing and implementing SLAs. In addition to supporting the metrics required, such an application should also be:

  • Multi-vendor and vendor-neutral: Capable of generating reports in heterogeneous backup environments, accommodating the full range of backup applications, platforms and devices.
  • Focused on data protection rather than backup: Support a wide range of metrics that go beyond operational reports of individual backup successes and failures. It should report on the data protection status of individual servers, clusters of servers by department, and report on global parameters that are meaningful to data owners.
  • Third-party independent: Free of any biases, or the appearance of them, by being from an independent vendor, and requiring no intermediate data manipulation by IT staff in order to generate reports.
  • Easy to install, use and maintain: Minimally impact the computing environment; add few or no change management burdens; and automate information-gathering and dissemination of reports so as not to burden the IT staff with additional duties.

With such a reporting tool, it is possible to implement SLAs with a proven, methodical step-by-step process that allows organizations to improve their data protection services and prove their value. As such, steps in this process should include:

Step 1 - Assess the data protection environment

The first step toward developing a data protection SLA is gaining a clear view of the existing data protection environment, process and efficacy.

A reporting application with the ability to look across the heterogeneous backup environment and consolidate reports of activity into a common format provides this view and helps establish baseline metrics.

Without this view into the current status of the data protection environment, the goals you set might be unrealistic: 'Make sure 100 per cent of our data is recoverable, immediately, to any point in time, forever'. While this goal might be desirable to the data owner, it would be extremely expensive for the business.

This initial evaluation is often an eye-opener. Most operations contain unseen backup failures resulting from defective media, network collisions, server glitches and inevitable human error.

These neglected and failed backups are often chronic, and remain undetected and therefore uncorrected until a restore or an audit is requested.

Typically, enterprises that have installed a reporting application have found at least one server or database completely overlooked – and often containing business-critical information. And less important resources are often found to be receiving daily full backups and a degree of protection far out of proportion to their value.

A customer database obviously warrants a higher priority for backup than, say, an internal office supply spreadsheet. A company can waste hundreds of thousands of pounds on needless backups of largely static data by putting those stores on the same schedule as the CRM system.
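As a minimal sketch of how baseline metrics might be derived once backup activity has been consolidated into a common format, the Python below computes a success rate per server and lists inventoried servers that have no backup jobs at all. The record layout, server names and inventory are hypothetical and are not the output of any particular reporting product.

```python
from collections import defaultdict

# Hypothetical consolidated job records: (server name, whether the job succeeded).
jobs = [
    ("crm-db", True),
    ("crm-db", False),
    ("crm-db", True),
    ("file-01", True),
    ("file-01", True),
]

# Hypothetical asset inventory; "hr-db" never appears in the job records.
inventory = {"crm-db", "file-01", "hr-db"}


def baseline(jobs, inventory):
    """Return (success rate per server, servers with no backup jobs)."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for server, succeeded in jobs:
        totals[server] += 1
        if succeeded:
            successes[server] += 1
    rates = {server: successes[server] / totals[server] for server in totals}
    unprotected = set(inventory) - set(totals)
    return rates, unprotected


rates, unprotected = baseline(jobs, inventory)
```

Comparing the job records against an independent inventory is what surfaces the overlooked server; a success rate alone would never show it.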

Step 2 - Align the degree of protection with the business goals

Once current performance levels are known, IT management can work with the data owners and business units to arrive at appropriate levels of protection, based on the business criticality and time-sensitivity of each data resource.

This includes anticipating the need for immediate recovery, the degree to which recovering aged data is acceptable and the need for data retention for archival purposes. Data which must be reproducible for compliance with government regulations calls for special scrutiny.

The key elements involved are:

  • In case of failure, how recent must the backup be? In the case of contact management in a business where leads have especially high value, it might be an hour. For other applications, daily, weekly or even quarterly might suffice.
  • How quickly must the data be restored? If faster restores are vital, more frequent full backups (versus incremental backups) might be required.
  • What are archival requirements? Some data owners may want to archive and maintain a monthly or yearly backup. Some data must be retained to protect the company in the event of a regulatory audit; some requirements call for data retention periods of up to seven years.
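The three questions above could be recorded per data resource in a simple structure. The sketch below is illustrative only: the class name, field names, resource names and every duration are assumptions chosen to mirror the examples in the text, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProtectionLevel:
    max_backup_age: timedelta      # how recent the latest backup must be
    max_restore_time: timedelta    # how quickly the data must be restored
    archival_retention: timedelta  # how long archives must be kept


# Hypothetical levels agreed with data owners.
levels = {
    "lead-management": ProtectionLevel(
        max_backup_age=timedelta(hours=1),
        max_restore_time=timedelta(hours=4),
        archival_retention=timedelta(days=7 * 365),
    ),
    "office-supplies": ProtectionLevel(
        max_backup_age=timedelta(weeks=1),
        max_restore_time=timedelta(days=2),
        archival_retention=timedelta(days=365),
    ),
}


def backup_is_recent_enough(level, last_backup, now):
    """True if the latest backup is no older than the agreed maximum age."""
    return now - last_backup <= level.max_backup_age
```

For instance, a backup taken two hours ago would satisfy the weekly requirement assumed for the office supply data but not the hourly one assumed for lead management.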

Step 3 - Identify the specific terms of the SLA

Given the degree of protection required by the data, regulations and business units, the IT department can arrive at a backup process and procedure which achieves it. It then turns those requirements into specific terms under which the measured services should be delivered.

The terms of the SLA should include:

Metrics: A variety of parameters can be used to measure and verify compliance. The SLA will include these measures and specify levels of acceptable performance. Ideally, the reporting application directly outputs these measures. Some useful ones for developing an SLA might include:

  • Backup frequency. The frequency of both full and incremental backups.
  • Allowable errors. The number of allowable backup errors, delineating a failed or successful backup. A reporting application should have an adjustable threshold, so that the informational or notice warnings output by some backup applications are not flagged as serious errors.
  • Operational retention. The length of time backup sets should be held in an easily accessible location, for performing restores.
  • Archival retention. The length of time legacy archives should be retained, for regulatory compliance, future audits or general business protection.
  • Backup window. The time during which the backup may run.
  • Job size. The total number of bytes backed up during each job.
  • Reporting requirements. Ideally, the reporting application automatically pushes regular reports and proof of performance to the data owners via email or posting to a web server.
  • Event response. If a backup fails or a restore is needed, the response requirement is normally instantaneous. In other cases, a 24-hour response window to remedy the failure or perform the restore may be adequate.
  • Penalties. An SLA with an outsourced service provider that carries no penalties is toothless - it isn't really an SLA because there is no guarantee or incentive for success nor punishment for failure. A common SLA penalty is a month of the service provider's fees waived for a failed restore, but when such a penalty is invoked, the data is already lost. A good recommendation is a combination of incentives and penalties. For example, if all backups are completed, verified and retrievable in a given month, the service provider receives a bonus. For every failed backup that is irretrievable, there is a penalty. And, in the case of more than a pre-set number of failed backups in a set time period, penalties escalate.
  • Costs. Of course, pricing is an essential element of any contract with an outside provider. But even if an SLA is between a company and its internal IT department, costs can be made a part of the agreement.
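The combined incentive and penalty scheme described under Penalties can be expressed as a small calculation. This is a sketch under stated assumptions: the function name, the monetary amounts, the escalation threshold and the choice to escalate by multiplying the per-failure penalty are all hypothetical, since the article does not specify how penalties should escalate.

```python
def monthly_adjustment(
    failed_backups,
    bonus=1000.0,                # assumed bonus for a clean month
    penalty_per_failure=250.0,   # assumed penalty per irretrievable backup
    escalation_threshold=3,      # assumed pre-set number of failures
    escalation_factor=2.0,       # assumed multiplier beyond the threshold
):
    """Return the fee adjustment for one month.

    Positive values are a bonus paid to the provider;
    negative values are a penalty deducted from its fees.
    """
    if failed_backups < 0:
        raise ValueError("failed_backups must not be negative")
    if failed_backups == 0:
        return bonus
    within = min(failed_backups, escalation_threshold)
    beyond = failed_backups - within
    penalty = within * penalty_per_failure
    penalty += beyond * penalty_per_failure * escalation_factor
    return -penalty
```

With these assumed figures, a month with no failures earns the bonus, two failures cost 500, and five failures cost 1,750 because the fourth and fifth are charged at the escalated rate.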

Step 4 - Remedy service shortfalls

The reporting application should clearly flag data resources that are at risk through unmet goals and borderline SLA performance, and that call for the highest-priority response.

Ideally, it goes a step further by revealing the root causes of shortfalls, whether equipment failures, tape failures, faulty network cables, file-open errors or simple human errors such as failing to load the tape library.

Without this reporting, backup administrators spend much of their time focusing on isolated failures and sifting through log files, looking for clues to the causes. By contrast, the SLA, since it is driven by business value, brings the IT staff's priority and focus to the failures that are truly business critical.
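One way to picture this prioritisation is a filter that flags resources missing their SLA target, or sitting close to it, and orders them by business criticality. The sketch below is hypothetical throughout: the field names, the priority scale (1 is most critical), the two per cent borderline margin and the sample figures are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceStatus:
    name: str
    business_priority: int        # assumed scale: 1 = most critical
    required_success_rate: float  # target agreed in the SLA
    measured_success_rate: float  # figure from the reporting application


def at_risk(resources, margin=0.02):
    """Return resources below target or within `margin` of it, most critical first."""
    flagged = [
        r for r in resources
        if r.measured_success_rate < r.required_success_rate + margin
    ]
    # Order by business priority, then by how far the resource falls short.
    return sorted(
        flagged,
        key=lambda r: (
            r.business_priority,
            r.measured_success_rate - r.required_success_rate,
        ),
    )


resources = [
    ResourceStatus("office-supplies", 3, 0.90, 0.80),
    ResourceStatus("crm-db", 1, 0.99, 0.995),
    ResourceStatus("file-01", 2, 0.95, 0.99),
]
flagged_names = [r.name for r in at_risk(resources)]
```

Here the CRM database is listed first even though it currently meets its target: it is borderline and business critical, whereas the office supply data misses by more but matters less.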

Step 5 - Prove performance

An SLA, especially when combined with a reporting application that does cost accounting, can do much to elevate the perceived value of data protection efforts.

Rather than providing a service to which the organization is essentially blind, the IT department is able to prove it is protecting critical business resources by presenting empirical evidence in a readily accessible, frequently updated format.

The result is a strategic business intelligence approach that delivers an entirely different view of data protection. It becomes possible to graphically illustrate which areas are the most crucial, which are being neglected, and which are over-consuming resources.

The company gains visibility into its use of storage media, hardware resources such as networks and backup servers, and the personnel that monitor and maintain them. It can then allocate those resources based on the value of the data.

This allows backup administrators to move out of a reactive mode and into proactive management.

Time previously spent gathering data on data protection activities, compiling reports manually and doing laborious detective work can be spent fine-tuning the systems for greater cost-effectiveness, matching resources to business needs, and continuously improving performance as measured against the SLAs.

SLAs are a best-practices approach to data protection, and a strategic way for service providers and internal teams to prove the value of their services.

Through a well-crafted SLA, data protection teams can move from a reactive stance, where their services are largely invisible, to a proactive stance, with specific promises and regularly communicated proof of delivery.

Drake Pruitt is CEO of data protection specialist, Bocada.

www.bocada.com

This article first appeared in ITNOWextra November 2006.