Everyone knows that data volumes are increasing at enormous rates, but is knowledge of the data, and more importantly knowledge within the business, increasing with it? Jason Tiret examines best practices for using data models to better serve enterprise data management.

In today's world of data governance, web services, regulatory compliance and heightened information security, data architects are asked to build much more than the classic data dictionary. Well-documented models, both data and process, are at a premium. Traditional entity and attribute definitions are no longer enough to truly document the data and the processes surrounding it.

As new projects are undertaken, ensuring that business requirements are adequately addressed and accurately implemented can be a challenge unless the metadata (i.e. the data about the data) has kept pace with the growing needs of the business, continually evolving alongside it.

The importance of data governance

Data governance is becoming increasingly important to businesses as they strive to meet regulatory compliance requirements such as Sarbanes-Oxley, Basel II and, most recently, MiFID.

Yet very little of the data a corporation stores actually benefits it. Gartner estimates that only 15 per cent of data is used for the benefit of the organisation; nobody knows what the remaining 85 per cent is, where it is or what to do with it. How happy would the executives in your business be if you told them that 85 per cent of your data was unusable and just taking up disk space?

Data governance entails many things, but the basic premise is a set of standards or guidelines for managing data on an enterprise-wide scale, with the goal of making it more useful, more secure and more valuable: in other words, turning a storage cost into a business asset. By applying best practice to data across the company, you can drive down data centre costs and gradually begin to make use of that 'missing' 85 per cent.

The scope of data governance extends beyond the data architecture team, and it is very important that both data architects and modellers are involved in data governance initiatives to ensure those initiatives are correctly aligned with the business. This means creating standards for how your data is secured and documenting what, if any, sensitivity to compliance laws it may have.

It means defining the stewards of your data and their responsibilities for managing it, such as its quality, design and business rules. It also means creating standards for database development as new databases are built and existing ones are re-architected. It is critical that these standards be integrated into the models to serve the data governance initiatives of your business.

A typical entity definition in a data model very rarely documents the sensitivity level of the data it represents, whether that data is used at an enterprise or departmental level, the last time the data was checked for accuracy, or the last time its structure was changed in the database.

Most organisations are just happy that an entity has a definition at all. Nevertheless, this information needs to be incorporated into the models; otherwise the models will become yet another outdated artifact that IT needs to manage, with no tangible benefit to demonstrate to management.
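To make the gap concrete, here is a minimal sketch in Python of an entity definition extended with the kinds of governance metadata listed above, plus a check that flags which of those fields are still empty. The names (`GovernedEntity`, `Sensitivity`, `missing_governance_metadata`) and the sensitivity levels are hypothetical illustrations, not the schema of any particular modelling tool or governance standard.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum
from typing import List, Optional


class Sensitivity(Enum):
    # Hypothetical classification levels; real ones come from your governance standards.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


@dataclass
class GovernedEntity:
    """An entity definition extended with governance metadata (illustrative only)."""
    name: str
    definition: str = ""
    sensitivity: Optional[Sensitivity] = None     # how sensitive the represented data is
    scope: str = ""                               # "enterprise" or a department name
    steward: str = ""                             # who owns quality, design and business rules
    last_verified: Optional[date] = None          # last time the data was checked for accuracy
    last_structure_change: Optional[date] = None  # last structural change in the database


def missing_governance_metadata(entity: GovernedEntity) -> List[str]:
    """Return the names of metadata fields (everything except `name`) that are still empty."""
    return [
        f.name
        for f in fields(entity)
        if f.name != "name" and getattr(entity, f.name) in (None, "")
    ]


customer = GovernedEntity(
    name="Customer",
    definition="A person or organisation that has purchased a product.",
    sensitivity=Sensitivity.CONFIDENTIAL,
    steward="Sales Operations",
)
print(missing_governance_metadata(customer))  # ['scope', 'last_verified', 'last_structure_change']
```

Run across a whole model, a check like this gives a simple measure of how much governance metadata has yet to be captured, which is the kind of tangible progress that can be shown to management.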

The latest technological trends

Web services and service-oriented architecture (SOA) are two hot technologies currently on most businesses' radars. SOA enables the integration and reuse of data throughout the enterprise, which can, amongst other things, speed up processes and increase data sharing across the organisation.

This can often improve efficiency across business areas such as customer service and technical support call centres, or sales, order management and accounting. A major component of SOA and web services is XML, along with the XML schemas that define the data and structure of a message. These schemas, like everything else that represents the structure of data, need to be governed.

Many organisations are in fact using data models as the source of their XML schemas. This makes sense because the same standards that are applied to physical data models and databases can then be reused to create the XML schema structure. This often starts with creating logical models that represent the canonical form of the XML messages.

A canonical model will typically sit somewhere between a conceptual and a logical model, but it will be fully attributed and will enforce a stricter vocabulary and stronger typing for the attributes. The benefit is that the XML can use the same vocabulary and naming standards as the databases, which are typically where the data originates anyway.
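As a minimal sketch of deriving an XML schema from a canonical entity, the Python below uses the standard library's `xml.etree.ElementTree` to emit an XSD with one element whose type lists the entity's attributes. It assumes the canonical model can be exported as a list of (attribute name, logical type) pairs; the `TYPE_MAP` from logical types to XSD built-in types, the function name and the example attributes are all hypothetical. A production schema would also need cardinality, nested types and value restrictions, which this sketch omits.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialise the XSD namespace with the usual "xs" prefix

# Hypothetical mapping from canonical (logical) attribute types to XSD built-in types.
TYPE_MAP = {
    "text": "xs:string",
    "integer": "xs:integer",
    "decimal": "xs:decimal",
    "date": "xs:date",
    "boolean": "xs:boolean",
}


def entity_to_xsd(entity_name, attributes):
    """Build an XSD containing one element whose complex type lists the entity's attributes.

    `attributes` is a list of (name, logical_type) pairs taken from the canonical model.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    element = ET.SubElement(schema, f"{{{XS}}}element", name=entity_name)
    complex_type = ET.SubElement(element, f"{{{XS}}}complexType")
    sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
    for attr_name, logical_type in attributes:
        # Strong typing: an unknown logical type raises KeyError rather than
        # silently falling back to a loose xs:string.
        ET.SubElement(sequence, f"{{{XS}}}element",
                      name=attr_name, type=TYPE_MAP[logical_type])
    return ET.tostring(schema, encoding="unicode")


print(entity_to_xsd("Customer", [
    ("CustomerId", "integer"),
    ("CustomerName", "text"),
    ("DateOfBirth", "date"),
]))
```

Because the schema is generated from the same canonical definitions used for the databases, a name such as `CustomerId` follows the same naming standard in both places, and an attribute with no agreed type fails loudly instead of slipping through untyped.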

A safe repository

The importance of storing data models in a repository rather than on a network drive cannot be overstated. Models represent a large part of a business's intellectual property. The worst thing that can happen is for them to be stored on personal hard drives or network drives with no process for backup and recovery, no way to analyse the sum of the parts, and no way of knowing what is truly out there.

When models are scattered like this, it becomes very difficult to align the information about IT assets with the knowledge and rules of the business. Bringing them into one central repository maximises the benefit they can provide to an organisation. This can help to isolate areas of redundant data and reduce the overall cost of storing it. In addition, most repositories can share information across models, promoting reuse and further driving down the cost of managing common data in systems throughout the organisation.

Reporting is also an integral part of any repository. It opens the repository up to audiences who may not use it for active development but need to search it for information about the data's use and whereabouts across the enterprise.
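As a rough illustration of the search and reporting a central repository makes possible, the sketch below assumes a hypothetical, greatly simplified repository that maps each model to its entities and their attribute names. `where_used` answers the "whereabouts" question for a single attribute, while `shared_attributes` lists attributes that appear in more than one model as candidates for redundant data or for a single reusable definition. Matching on attribute name alone is an assumption made for brevity; a real repository would also compare definitions and data types.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical repository contents: model name -> entity name -> attribute names.
Repository = Dict[str, Dict[str, List[str]]]


def where_used(repo: Repository, attribute: str) -> List[Tuple[str, str]]:
    """List every (model, entity) pair that contains the given attribute, sorted."""
    return sorted(
        (model, entity)
        for model, entities in repo.items()
        for entity, attrs in entities.items()
        if attribute in attrs
    )


def shared_attributes(repo: Repository) -> Dict[str, List[str]]:
    """Report attributes that appear in more than one model, with the models using them."""
    models_by_attr = defaultdict(set)
    for model, entities in repo.items():
        for attrs in entities.values():
            for attr in attrs:
                models_by_attr[attr].add(model)
    return {
        attr: sorted(models)
        for attr, models in sorted(models_by_attr.items())
        if len(models) > 1
    }


repo = {
    "Billing": {"Invoice": ["InvoiceId", "CustomerId", "Amount"]},
    "CRM": {"Customer": ["CustomerId", "CustomerName", "EmailAddress"]},
    "Support": {"Ticket": ["TicketId", "CustomerId", "EmailAddress"]},
}
print(where_used(repo, "EmailAddress"))  # [('CRM', 'Customer'), ('Support', 'Ticket')]
print(shared_attributes(repo))  # {'CustomerId': ['Billing', 'CRM', 'Support'], 'EmailAddress': ['CRM', 'Support']}
```

A report like the second one is only a starting point: an attribute shared across models may be legitimate reuse of a common definition or a copy of the same data stored several times, and deciding which is a job for the data stewards.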

Conclusion

In summary, how data is used underpins the success of an organisation. Data models play an integral role in managing data at an enterprise level, but they are only the first step. Data models need to be well documented and tell the entire story of the data: who can access what, when, where and why. They must also explain the policies governing the data and how it is used across the enterprise, to ensure governance, security and best practice.