The Data Warehouse Toolkit (3rd edition)

    Ralph Kimball, Margy Ross

    Published by

    Wiley

    ISBN

    9781118530801

    RRP

    £42.50

    Reviewed by

    Patrick Hill CEng MBCS CITP

    Score

    10 out of 10

    Increasingly, data is becoming one of an organisation’s most valuable assets. However, the potential value of this data can only be fully realised if it can be organised in ways that facilitate reporting and mining by a range of consumer types.

    In order to support reporting across different data silos, data is often integrated into data warehouses, intended to provide a 'one-stop shop' for all reporting needs.

    This book describes a principled and pragmatic approach to the organisation of data warehouses using the Kimball methodology.

    After an introductory orientation to data modelling and the Kimball methodology, chapters 3 to 17 each present case studies focussing on the specifics of different industry types and reporting requirements. By building relevant dimensional models, these chapters bring out the challenges presented by various kinds of data, data relationships and reporting requirements.

    While most of these chapters start from scratch, chapter 10 offers a slightly different perspective by providing an opportunity to review and critique a proposed dimensional model as if stepping into an in-process data modelling exercise.

    These modelling chapters follow a general pattern that reiterates the importance of declaring the grain early and emphasises the use of the bus matrix, both to identify relationships between dimensions and applications and as a crucial tool for developing conformed dimensions and documenting the data warehouse.
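
    As a rough illustration of the bus matrix idea mentioned above (not an example taken from the book), the sketch below models a matrix as a mapping from business processes to the dimensions they use, then lists the dimensions shared by more than one process, which are the natural candidates for conforming. The process and dimension names, and the helper names `shared_dimensions` and `render`, are hypothetical.

```python
from collections import Counter

# Hypothetical bus matrix: business process -> dimensions it uses.
# The names are illustrative assumptions, not taken from the review.
BUS_MATRIX = {
    "retail sales": {"date", "product", "store", "promotion"},
    "inventory": {"date", "product", "store"},
    "procurement": {"date", "product", "vendor"},
}


def shared_dimensions(matrix):
    """Return dimensions used by more than one business process, sorted by name."""
    counts = Counter(dim for dims in matrix.values() for dim in dims)
    return sorted(dim for dim, n in counts.items() if n > 1)


def render(matrix):
    """Render the matrix as text: one row per process, 'X' where a dimension applies."""
    dims = sorted(set().union(*matrix.values()))
    width = max(len(process) for process in matrix)
    lines = [" " * width + " | " + " | ".join(dims)]
    for process, used in matrix.items():
        cells = [("X" if d in used else "").center(len(d)) for d in dims]
        lines.append(process.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(BUS_MATRIX))
    print("Candidates for conformed dimensions:", shared_dimensions(BUS_MATRIX))
```

    Keeping the matrix as plain data means the same structure can serve as documentation and as a quick check of which dimensions need to be agreed across subject areas.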

    These chapters also explore strategies for identifying and handling different types of slowly changing dimensions and the effects of different data organisations on reporting capabilities. There are useful general hints, anti-patterns and heuristics embedded in each chapter.

    The later chapters of the book move away from dimensional modelling and focus on the key aspects of the Kimball DW/BI life cycle itself. A useful introductory chapter describes the overall life cycle and its principal pitfalls. Subsequent chapters delve deeper into the running of the data modelling process, including the identification of the key people who should contribute to it.

    There are pointers to the authors’ website for additional resources, such as document templates. The final few chapters include a detailed description of a prototypical extract/transform/load (ETL) process and a guide to the ETL development process, including a brief discussion of real-time ETL. The book concludes with an outline of the state of the art in DW/BI and big data.

    This book is written in a very approachable and readable style. While the book is intended to be read as a whole, bullet-point chapter highlights and chapter summaries, together with detailed contents and indexing, make it easy to use as a reference text. In addition to a range of design patterns covering a multitude of scenarios, there is plenty of practical advice based upon the authors’ extensive experience in the field.

    Further information: Wiley

    December 2013