Data migration - the magic roundabout

How to combine iteration with waterfall developments to manage the unplannable aspects of data migrations.

In the last two blogs I looked first at why detailed planning of a data migration is probably not possible. There are just too many unknowns. I then looked at how to manage the unpredictable data quality issues we are going to encounter and manage them down to a level where we can be confident that we will hit target and we will hit deadlines.

This time out I’ll be looking at a technique for controlling the other great variable - the target system data requirements or, more particularly, the target system’s late and contradictory delivery of data requirements.

First though a step back to look at the issues modern procurement practices bequeath to the projects they spawn.

I have remarked enough in the past about the problems of modern procurement processes. Indeed they figured again in the first blog in this series. Suffice to say that, because of issues around fixed price bids and tight time scales, the production of something against which we can deliver data migration code is now, all too often, pushed down the timeline from the detailed design phase to the build phase.

I will use the example of a generic SAP project and their recommended ASAP methodology to illustrate what I am on about.

ASAP has the following steps:

Project preparation -> business blueprint -> realisation -> final preparation -> go live support -> operate

For those who haven’t worked on SAP projects this is all fairly standard waterfall stuff. We have project kick off in the prep stage, requirements gathering in blueprint, detailed design in realisation and technical prep and delivery in final prep. Go live and operate speak for themselves.

Now we expect that we data migration folk will start our landscape analysis, set up our DQR Board, kick off our system retirement plans and onboard our key data stakeholders in project prep. We will get our first taste of likely data gaps during blueprinting and take delivery of the full data requirements at the end of realisation.

But here’s the thing: given the squeeze on time and budgets in modern system replacement projects, the really detailed data design of things like drop down list values, and the delivery of custom code, often comes during final preparation. This means not getting the full data story until go live commences, which is way too late to deliver a coherent, tested set of ETL tools if we have to start from scratch.

But can we do a better job if we manage the data migration build iteratively? Most organisations have 60-80 per cent similar processes - HR, finance, procurement, CRM and payment collection (to an extent). These need lighter tailoring than the more business domain specific aspects - production, parts of the sales cycle, service and product delivery - so their data requirements design will be available earlier. Some (like CRM) will be so industry standard that work can commence on them during business blueprint.

My suggested approach therefore is to start early, but with iterations agreed across the programme, each covering a coherent segment of the total data requirements, that will build into a complete ETL. Each iteration is complete in itself, with its content specified in advance, and is time boxed.

This has a number of benefits. It gives the ETL team the opportunity to shake out the issues in their internal build-test-execute cycles. It delivers working code earlier, and it has spin-off benefits for other work streams (testing and training principally) who are also dependent on the delivery of stable platforms (with data).

Now this of course leaves plenty of room for rework to appear. We may have to revisit the CRM to add additional fields once the peculiarities of the organisation specific sales cycles have been worked through. Often we need to rework aspects of both outgoing and incoming payments processing to accommodate the special aspects of our environment. However, the use of appropriate ETL tools means we can get maximum code re-use.

The idea of a coherent data set is significant. It means that the data provided is useable in a version of the target as available at a given point in time even if all the necessary functionality for working the business is not.

Each new release is also coherent and can be compared with its predecessor for differences, and this is where our attention is focused.
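To make the release-to-release comparison concrete, here is a minimal sketch in Python. It assumes each data requirements release can be expressed as a mapping from (entity, field) to a small specification. The names FieldSpec and diff_releases, and the sample Customer fields, are hypothetical and only there for illustration; a real project would hold this in whatever requirements repository or ETL tool it already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one target field in a release."""
    data_type: str
    mandatory: bool


def diff_releases(previous, current):
    """Compare two releases, each a dict keyed by (entity, field).

    Returns the keys that are new, dropped, or reworked (present in
    both releases but with a different specification).
    """
    prev_keys = set(previous)
    curr_keys = set(current)
    return {
        "new": sorted(curr_keys - prev_keys),
        "dropped": sorted(prev_keys - curr_keys),
        "reworked": sorted(
            key for key in prev_keys & curr_keys
            if previous[key] != current[key]
        ),
    }


# Illustrative data only: two successive releases of a Customer object.
release_1 = {
    ("Customer", "name"): FieldSpec("text", True),
    ("Customer", "segment"): FieldSpec("text", False),
}
release_2 = {
    ("Customer", "name"): FieldSpec("text", True),
    ("Customer", "segment"): FieldSpec("code", True),
    ("Customer", "credit_limit"): FieldSpec("decimal", False),
}

changes = diff_releases(release_1, release_2)
for kind, keys in changes.items():
    print(kind, keys)
```

The point of the sketch is only that the difference list, not the whole release, becomes the unit of work for the next iteration.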

It allows other work streams to plan their activity, knowing into which release particular gaps and faults will fall.

After the first couple of iterations where the ETL build process itself is being tuned, it allows for accurate predictions of the time and effort it will take to complete an ETL build cycle based on metrics like the number of new fields, the number of reworked fields and the number of new objects / entities. This allows planning using the migration readiness metrics discussed in the last blog.
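As a rough illustration of that kind of prediction, the sketch below calibrates a single "days per change point" rate from past iterations and applies it to a planned one. Everything in it is an assumption made for the example: the relative weights, the Iteration structure, the function names and the sample figures. It is one simple way of doing the arithmetic, not a prescribed method, and the early tuning iterations would be left out of the history.

```python
from dataclasses import dataclass

# Assumed relative weights: a reworked field costs half a new field,
# a new object / entity costs five new fields. Tune these to your project.
WEIGHTS = {"new_fields": 1.0, "reworked_fields": 0.5, "new_entities": 5.0}


@dataclass
class Iteration:
    new_fields: int
    reworked_fields: int
    new_entities: int
    actual_days: float = 0.0  # filled in once the iteration has completed

    def change_points(self):
        """Weighted size of the iteration's content."""
        return (
            self.new_fields * WEIGHTS["new_fields"]
            + self.reworked_fields * WEIGHTS["reworked_fields"]
            + self.new_entities * WEIGHTS["new_entities"]
        )


def days_per_point(history):
    """Average effort per change point over the completed iterations."""
    total_points = sum(it.change_points() for it in history)
    if total_points == 0:
        raise ValueError("history contains no measurable change")
    return sum(it.actual_days for it in history) / total_points


def estimate_days(history, planned):
    """Predict the effort for a planned iteration from past performance."""
    return days_per_point(history) * planned.change_points()


# Illustrative figures only, taken after the build process has settled.
history = [
    Iteration(new_fields=40, reworked_fields=10, new_entities=2, actual_days=22.0),
    Iteration(new_fields=30, reworked_fields=20, new_entities=1, actual_days=18.0),
]
planned = Iteration(new_fields=20, reworked_fields=10, new_entities=1)

estimate = estimate_days(history, planned)
print(f"Estimated build cycle: about {estimate:.0f} days")
```

With these made-up numbers the history holds 100 change points delivered in 40 days, so a planned iteration of 30 points comes out at roughly 12 days.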

This in turn helps us all make proper judgements at the business end of a project where we often have to prioritise fixes and enhancements against a desired end date. We know how long it will take from making a change in the target to having it populated with living data.

All of which protects us from the blistering whirl of chaotic uncontrolled change into which many projects fall.

Is it easy to implement? Well yes and no. Technically it presents a few problems like working out how many instances of the target we need and this can be a challenge to the technology provisioning teams. There are also issues with controlling the (ever) moving target so we know which data build matches which target release. We also need to have a firm control on our own data builds.
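One lightweight way to keep track of which data build matches which target release is a simple registry, sketched below. The DataBuild and BuildRegistry names, and the identifiers in the usage lines, are hypothetical; in practice this record might live in a configuration management tool or even a controlled spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataBuild:
    """Hypothetical record tying one data build to the release it was built for."""
    build_id: str
    target_release: str
    requirements_release: str


class BuildRegistry:
    def __init__(self):
        self._builds = {}

    def register(self, build):
        """Record a build; refuse to overwrite an existing build id."""
        if build.build_id in self._builds:
            raise ValueError(f"build {build.build_id} is already registered")
        self._builds[build.build_id] = build

    def builds_for_target(self, target_release):
        """List the build ids produced against a given target release."""
        return [
            b.build_id for b in self._builds.values()
            if b.target_release == target_release
        ]

    def is_compatible(self, build_id, target_release):
        """True only if the build was produced for that exact target release."""
        return self._builds[build_id].target_release == target_release


# Illustrative identifiers only.
registry = BuildRegistry()
registry.register(DataBuild("DB-003", target_release="T-1.2", requirements_release="R2"))
registry.register(DataBuild("DB-004", target_release="T-1.3", requirements_release="R3"))

print(registry.builds_for_target("T-1.3"))
print(registry.is_compatible("DB-003", "T-1.3"))
```

Refusing to overwrite a registered build is a deliberate choice here: it is a small piece of the firm control on our own data builds that the approach depends on.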

Politically and contractually it can be more fraught. Working to controlled releases restricts the ability of the supplier to change the configuration of the target in the dynamic manner which suits their need to get bug fixes implemented quickly (of course this begs the question of whether the bug is fixed if there is no go live data for it). Daily (or even hourly) releases of the target are possible on the development box but have to be bundled into data requirement releases.

So it is a solution that works but requires discipline across the project. Not necessarily an easy thing to ensure.

There is at least one other data delivery possibility that is still waterfall in nature but involves close inter team working during the requirements gathering / initial design phases (business blueprint and realisation in ASAP). Then, of course, there is full on agile but I am coming to the end of this summer mini-series. I will definitely return to the subject of agile in the autumn.

For now I would like to thank those of you who have contributed your questions and comments. Please keep them coming.

Look forward to hearing from you.

Follow me on Twitter: @johnymorris #PDMv2