Balancing data quality with business need, tight deadlines and other project imperatives in Data Migration.

Kevin has rightly reminded us, in response to last week's blog, that we need to pay diligent attention to business requirements. Mere technical data quality is not enough. He also reminds us that the term "Data Quality" is itself loaded: data may have been of fine quality when created but is now obsolete or redundant. For this reason I prefer the term "Data Preparation" to "Data Cleanse" - no one likes their data to be defined as "Unclean".

I guess what I am saying is that in data migration we are always engaged in a multidimensional activity. We can divide the data prep task any way we choose but:

  • If we are not meeting the business needs
  • If we are not hitting our time target
  • If the new system does not work because of incorrectly structured start-up data

then we fail.

We need to balance all these imperatives. What we need is a structured approach that promotes this balance.

In my approach, reflected in the book (available from this site, and well worth the money), I create just such a set of interlocking structured activities, but I'm interested, through this blog, in hearing other voices and seeing what has worked (or not) on other projects. There is not much point, it seems to me, in just reiterating what's in the book :-)

So come on Kevin, Peter, Deborah, Girish, Mike, Vikas, Almosaed, Nigel, Remy, Kenny and everyone else who's reading this but has not yet commented, let's have your suggestions.

The challenge is to balance the three imperatives with the need to get the most appropriate data, sufficiently prepared. Let's focus on the second phase of data issues in my previous paradigm - data in legacy systems. How do we make sure that it is as well structured as possible before we throw it at the new system?

What techniques have you used? What problems have you encountered? How have these been overcome? It could be that you prefer to wait until the new system is available before commencing data discovery. If so, what are the benefits as you see them? I know some ETL vendors recommend trial migration with real data as early and as often as possible, but I have a number of reservations about that approach. What do you think?

Or, if you are just struggling with these problems and have no answers, please share that with us too.

Enough of the polite engagement. This is a largely neglected area for us to map out. Let's get the debate started!

Johny