Why we should link data migration to data governance.

I have always been steadfast in claiming that Data Migration should have its own niche in the panoply of disciplines that make up our technology universe. So am I about to perform a volte-face? Well, no, I'm not. However, I am being increasingly drawn into looking at the relationship between the skills and techniques needed for Data Migration and those of the linked disciplines of Data Governance and especially Data Quality and Master Data Management.

As usual, these things tend to creep up on one. It just seems that suddenly I look around and find that a number of different activities all seem to be pointing in the same direction. There has been some kind of subtle shift in perceptions, and where I was once a lone voice crying in the wilderness (OK, I exaggerate a little) I am now being actively engaged to link these disciplines together.

I recently told you about the webinars for Data Quality Pro, where I am talking about the central role Data Quality plays in Data Migration. But, of course, the relationship between Data Quality and Data Migration is a two-way street. On the one hand, we borrow tools and techniques from the Data Quality guys (their Profiling and Data Quality software for instance), but they should also leverage the work we do to kick-start their Data Governance activities.

We have an advantage and a matching disadvantage when it comes to Data Quality in Data Migration. On the one hand we have the massive advantage of the compelling event of a major system change to concentrate minds and secure budget. As I always say, our conversations with our business colleagues start with the question ‘In n months’ time we are switching off this system upon which your whole business life depends. How will you be sure that you will be able to continue doing your job when that happens?’ not with any techno-babble about data mapping. So we get attention.

On the other hand, the very proximity of a new system go-live means that we do not have time to fix every defect we find. We have to prioritise our activity ruthlessly, skewed towards an on-time go-live. (This is of course what the PDM Data Quality Rules process does.) Typically at the end of our project we have in our hands a fairly comprehensive list of defects that we did not have time to fix. We can also pass on, to anyone interested, a keen and eager virtual team of Business Domain Experts and Technical Systems Experts who have taken the DQR process from PDM and moulded it into something that works in their environment.

I can’t begin to tell you the frustration that comes from having this valuable handover item and then having to leave it on the table with no one to pick it up. But in the words of the old Bob Dylan song, “The times they are a changin’”. I am starting to see clients coming forward wanting to engage with us, and I would urge everyone else who has a nascent Data Quality initiative out there to do the same. Better still, of course, is to get involved earlier. Don’t be too parochial about these things. And there’s the rub.

Too often there is a Data Governance team in waiting. But they are on a different budget, working to a different timetable, following a different guru with a different approach. My plea is that we should look at the end point, not the means. We all want good data. Our PDM methodology is open to having other techniques dropped into it if they conform to local policies. However, it is gratifying to see that we are maturing as an industry. On two occasions in the last month clients have actively approached me to do just that.

The territorial protectionism that marks the response of the Data Quality guys is multiplied tenfold when it comes to Master Data Management. Within a Data Migration we are nearly always confronted, as a matter of practical necessity, with the need to combine multiple versions of some of our key entities - typically customer and product - from the plethora of systems that make up a legacy estate.

This problem comes on two levels. First there are the semantic issues - what do we mean by customer? Is it the supply point? The division of the company we supply to? The trading style? The holding company? The legal entity? Does a prospect that is yet to purchase count as a customer? Anyone who has been through this process is familiar with these issues. And the same is true for product or account or vendor.

Once we have resolved the semantic issues, we then have to go back to our prospective data sources to check how each of those maps onto our new taxonomy. We have to split, combine, de-duplicate, enrich and so on.
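To make the combine and de-duplicate step a little more concrete, here is a minimal Python sketch. It is an illustration only, not part of PDM: the record layout, the system names and the matching rule (normalised name plus postcode) are all assumptions invented for the example, and a real project would agree its own matching rules with the business.

```python
from dataclasses import dataclass, field


@dataclass
class LegacyCustomer:
    # Hypothetical shape of a customer row pulled from one legacy system.
    source_system: str
    source_id: str
    name: str
    postcode: str


@dataclass
class TargetCustomer:
    # Hypothetical consolidated record for the new system.
    name: str
    postcode: str
    source_keys: list = field(default_factory=list)


def match_key(record):
    # Assumed matching rule: ignore case, spaces and punctuation.
    name = "".join(ch for ch in record.name.lower() if ch.isalnum())
    postcode = record.postcode.replace(" ", "").upper()
    return (name, postcode)


def consolidate(records):
    # Combine records that share a match key, keeping the first
    # spelling seen and remembering every legacy key for audit.
    merged = {}
    for rec in records:
        key = match_key(rec)
        target = merged.setdefault(key, TargetCustomer(rec.name, rec.postcode))
        target.source_keys.append((rec.source_system, rec.source_id))
    return list(merged.values())


legacy = [
    LegacyCustomer("billing", "B-001", "Acme Ltd.", "AB1 2CD"),
    LegacyCustomer("crm", "C-917", "ACME LTD", "ab12cd"),
    LegacyCustomer("crm", "C-918", "Bolt & Co", "EF3 4GH"),
]
customers = consolidate(legacy)
for customer in customers:
    print(customer.name, customer.source_keys)
```

Keeping the legacy keys against each consolidated record is a deliberate choice in this sketch: it gives whoever picks the data up afterwards a trail back to the systems it came from.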

Within PDMv2 there is a sub-module for all this in Landscape Analysis. However, like the nineteenth-century ‘Love that dare not speak its name’, we can never use the phrase ‘Master Data Management’ for the activity we are engaged in. To be fair, what we are doing is probably best referred to as ‘MDM Lite’.

We are not creating federated data stores or distributing values or anything high tech like that, although we are, in a cobbled-together belts and braces manner, collecting, aggregating, matching, consolidating and quality assuring the data items. (I took that list of definitions from Wikipedia). But we are never allowed to call it MDM on-site. More often we refer to it as Reference Data Mastering or some such euphemism.

But why? If there is an MDM for the item we need to control, then let us tap into it. If there is a project to create such a thing, then let’s work together. Remember we are merely passing through. At the end of our project we will pack our bags and be off to the next site. What remains is what you have to deal with.

Again, though, it appears that the tide is turning (I sometimes think only old King Canute saw the tide turning more often than I have). At last I am getting positive responses from MDM guys looking to capitalise on our decades of experience in confronting the semantic and Data Quality issues that are essentially the same whether you are building a temporary reference data management solution for a Data Migration project or a permanent MDM hub. I am also discussing with a leading software vendor a joint white paper on these initial definitional, preparation and load issues which, if they are not resolved, will greatly diminish or even destroy the value of your MDM investment.

Johny Morris, jmorris@iergo.co.uk