We all (well, within the UK at least) know Experian. And let us be certain that this major rating agency knows about each of us, with its huge data banks of our addresses, credit histories and purchasing preferences. (I have to say that I am always slightly daunted by a party that may know more about me than I know about myself.) However, the fact that they have a reputation unbesmirched by the odium some of their competitors labour under suggests that they have been more careful to stick to their knitting rather than venture out into the crazy world of esoteric high-finance instruments.
Before we go any further I have to confess that I almost worked (or maybe I did work) for them in the late 1980s. Please allow me to explain. I was employed by a small, local credit rating agency in Manchester, England called 'The Manchester Society for the Protection of Traders and Merchants', or some such long, convoluted, Victorian name (at this distance my memory is somewhat patchy). The Manchester (as it was known) was taken over by Experian and we were offered redundancy or relocation to their Nottingham headquarters. I chose the bailout option and set up on my own using the redundancy pay-off (which I remember as being quite generous) as my working capital. The rest, as they say, is history. I cannot remember if my parachute opened before or after the sale of The Manchester was completed, but this is now quite academic.
However, I shall try not to allow the memories of this experience to colour what follows.
So back to Experian in 2013. As I indicated in the first paragraph, they are best known for the essential, if essentially unglamorous, world of credit rating, both personal and commercial. This has given them a huge wealth of experience in name and address management in particular and data governance in general. It was with the intention of bringing this experience to market that they acquired QAS, a software services shop, and have worked to develop both software and services that target, amongst other things, the data migration marketplace.
The software first. The product boasts data profiling, data management and data analysis all in one package. Although it has the ability to perform mappings to target with selection, exclusion, transformation and enrichment, it lacks the full features of an industrial-strength ETL tool (things like throttling, event-driven migration, built-in unit-of-migration concepts, workflow management and so on). This, then, is not an ETL tool for major migrations, but it may be enough to get you over the line for a small migration. On the other hand, these days we are normally confronted by a client/systems-integrator (SI) relationship where the final mile is accomplished by an ETL tool in the hands of the SI. The QAS software would perform admirably in the hands of the client to prepare the data for migration to a template defined by the SI. (For those familiar with the PDMv2 methodology, this is a classic example of the demilitarised zone, or DMZ.) The mapping is performed using the classic left-to-right, drag-and-drop, multi-window user interface, with multiple widgets available and the ability to create additional widgets for special processing. Given the Experian heritage, QAS also have to hand a legacy of tools like Mosaic for address profiling and enrichment.
The ability to produce an output while working on full data loads allows a degree of data prototyping as well.
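To make the mapping idea concrete, here is a minimal sketch in Python of the selection, exclusion, transformation and enrichment steps described above. The field names and the region lookup are my own invention for illustration; this is emphatically not how the QAS product works internally, which is a drag-and-drop affair.

```python
# Illustrative only: a toy source-to-target mapping showing exclusion,
# selection, transformation and enrichment. All names are hypothetical.

POSTCODE_REGION = {"M": "North West", "NG": "East Midlands"}  # toy lookup

def map_record(src):
    """Map one legacy record to a hypothetical target template."""
    # Exclusion: drop records flagged as test data
    if src.get("TEST_FLAG") == "Y":
        return None
    # Selection + transformation: pick the fields we need, normalise them
    target = {
        "customer_name": src["NAME"].strip().title(),
        "postcode": src["PCODE"].strip().upper(),
    }
    # Enrichment: derive a region from the postcode area
    area = "".join(ch for ch in target["postcode"] if ch.isalpha())[:2]
    target["region"] = POSTCODE_REGION.get(
        area, POSTCODE_REGION.get(area[:1], "Unknown")
    )
    return target

print(map_record({"NAME": "jones ltd", "PCODE": "m1 1ae", "TEST_FLAG": "N"}))
# {'customer_name': 'Jones Ltd', 'postcode': 'M1 1AE', 'region': 'North West'}
```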
Now let us concentrate on the tool's strengths. Taking a full load of client data, the software performs a pre-profile and loads the results into a repository, upon which the analysis is then performed. Out of the box, it has all the profiling features you would expect of a sophisticated analysis tool. It performs column analysis on fill rates, pattern matches, data types, uniqueness constraints and so on. It allows you to zoom into columns, filtering and viewing the values in fields. You can then highlight values, and you are only a couple of clicks away from creating new filters.
And it is fast. Unlike packages that create and run code against the native tables, pre-profiling speeds up the queries. QAS were quoting examples of seven million records being loaded and profiled in two minutes, and drill-downs are equally fast.
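For readers who like to see the mechanics, here is a rough Python sketch of the kind of column statistics a pre-profile pass might persist to a repository, so that later drill-downs query summaries rather than raw tables. The statistics and structures are my own illustration of the general technique, not QAS's internals.

```python
from collections import Counter
import re

def profile_column(values):
    """Compute basic profile statistics for one column's values.

    A sketch of the pre-profiling idea: compute these once, store them
    in a repository, and subsequent analysis reads the summaries.
    """
    total = len(values)
    non_null = [v for v in values if v not in (None, "", "NULL")]
    # Reduce each value to a pattern: digits -> 9, lower -> a, upper -> A
    patterns = Counter(
        re.sub(r"[A-Z]", "A",
               re.sub(r"[a-z]", "a",
                      re.sub(r"[0-9]", "9", str(v))))
        for v in non_null
    )
    return {
        "fill_rate": len(non_null) / total if total else 0.0,
        "distinct": len(set(non_null)),
        "is_unique": len(set(non_null)) == len(non_null),
        "top_patterns": patterns.most_common(5),
    }

# Example: profile a column of UK-style postcodes
print(profile_column(["M1 1AE", "NG1 5FS", None, "M1 1AE"]))
```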
It also performs relationship analysis based on matching values, so you can find the hidden relationships unknown to the DBMS.
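The underlying technique is simple enough to sketch: compare the distinct value sets of candidate column pairs and flag those with a high overlap as possible undocumented foreign keys. Again, the code below is my own illustration of the idea, not the product's algorithm.

```python
def value_overlap(child_values, parent_values):
    """Fraction of distinct child values found in the parent column.

    A high score suggests an undocumented foreign-key relationship
    that the DBMS itself knows nothing about.
    """
    child, parent = set(child_values), set(parent_values)
    if not child:
        return 0.0
    return len(child & parent) / len(child)

# Example: orders.cust_ref looks like it points at customers.id
orders_cust_ref = ["C001", "C002", "C002", "C007"]
customers_id = ["C001", "C002", "C003", "C004", "C007"]
print(value_overlap(orders_cust_ref, customers_id))  # 1.0 -> strong candidate
```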
To reduce the risk of being swamped in results, you can select certain tables or schemas to perform the analysis upon. The chances of making the right choices when restricting the tables depend, of course, on knowledge external to the tool, and for this you need a surrounding methodology; but more on that later.
And so we come to their methodology.
One of the interesting features of the methodology is the ability to score the value of the data, by column, on a scale of one to ten within the product. This helps in sorting the wheat from the chaff but, again, is dependent on a business-side view as well as a technical view of value. The technical folks will not necessarily know of the local use of fields to hold data that has nowhere else to go, and the business folks will not necessarily recognise the field names from what they see on screen. So a degree of collaboration is required. Of course, this product allows for this close collaboration, but you have to make space for it in your data quality/data migration workflow.
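By way of illustration, one might record those one-to-ten scores in something like the hypothetical structure below, capturing both the business and the technical view so the collaboration leaves an audit trail. The structure, names and weighting are purely my own assumption, not the product's scoring scheme.

```python
from dataclasses import dataclass

@dataclass
class ColumnScore:
    """One column's value score, capturing both views (hypothetical)."""
    table: str
    column: str
    business_score: int   # set by the business-side reviewer
    technical_score: int  # set by the technical reviewer

    @property
    def combined(self):
        # Assumed weighting: favour the business view slightly, since they
        # know which fields drive decisions, however oddly those are named.
        return round(0.6 * self.business_score + 0.4 * self.technical_score)

scores = [
    ColumnScore("CUST", "ADDR3", business_score=9, technical_score=4),
    ColumnScore("CUST", "FILLER1", business_score=2, technical_score=1),
]
# Prioritise remediation effort on the highest-value columns first
for s in sorted(scores, key=lambda s: s.combined, reverse=True):
    print(s.table, s.column, s.combined)
```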
Using this tool should commence with collaborative profiling and data analysis. (Well, it should of course begin with a project requirement statement and plan, but let us take that as read.) Issues raised then need to be resolved. Here the narrative is crying out for something like the PDM data quality rules process. In a data migration environment, just kicking out the errors and leaving them to be picked up and managed by someone in the business, unsupported and unguided, is asking for delays that may be fatal. The QAS approach recognises this.
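To show what I mean by a data quality rules process rather than an error dump, here is a hypothetical sketch of a rule as a managed work item, with a named owner, a resolution method and a status. The structure is mine, not PDM's or QAS's, but it captures the spirit: nothing gets thrown over the wall to the business unsupported.

```python
from dataclasses import dataclass, field

@dataclass
class DataQualityRule:
    """A data quality rule as a managed work item (illustrative only)."""
    rule_id: str
    description: str
    owner: str                 # a named business-side owner, not "the business"
    method: str                # e.g. "fix in source", "fix in flight", "accept"
    status: str = "open"
    failing_records: list = field(default_factory=list)

rule = DataQualityRule(
    rule_id="DQR-014",
    description="Customer records must have a valid postcode",
    owner="A. Patel (Credit Ops)",
    method="fix in source before extract",
)
rule.failing_records = ["CUST-0871", "CUST-1209"]
print(rule.rule_id, rule.status, len(rule.failing_records), "records outstanding")
```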
Here I certainly agree with QAS's aspirations. Those of you who have read this blog for a while will know that I am always arguing for the continuation of the work we put in preparing data for migration into the in-life phase of the target. This is especially true of all our efforts in identifying data that we did not have time to fix, and in building up the good relationships with our business colleagues that would get that data fixed after the migration, if we did but carry on.
So, all in all, a fine piece of software that will get you from ignorance of the horrors that await you under the hood of your legacy systems to data ready to be passed with confidence into the waiting maw of your migration partner's software for the final mile.
Twitter - @johnymorris