Data migration - the pervasive cloud

Had an interesting meeting with Jerrod Gladden the other day. Jerrod is Pervasive Software’s Director of Sales for Integration Products.

For those unfamiliar with their product set, Pervasive offer the complete range of tools needed for a sophisticated data migration. They have a profiler called (imaginatively enough) ‘Pervasive Data Profiler’, a sensational data integration tool called (you probably guessed it) ‘Pervasive Data Integrator’ and finally a de-duplication and consolidation platform called (slightly more originally) ‘Pervasive Data MatchMerge’. A complete solution then, but aside from the ease of being able to recognise the product from the label, what is it that sets these offerings apart?

Well, it’s all about what happens under the skin. Pervasive have a track record in data that goes back a long way, all the way back to the early PCs and a product called Btrieve (which we old greybeards may still remember with dewy-eyed nostalgia). After a series of incarnations this is now PSQL and remains the embedded data engine of many other familiar products. Then again, even Pervasive Data Integrator has a history stretching back nearly 30 years.

As I have pointed out in the past, we have reached a stage in the evolution of commercial IT when really knowing how to manage data has become important once again. It seems like a whole generation of IT staff has been riding on a constant upward curve of ever cheaper, ever more powerful hardware that has offset the need to be efficient. Those old timers like me who remember the days of sweating the nightly batch run to get it through in its allotted seven hours are having a déjà vu experience, as the huge amounts of data we are now expected to munch our way through wreck our planning assumptions. Clunky, single-threaded, SQL-based sorts and queries just do not cut it. The solution of throwing more iron at it is no longer working for us. We need smarter, leaner, faster and more efficient tools to do the job for us. Hence the rise of Hadoop, MapReduce etc.

Hidden away (well, not so hidden, given that it is plastered all over their marketing materials) within Pervasive’s tools is their ‘DataRush’ technology. This provides a built-in, low-cost, parallel architecture, so that adding more cheap, utility hardware nodes automatically adds processing throughput. This of course goes beyond big data, with which we are all familiar. Big data is a whole area on its own, but even in my backwater of data migration, we are increasingly dealing with what I might term ‘bigger data’. We all have websites and we are all looking to analyse what all those hits might be telling us. With the breaking down of the links in the old supply chains we all have a lot more contact with end customers than maybe we did before. It is not at all unusual for even a medium-sized enterprise to hold customer contact records that run into the many millions. And when we come to migrate them, all these records need to be profiled, de-duped and cleaned prior to migration.
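For readers who like to see the shape of the thing, here is a minimal Python sketch of the clean and de-dupe step spread across workers. It is my own illustration, not Pervasive's API and not how DataRush works internally. The function names, the `name` and `email` fields and the choice of email as the match key are all assumptions, and the thread pool simply stands in for whatever pool of cheap nodes a real engine would use.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def normalise(record):
    """Return a cleaned copy: collapsed whitespace in the name, lower-cased email."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def partition(records, n_partitions):
    """Bucket records by a hash of the match key (email, an assumption),
    so that all duplicates of one customer land in the same bucket."""
    buckets = defaultdict(list)
    for rec in records:
        key = zlib.crc32(rec["email"].encode("utf-8")) % n_partitions
        buckets[key].append(rec)
    return list(buckets.values())


def dedupe(bucket):
    """Keep the first record seen for each email within one bucket."""
    seen = {}
    for rec in bucket:
        seen.setdefault(rec["email"], rec)
    return list(seen.values())


def clean_and_dedupe(records, n_partitions=4):
    """Clean every record, then de-dupe each bucket independently."""
    cleaned = [normalise(r) for r in records]
    buckets = partition(cleaned, n_partitions)
    # Threads are used here only to show the structure; buckets share no
    # state, so the same work could be handed to processes or machines.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(dedupe, buckets))
    merged = [rec for bucket in results for rec in bucket]
    return sorted(merged, key=lambda r: r["email"])


if __name__ == "__main__":
    sample = [
        {"name": "  ada   lovelace", "email": "ADA@example.com "},
        {"name": "Ada Lovelace", "email": "ada@example.com"},
        {"name": "alan turing", "email": "alan@example.com"},
    ]
    for row in clean_and_dedupe(sample):
        print(row)
```

The point of the partitioning is that no bucket needs to know about any other, which is what lets throughput grow as workers are added. Real matching is of course fuzzier than an exact email comparison.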

I will not dwell here on how this DataRush technology has been adapted to Hadoop and its ecosystem. As I say, I’m the data migration guy and that is not really my thing. But it is easy to see where a relatively mature technology, like that provided by Pervasive, would play well here - linked as it is to the other fully functional tools in the chain.

I’m going to stick to my knitting and talk about another service of Pervasive’s that I really would like the opportunity of trying out in earnest. And that is their cloud integration. It has been obvious from as soon as the cloud formed that this could be the ideal solution for two of our perennial data migration resource issues - accessing appropriate software and accessing sufficient physical resources. It remains the case that the majority of enterprise application data migrations for the majority of clients, even large ones, are once-in-a-business-lifetime activities. We simply do not perform the wholesale replacement of our enterprise application infrastructure more often than once every fifteen or so years. The whole process from initial proposal through business case, through market testing and into procurement, development, delivery and realising benefit takes too long. Although every organisation benefits from a technology refresh every decade or so, none would prosper under the disruption of constant change. And if there is a cycle of fifteen years or so, then those resources, technological as well as human, that did the job last time will have long since been re-deployed.

There are of course exceptions. Some very large organisations are constantly refreshing technology in some division somewhere in the world; others are forever onboarding new acquisitions. But most company business models are just not like that. Come the day, of course, when that technology change occurs, each organisation has a sudden but temporary demand for additional hardware and software (and data migration skills, which is where we come in, but that is not the subject of this blog). Here, a fully-fledged cloud-based subscription service is ideal. The latest software, integrated and configured, with consoles and mini integration engines available anywhere that the web can reach. Seems like heaven in the clouds to me.

There are some issues I would need to understand of course - the biggest being data protection, especially within the EU. Where is this data stored? How can we guarantee that it is secure to the levels legally required? The second is understanding the costs - just how much do we pay in the subscription model? How efficient is cloud-based software when the particular source we are looking at is on-premise? But on the whole I think this is the best attempt that I have seen so far at bringing together the elements needed for a single cloud-based platform, whether the end points are cloud-based, on-premise or (as is increasingly the case) a mix of multiple points, some in the cloud, some on-premise, some on other parties’ premises.

There is a lot more that can be said about Pervasive and maybe I’ll return to them in a later blog, but I must mention their Galaxy marketplace. Yes, Pervasive has joined the other data software and service vendors out there by creating their own community on the web. Galaxy is the name they have for it and it allows all the swapping of templates, connectors and apps that we have come to expect. For an organisation as long-standing as Pervasive this is maybe not before time, but it is a welcome development nonetheless.

However, we do have to address the elephant in the room. This is Pervasive’s impending takeover by Actian. Now Actian are an organisation about which I know very little aside from their involvement with Ingres, which as a database would seem to have some overlap with PSQL. Their website boasts of the power of Vectorwise, an analytical database apparently targeting the same piece of turf as DataRush for Hive, or so it seems to me. It could be, of course, that there is some synergy in the use of DataRush parallel processing technology within Vectorwise. However, these are not the tools that interest me as a data migration guy. In any takeover there is always the worry about ongoing investment in the purchased platform (less of an issue, mind you, if you take an instrumental, just-for-this-migration view of the Pervasive portfolio in the cloud option). We shall have to see how this plays out. It is always hard from the outside to know for certain where the perceived ROI for the purchaser is and, in the fallout from the integration of the two management teams, where the power will lie.