For the last decade, digital transformation has been about modernising systems, redesigning journeys and moving services into channels that are cheaper, faster or more convenient. But the rise of AI has changed the rhythm of transformation entirely. Danny Major explores.
Organisations are no longer just implementing new technologies. They are starting to rethink entire operating models around AI agents that can interact, decide and act on behalf of the business.
The problem is that while ambition has grown, guardrails haven’t kept pace. Most of the AI benchmarks the industry uses today still focus on how quickly a model can code or how well it can play a logic game. They make for good news headlines, but they don’t tell a regulated-industry leader whether an AI agent can explain a policy clause accurately, stay inside a compliance boundary or escalate at the right moment. I view it as the equivalent of hiring a team based solely on their IQ test scores: interesting, but not remotely useful when you need to run a safe, predictable and accountable operation.
This is why the next phase of digital transformation needs something more meaningful. We need industry- and domain-specific benchmarks that measure not just capability, but suitability. A customer I work with refers to this as ‘the suitability and quality assessment’: benchmarks that reflect the real environments these systems will be deployed into. That means understanding not just whether an AI is competent, but whether it is safe, reliable, governable and aligned to regulatory requirements and the operational realities of the enterprise.
Insurance is a prime example. When you introduce AI agents into policy servicing, claims, fraud triage or quote journeys, the stakes are high. Decisions affect real customers, real liability and real regulatory consequences. The transformation pressures are enormous. Contact centres are overwhelmed, cost to serve keeps rising and customers expect digital experiences that mirror the immediacy of the consumer tech we all use every day. AI can solve many of those challenges, but only if organisations trust the technology enough to put it in front of customers and integrate it into core workflows.
Trust cannot be manufactured through marketing. It must be earned through measurement.
That’s why I’m starting to see a shift away from generic performance metrics and towards industry-aligned safety frameworks. Organisations want to know how an AI agent handles uncertainty, how it manages domain-specific terminology, how it explains itself, how it stays within rules and how resilient it is when interacting with legacy systems. Enterprises want clarity on failure modes, escalation logic and the guardrails that keep experiences within safe operational lanes. Leaders want and need benchmarks that reflect what matters to customers and regulators, not just what impresses an audience.
I’m increasingly convinced that the organisations that adopt domain-specific benchmarks early will move faster than those that rely on generic ones. Not because the benchmarks themselves are glamorous, but because they give transformation teams something they have been lacking for years: a common language. Product, compliance, technology, risk and customer operations can finally evaluate AI using shared criteria grounded in the reality of how their business runs. That creates alignment, and alignment accelerates transformation.
Benchmarks also give teams clarity on where to invest. If the benchmark surfaces issues around explainability, you can prioritise audit trails. If it highlights domain gaps, you can strengthen knowledge assets. If it shows weaknesses in handling edge cases, you can refine guardrails or human-in-the-loop pathways. Transformation stops being a vague ambition and becomes a measurable, iterative journey.
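To make that concrete, here is a minimal sketch of how a team might turn domain-benchmark scores into an investment shortlist. The dimension names, thresholds and remediation mappings are all illustrative assumptions, not any real framework; the point is simply that once evaluation is expressed as shared, measurable criteria, the gaps rank themselves.

```python
# Hypothetical sketch: mapping domain-benchmark results to remediation
# priorities. Dimensions, threshold and actions are invented examples.

# Illustrative remediation action for each benchmark dimension
REMEDIATION = {
    "explainability": "prioritise audit trails",
    "domain_knowledge": "strengthen knowledge assets",
    "edge_case_handling": "refine guardrails / human-in-the-loop pathways",
    "escalation_logic": "review escalation rules",
}

def investment_priorities(scores: dict[str, float],
                          threshold: float = 0.8) -> list[str]:
    """Return remediation actions for dimensions scoring below the
    threshold, weakest dimension first."""
    gaps = [(score, dim) for dim, score in scores.items() if score < threshold]
    return [REMEDIATION.get(dim, f"investigate {dim}")
            for _, dim in sorted(gaps)]

# Example benchmark run (invented numbers)
scores = {
    "explainability": 0.62,
    "domain_knowledge": 0.91,
    "edge_case_handling": 0.74,
    "escalation_logic": 0.85,
}
print(investment_priorities(scores))
```

Run against the invented scores above, the two sub-threshold dimensions surface in priority order, giving product, risk and technology the same short list to act on.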