Pratik Daga FBCS, Software Lead at Asana, explains the changes needed in IT culture and the shifts in processes and procedures that will allow strategic data-driven goals to be achieved.
Let me ask you a question: what do you think will have the greatest impact on your career over the next five years? My money and career are firmly staked on the relationship between me and my organisation’s data. I’m also going to go out on a very short limb and predict that if I fail to navigate the coming changes, I won’t have much of a career left. I know this sounds dramatic, but my prediction is based on data.
By 2026, Gartner predicts that over 80% of enterprises will have used generative artificial intelligence (GenAI) Application Programming Interfaces (APIs) and models and/or deployed GenAI-enabled applications in production environments, up from less than 5% in early 2023. Those GenAI apps will devour oceans of data. With every passing week, organisations of every size will come to rely more and more on the analytics produced by machine learning, not only to drive decision making but to drive everything that can be controlled by data.
As machine learning (ML) generated code takes away more of your coding work, you have a choice: become an engineer who knows how to derive business value from all that data, or… I honestly don’t know. Be an influencer, maybe? In the words of architect and futurist R. Buckminster Fuller, ‘we are called to be architects of the future, not its victims.’
Data-driven architecture for the data-driven business
If event-driven architecture (EDA) is a software architecture paradigm promoting the production, detection and consumption of, and reaction to events, then data-driven architecture (DDA) is a software architecture paradigm promoting the production, distribution and consumption of data and data products.
Simply put, the evidence shows that organisations which make the best use of data insights tend to fare best in the marketplace, leveraging data and analytics to drive strategic and operational decision making. Yet for most organisations, that is just not happening:
- A NewVantage Partners survey in 2021 found that only 38% of companies described themselves as data-driven, down from 46% in 2020
- A 2021 Accenture survey found that 67% of organisations struggle to build trust in data and analytics within their businesses
If we add ML or AI to the mix, the results are no better:
- On the upside, adoption of AI grew from 4% of companies in 2020 to 26% in 2022 according to Gartner, showing progress
- On the downside, according to Gartner surveys, around half of big data and AI projects fail or don't get deployed, indicating many companies struggle with adoption
What are the causes of failure?
The best research I have seen to date is a 2021 paper, Beyond the Hype: Why Do Data-Driven Projects Fail?, by Ermakova et al. Together with previous research into the causes of failure, their in-depth investigation identified three primary causes:
- 54% of respondents see a conceptual gap between business strategies and the implementation of analytics solutions
- Low data quality
- Data access problems
It’s worth noting that the primary cause listed above is also the primary cause identified in the failure of large digital transformation efforts.
Increasing the value of data
Just as that concise definition of EDA expands into something quite a bit larger when put into practice, so too does DDA. If we put DDA together with current business drivers — the ‘why’ of it — then:
- Data-driven architecture (DDA) is a software architecture paradigm whose ultimate goal is to increase the value of data by promoting, supporting and enabling activities in two main categories:
  - Data governance: identification, management, ownership, access and value
  - Value realisation: quality, security, productisation, availability, usability, use, timely and meaningful analysis, machine learning
This has to be one of the biggest asks of IT. Ever.
In order to get from the current state to the data-driven state, some bridges will need to be built — between business units, between business and technical and data teams, and between data and people.
Culture, democratisation, operating model
The most important prerequisite for successful DDA is to embrace data culture. Such a culture believes in realising the value of data and encourages the use and understanding of it to improve decision making at all levels, including engineering.
Democratisation of data refers to enabling easy, widespread access to data and analytics capabilities for business and technical teams throughout an organisation. Key steps to achieving this are making data discoverable, providing quality lineage and metadata to improve utility and trust and simplifying data pipelines. Unless you train more than just a few key people on analytical tools, and include training to develop the skills that empower them to work with data, you are unlikely to achieve data-driven nirvana.
Domain-based fusion/cross-functional teams comprise: SMEs who understand the business data; engineers who produce, build and automate with the data; and data scientists and analysts who turn datasets into the analytics that power the business.
Getting rid of the Enterprise Management Bus
You will also need to address the primary cause of data-driven project failure, namely: ‘54% of respondents see a conceptual gap between business strategies and the implementation of analytics solutions.’ The two most important data pipelines you will build will:
- Move vital information from the top down by ensuring that business and project teams have clear strategic goals, objectives and key results (OKRs) and metrics
- Add direct, bottom-up communication between project teams and leadership
Project managers reporting to portfolio managers, reporting to CIOs, reporting to business leadership, reporting to the board — aka the Enterprise Management Bus — is an outdated legacy due for a replacement.
Data-driven architecture: how-to
The core components are not different from what you may currently have; what does need to change will be driven by the business goals.
Data as product-in-demand
As an architect or senior engineer, your goal should be to produce, process and handle high volumes of data and distribute it to wherever the demand is; think of your role as that of a global air traffic architect. This may cause some tension between the demand-driven and domain-driven forces within your organisation, so it can help to adopt an entrepreneurial perspective. In the startup world, product-market fit is everything, and the only way to know what the market wants is to talk to your customers. Fortunately, this aligns nicely with the cultural and operating model changes required for success.
Just as airline passengers are issued a ticket which must be presented at security, the boarding gate, to flight attendants, and at customs and immigration, data will now need something very much like a ticket. From the moment an API request leaves the browser it should be paired with a metadata payload.
Every request, response, event, and message should contain metadata — not necessarily schema, but context; information that describes the body data's origin, purpose, and contents.
For example: purpose: new_player_reg; classification: PII, GDPR or PCI; and so on.
Anything that could be used for automation, routing, or ML training and inference should be included until the data reaches every possible destination, including data persistence. Do not waste one second counting the cost of 300 extra bytes per request. IP packets have headers; HTTP requests/responses have headers; headers work. Do what works.
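As a minimal sketch of the "ticket" idea, the snippet below attaches a metadata payload to outgoing HTTP headers. The header name (`X-Data-Context`) and field vocabulary (purpose, classification, origin) are illustrative assumptions, not a standard; use whatever your data governance model defines.

```python
import json

def with_metadata(headers: dict, *, purpose: str, classification: str,
                  origin: str) -> dict:
    """Return a copy of `headers` carrying a serialised metadata 'ticket'."""
    context = {
        "purpose": purpose,                # e.g. "new_player_reg"
        "classification": classification,  # e.g. "PII", "GDPR", "PCI"
        "origin": origin,                  # producing service or client
    }
    enriched = dict(headers)
    enriched["X-Data-Context"] = json.dumps(context)
    return enriched

# The metadata travels with the request from the browser onwards
headers = with_metadata({"Content-Type": "application/json"},
                        purpose="new_player_reg",
                        classification="PII",
                        origin="web-client")
```

Because the ticket is just a header, every hop in the pipeline, including persistence, can read, route on, or archive it without schema coupling.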
This one addition alone could go a long way towards solving your organisation’s data trustworthiness issues (if you include metadata and history in the database). And, if you’re with me so far, ask yourself this: if everything coming into production has this metadata and history and then needs to be published to an event stream just after changing the database, how much logging and tracing is needed? What is the impact on observability? What are the opportunities for real-time analytics?
This is one example of how a ‘data-driven’ approach changes architecture.
Data-driven architecture goals
The high-level goals of DDA were broadly specified at the beginning of the article, but your initial technical focus for a DDA should be on reducing complexity and using data and machine learning for workflow or process automation.
Identify workflow/process automation candidates
Business and technical data, and analytics, will identify the highest value automation opportunities, guide implementation, quantify benefits, and provide oversight. This leads to more intelligent, adaptive workflow automation:
- Analytics can identify workflow or process pain points: cycle times, bottlenecks and exceptions, highlighting areas needing automation
- Business data-driven process mining (particularly from legacy systems) reveals automation candidates
- Analysing event logs can map processes and tasks with automation potential
- Data proves the ROI, efficiency gains, and impact (business and technical) of automation
- Analytics-defined workflows can be more adaptable, but rule-based automation is just a first step
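To make the event-log analysis concrete, here is a small sketch: given a flat event log of (case, step, timestamp) records, it computes the mean cycle time of each step-to-step transition and flags the slowest one as an automation candidate. The log contents and field layout are invented for illustration; real inputs would come from application logs or a process-mining export.

```python
from collections import defaultdict
from statistics import mean

# Illustrative event log: (case_id, step, timestamp in minutes)
event_log = [
    ("order-1", "created",   0), ("order-1", "approved", 30),
    ("order-1", "shipped",  50),
    ("order-2", "created",   0), ("order-2", "approved", 90),
    ("order-2", "shipped", 110),
]

def mean_cycle_times(log):
    """Mean duration of each step-to-step transition across all cases."""
    by_case = defaultdict(list)
    for case_id, step, ts in log:
        by_case[case_id].append((ts, step))
    durations = defaultdict(list)
    for events in by_case.values():
        events.sort()  # order each case's events by timestamp
        for (t0, s0), (t1, s1) in zip(events, events[1:]):
            durations[f"{s0}->{s1}"].append(t1 - t0)
    return {k: mean(v) for k, v in durations.items()}

avg = mean_cycle_times(event_log)
bottleneck = max(avg, key=avg.get)  # slowest transition: an automation candidate
```

In this toy data the approval step dominates cycle time, which is exactly the kind of pain point the analytics above are meant to surface.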
Adopt machine learning
Your production and operational data possess untapped power. ML-driven, real-time decision logic can adapt workflows and processes to real-time changes in data. To do that, DDA can deliver the benefits of machine learning to organisations in a number of ways:
- Collecting and storing the data that is needed to train ML models
- Managing the data that is needed to run ML models
- Integrating ML models into existing applications
- Ensuring data quality by:
  - Monitoring data quality in automated flows
  - Validating data integrity through automation
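Automated data-quality validation can start very simply. The sketch below gates records entering a pipeline on the presence of required metadata fields and a non-empty body; the field names are the same hypothetical governance vocabulary used earlier, not a prescribed schema.

```python
REQUIRED_FIELDS = {"purpose", "classification", "origin"}  # illustrative

def validate_record(record: dict) -> list:
    """Return a list of data-quality problems; an empty list means the record passes."""
    problems = []
    meta = record.get("metadata", {})
    missing = REQUIRED_FIELDS - meta.keys()
    if missing:
        problems.append(f"missing metadata fields: {sorted(missing)}")
    if not record.get("body"):
        problems.append("empty body")
    return problems

good = {"metadata": {"purpose": "new_player_reg",
                     "classification": "PII",
                     "origin": "web"},
        "body": {"player_id": 42}}
bad = {"metadata": {"purpose": "new_player_reg"}, "body": {}}
```

Records failing validation can be routed to a quarantine stream rather than silently polluting training data downstream.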
The notification system example below is an excellent example of an evolving DDA, and the delivery time optimisation feature specifically illustrates how data, analytics, and machine learning can be used to increase the effectiveness of your notifications and improve customer experience.
Data-driven architecture example: notifications
To illustrate this future architecture and to help us create a road map that will get us there, let’s consider a notifications system, or Nx for short. Why notifications? They provide immediately visible impact: you can see the ROI within hours of launching your first campaign. Before joining LinkedIn and Asana, I was involved in a greenfield notifications project. Having since worked on a similar system at LinkedIn, and now building a powerful notifications system at Asana, I’d like to share what I’ve learned.
Notifications as heavy lifter
Our notification system does a lot of important work for Asana. It touches every aspect of the business and the architecture: customers, internal teams, production data and services, our omnichannel system of engagement, and critical business analytics.
The notification system has four stages: creation, orchestration, delivery, and analysis. Each of the four stages comprises a sequence of steps, and these steps are a sequence of tasks.
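The stages/steps/tasks hierarchy can be modelled very directly in code. This is a minimal in-process sketch, not Asana's implementation; the task functions and field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Task = Callable[[dict], dict]

@dataclass
class Step:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def run(self, payload: dict) -> dict:
        for task in self.tasks:        # a step is a sequence of tasks
            payload = task(payload)
        return payload

@dataclass
class Stage:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, payload: dict) -> dict:
        for step in self.steps:        # a stage is a sequence of steps
            payload = step.run(payload)
        return payload

# Hypothetical tasks for the creation stage
def render_template(p: dict) -> dict:
    return {**p, "body": f"Hi {p['user']}!"}

def attach_metadata(p: dict) -> dict:
    return {**p, "metadata": {"purpose": "notify"}}

creation = Stage("creation", [Step("build", [render_template, attach_metadata])])
result = creation.run({"user": "Ada"})
```

In production each stage boundary would be a message queue or workflow pipeline rather than a direct function call, but the shape of the composition is the same.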
The architecture of the notification system is highly compatible with any kind of implementation, from on-premises bare metal, to Kubernetes (k8s) in the cloud, to completely cloud-native product stacks on AWS, Azure, or GCP.
Communication between the stages is handled by messages and, in the case of particular steps, by workflow pipelines or by API requests to microservices or third-party Software-as-a-Service (SaaS).
Do not forget that without the ability to track the complete user journey, your data value realisation goals die the moment the email is sent. To show business ROI, you must track customer behaviour from start to finish: delivery, email opened, link clicked, call-to-action, page viewed, accept/decline, payment received, and so on, all sent as events via WebSocket or API calls, including metadata. These are critical business data points, best handled by your event ingestion infrastructure, with the stream being distributed to all who want it: the notification dashboard and report screens, DevOps, developers, MLOps, the D&A lake/warehouse teams, and ML training data consumers.
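The fan-out described above can be sketched with a tiny in-process event bus. This is a stand-in for real event ingestion infrastructure (Kafka, Kinesis, Pub/Sub and the like); the step names mirror the journey steps listed in the text, and the subscriber names are assumptions.

```python
from typing import Callable, List

class EventBus:
    """Minimal in-process stand-in for an event ingestion pipeline."""
    def __init__(self):
        self.subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: dict) -> None:
        for handler in self.subscribers:   # one stream, many consumers
            handler(event)

bus = EventBus()
analytics, mlops = [], []       # hypothetical downstream consumers
bus.subscribe(analytics.append)
bus.subscribe(mlops.append)

def track(step: str, notification_id: str) -> None:
    bus.publish({"step": step,
                 "notification_id": notification_id,
                 "metadata": {"purpose": "journey_tracking"}})

for step in ["delivered", "email_opened", "link_clicked"]:
    track(step, "n-123")
```

Every consumer sees the same stream, so the dashboard, MLOps and the warehouse teams all work from identical journey data.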
Delivery time optimisation
The importance of delivering notifications when your customer is most likely to interact with them cannot be overstated. Delivery time optimisation (DTO) is based on the hypothesis that the notification response rate can be increased depending on delivery time. As every good scientist knows, there is always the null hypothesis lurking in the shadows: there is no significant difference in response rate depending on delivery time. The following approaches are recommended:
- Ask users about their preferred notification times and use this as the first training dataset
- Conduct experiments with different delivery times for users without a preference and measure their response rates
- Explore the data to find features and clusters that are correlated with delivery times
- Use the cluster centroids to assign optimal delivery times to each user based on their similarity
- If no clusters are found, either collect more data or accept the null hypothesis
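The centroid-assignment step above can be sketched in a few lines. Here the centroids are assumed to have already been discovered by clustering response-rate data (the morning/lunchtime/evening values are invented), and each user is assigned the nearest cluster's delivery hour.

```python
from typing import List

def nearest_centroid(hour: float, centroids: List[float]) -> float:
    """Assign a user's preferred or observed hour to the closest cluster centroid."""
    return min(centroids, key=lambda c: abs(c - hour))

# Hypothetical centroids from clustering: morning, lunchtime, evening cohorts
centroids = [8.0, 12.5, 19.0]

preferred_hours = {"ada": 7.5, "lin": 13.0, "sam": 21.0}  # invented users
assignments = {user: nearest_centroid(h, centroids)
               for user, h in preferred_hours.items()}
```

Each user's notifications would then be scheduled for their assigned hour, with ongoing A/B measurement against the null hypothesis that delivery time makes no difference.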
That’s it in a nutshell. Perhaps you can begin to see why trying to achieve the goals of DDA is one of the biggest asks of IT. Ever. You can also see now how important it is to embrace the values of data culture. Everyone — business and technical — needs to level up their data game.
I’ll end the article with some consideration of the analysis stage. In many ways it is the most important, as it will guide the next set of business and technical decisions we make. A conceptual view of the entire stage covers everything from the delivered notification to three typical uses of the collected data. Without data collected at every possible step of the customer journey, as the customer interacts with the notification from start to every possible finish, the DDA cannot prove impact and ROI. The event data and the collection mechanisms themselves are quite straightforward.
We have explored the critical role of DDA in the evolving landscape of organisations. The transition towards a data-centric approach is driven by the growing importance of data and analytics, particularly in the context of machine learning and AI.
We have identified key challenges, including the disconnect between business strategies and analytics solutions, data quality issues, and accessibility constraints. These hurdles underscore the need for organisations to undertake a comprehensive transformation to embrace DDA fully.
To succeed in this data-driven future, we have outlined several essential steps:
- Cultivate data culture: a fundamental shift towards a data culture is imperative. Organisations must prioritise realising data's value and encourage its widespread use across all levels of the business
- Democratise data: easy access to data and analytics capabilities should be extended throughout the organisation. Making data discoverable, providing high-quality lineage and metadata, and empowering employees with data skills are crucial
- Align with business goals: DDA should align closely with organisational objectives. Reducing complexity and harnessing data and machine learning for process automation are vital aspects of this alignment
- Embrace machine learning: leveraging machine learning to harness the power of data is essential. This involves collecting, managing, and integrating data necessary for ML models to drive informed decisions
Taking these steps will enable organisations to stay competitive by making data-informed decisions in an increasingly data-centric business landscape.