What is the data fabric?

Szymon Klarman, Chief Knowledge Architect at BlackSwan Technologies, explains how enterprises can make the most of their data without the need for a seismic architectural shift.

Within IT, we often consider silos a negative. In fact, the organisational structure a company needs in order to grow and prosper inevitably leads to silos. These silos are necessary to ensure that certain tasks sit with the teams that have the requisite skills. After all, we don’t want data engineers determining a company’s operating budget, but we do want sales teams focusing on sales.

With the importance of data across enterprises growing over the last three decades, these silos have subsequently created a knock-on effect: data silos. Data silos lead to inconsistent and undefined processes, disparate or outdated data sources, and data not being used to its full potential, ultimately impacting a company’s bottom line.

To overcome these issues, enterprises have used a centralised approach: a secure and protected repository that is accessible to those who need it. Initially, this consolidation of data would take place within a data warehouse. Users could pre-process and store data in a fixed, structured form for predefined use cases.

But this approach had limitations – not least that it required extensive customisation and maintenance and could not scale. A new type of architecture – a data lake – was introduced to overcome these issues. Data lakes ushered in a new wave of data handling: the ability for enterprises to store all of their structured and unstructured data at any scale in a central repository.

These approaches were fit for purpose when they were introduced, but the combination of advances in data-focused technology and the increasing complexity of data within enterprises has brought their limitations to light. For instance, organisations that have merged with or acquired other enterprises face the challenge of absorbing their data and practices and integrating them with the rest of the business.

At the same time, each department has its own needs, data priorities and subsets of data. As a result, the centralised environment becomes convoluted, and the ‘single source of truth’ organisations strive to achieve becomes fragmented and unsynchronised.

Integrating new data sources into traditional platforms also tends to be costly, time-consuming, and resource-intensive due to inconsistencies in data formats, a lack of context, and poor data quality. This is made more difficult because the data specialists within the organisation and the department teams are siloed, with a lack of understanding of each other's domains.

While the organisation struggles with these challenges, everyday business goes on, exacerbating the existing data silos simply because each department cannot afford to wait for the necessary back-end changes. This feeds a vicious cycle of data silos leading to organisational politics, which leads to further silos and so forth.

Even in situations where departments share common business processes, ongoing centralisation efforts can conflict with innovation at the department level. As a result, departments end up spending valuable development time building custom applications from scratch, often unwittingly duplicating the efforts of other departments.

Introducing the data fabric

The centralised approach has been the go-to data strategy up until now. But to bypass data silos, enterprises require a completely different approach – a decentralised approach to data. At the forefront of this approach is the data fabric, a design concept that Gartner has listed as its top strategic trend for 2022.

Gartner defines the data fabric as an integrated layer of data and connecting processes. It says organisations can reduce the time required for data integration by 30%, deployment by 30%, and maintenance by 70%. The analyst firm adds that a data fabric provides ‘frictionless access and sharing of data in a distributed network environment’. But what does this actually mean?

Rather than consolidating every source of data into a single location, the data fabric approach focuses on establishing a data framework in which each repository of data across the organisation is exposed and can be accessed – with the appropriate permissions – across the organisation, based on a uniform set of interoperability principles.

Data integrity is preserved via data virtualisation, which ensures that data assets are accessed where they reside and that data is not duplicated. In other words, datasets can be accessed, viewed and analysed on demand without making changes to the original data. This allows teams to pool the data they need for any one project from several silos.
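
To make the idea concrete, here is a minimal Python sketch of virtualised, permission-checked access. It is an illustration under stated assumptions rather than how any particular data fabric product works: the names VirtualSource, DataFabric, the roles and the two sample silos are all hypothetical. Each source is read from its owning system only when queried, and callers receive copies, so the original data is never changed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Record = Dict[str, object]


@dataclass
class VirtualSource:
    """A source registered with the fabric; its rows stay where they reside."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # reads from the owning system on demand
    allowed_roles: Set[str] = field(default_factory=set)


class DataFabric:
    """Hypothetical access layer: a registry of sources plus a permission check."""

    def __init__(self) -> None:
        self._sources: Dict[str, VirtualSource] = {}

    def register(self, source: VirtualSource) -> None:
        self._sources[source.name] = source

    def query(self, name: str, role: str) -> List[Record]:
        source = self._sources[name]
        if role not in source.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        # Hand back copies so callers cannot modify the original rows.
        return [dict(row) for row in source.fetch()]


# Two hypothetical silos, each owned by a different department.
crm_rows = [{"customer_id": 1, "name": "Acme Ltd"}]
billing_rows = [{"customer_id": 1, "balance": 250.0}]

fabric = DataFabric()
fabric.register(VirtualSource("crm", lambda: crm_rows, {"sales", "analyst"}))
fabric.register(VirtualSource("billing", lambda: billing_rows, {"finance", "analyst"}))

# An analyst pools data from both silos for one project.
analyst_view = fabric.query("crm", "analyst") + fabric.query("billing", "analyst")
```

In this sketch a sales user asking for the billing source would receive a PermissionError, while the analyst can pool both silos without either department handing over a copy of its data.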

How the data fabric works in practice

The data fabric approach provides a single point of visibility of data flows across the enterprise, which minimises discrepancies between data producers and users. This helps companies to overcome issues with data quality, while up-to-date data and granular access controls give them greater control over how data is consumed.

In past shifts in data architecture, enterprises had to undergo extensive transformations to move from one architecture to another. The data fabric, by contrast, is interoperable with existing IT assets, so enterprises can keep them. That means the data lakes and data warehouses that have been built, customised and maintained for many years can become nodes within the data fabric.
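
One way to picture existing assets becoming nodes is a thin adapter around each of them. The Python sketch below is an assumption-laden illustration, not a reference design: FabricNode, WarehouseNode and LakeNode are hypothetical names, an in-memory SQLite database stands in for a warehouse, and a JSON-lines string stands in for a file in a data lake. Neither store is migrated or restructured; each is simply wrapped behind the same read interface.

```python
import json
import sqlite3
from typing import Dict, Iterator, List, Protocol

Record = Dict[str, object]


class FabricNode(Protocol):
    """Hypothetical uniform interface that every node exposes to the fabric."""

    def read(self, dataset: str) -> Iterator[Record]: ...


class WarehouseNode:
    """Wraps an existing SQL warehouse (SQLite stands in for it here)."""

    def __init__(self, conn: sqlite3.Connection, tables: List[str]) -> None:
        self._conn = conn
        self._tables = set(tables)  # allow-list, since table names cannot be bound as parameters

    def read(self, dataset: str) -> Iterator[Record]:
        if dataset not in self._tables:
            raise KeyError(dataset)
        cursor = self._conn.execute(f"SELECT * FROM {dataset}")
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(columns, row))


class LakeNode:
    """Wraps an existing data lake (JSON-lines text stands in for stored files)."""

    def __init__(self, objects: Dict[str, str]) -> None:
        self._objects = objects

    def read(self, dataset: str) -> Iterator[Record]:
        for line in self._objects[dataset].splitlines():
            if line.strip():
                yield json.loads(line)


def read_all(nodes: Dict[str, FabricNode], node: str, dataset: str) -> List[Record]:
    """Read a dataset from a named node through the shared interface."""
    return list(nodes[node].read(dataset))


# Stand-ins for systems that already exist in the enterprise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.execute("INSERT INTO orders VALUES (10, 1)")

nodes: Dict[str, FabricNode] = {
    "warehouse": WarehouseNode(conn, ["orders"]),
    "lake": LakeNode({"clicks": '{"customer_id": 1, "page": "/pricing"}\n'}),
}

orders = read_all(nodes, "warehouse", "orders")
clicks = read_all(nodes, "lake", "clicks")
```

The design choice worth noting is that consumers only ever see the shared interface, so adding another existing system means writing one more adapter rather than reworking the systems already in place.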

What does this mean?

Let’s consider creating a 360-degree view of a customer – a task so challenging that Gartner recently found only 14% of organisations achieved it. When an organisation has grown over time, a particular person or enterprise may have interacted with a number of different business units within the organisation, each of which has a different operational system.

This leads to the data becoming unsynchronised. Added complexity, such as a change in job role or one person holding two roles simultaneously, is often difficult for an organisation to verify in a centralised single customer view. This is largely because traditional centralised approaches are incapable of incorporating unique identifiers from multiple systems and determining which of these are most accurate.

The ideal data fabric approach would first allow organisations to build customer profiles by pulling data seamlessly, without the need for replication, from a multitude of systems, including those within data lakes and data warehouses. It would then ensure that the profiles can be further enriched using metadata and structured and unstructured data from open source intelligence and paid-for sources.

The data fabric would incorporate both the visibility to ensure teams know when data has been shared and duplicated, and the additional unique identifiers to ensure the data used to build the profile is the most accurate.
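
The profile-building step described above can be sketched as a small identity-resolution routine. This is a simplified illustration, assuming that two records belong to the same customer whenever they share any identifier value; real matching is usually fuzzier and would also need to weigh which identifiers are most accurate. The sample records, system names and identifier keys are hypothetical. Note that each profile holds references to records in their source systems rather than copies of them.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical records: (source system, local record id, identifiers on the record).
records: List[Tuple[str, str, Dict[str, str]]] = [
    ("crm", "c-17", {"email": "j.doe@example.com"}),
    ("billing", "b-03", {"email": "j.doe@example.com", "tax_id": "TX-991"}),
    ("support", "s-88", {"tax_id": "TX-991"}),
    ("crm", "c-42", {"email": "a.lee@example.com"}),
]

# Union-find: every record starts in its own group.
parent = list(range(len(records)))


def find(i: int) -> int:
    """Return the representative of the group that record i belongs to."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps lookups short
        i = parent[i]
    return i


def union(i: int, j: int) -> None:
    """Merge the groups containing records i and j."""
    parent[find(i)] = find(j)


# Link any two records that share an identifier value.
first_seen: Dict[Tuple[str, str], int] = {}
for index, (_, _, identifiers) in enumerate(records):
    for key_value in identifiers.items():
        if key_value in first_seen:
            union(index, first_seen[key_value])
        else:
            first_seen[key_value] = index

# A profile is a list of references back to the source systems, not a copy.
profiles: Dict[int, List[str]] = defaultdict(list)
for index, (system, local_id, _) in enumerate(records):
    profiles[find(index)].append(f"{system}:{local_id}")

linked = sorted(sorted(refs) for refs in profiles.values())
```

Here the support record shares no identifier with the CRM record directly, yet both end up in the same profile because the billing record carries the email address of one and the tax identifier of the other.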

The data fabric approach provides a 360-degree view of a customer, and this is borne out by what else it enables businesses to do with the profiles it builds: they can cross-reference data points and make inferences, and visualise and analyse a profile holder’s relationship network. This enables organisations to perform deep investigations more effectively and to comply with regulatory and privacy obligations.

Data fabric can help create composable businesses

The approach offers a side benefit as well. Once the data framework is in place, the development of applications built upon this data can also be homogenised into reusable components. If innovation and development are built upon a common framework, it becomes far simpler to duplicate successes in other departments.

To gain the agility required to get ahead, this capacity to create reusable components is key. It allows an organisation to become a composable business, which Gartner defines as ‘an organisation that delivers business outcomes and adapts to the pace of business change’.

Silos may continue to exist in your organisation for decades to come, but provided they are operating on the same data framework, are not delayed by constant efforts at maintaining a centralised source, and can base their development on reusable components, organisational silos do not have to equate to data silos. Instead, the right data fabric approach can be the key to the next stage in your organisation’s evolution.

About the author

Szymon Klarman is the Chief Knowledge Architect at BlackSwan Technologies, where he oversees the adoption and development of advanced knowledge-based technology and modern data management concepts.