Graph data science has an unexpectedly long history as a means of organising complex information. Its foundation, graph theory, was first set out in the 1700s by the mathematical genius Leonhard Euler, and it came to wider prominence more recently when Google used PageRank, a graph-based ranking approach, to revolutionise search.
Graph technology is no longer an approach that only web pioneers have the internal expertise and resources to use. In the past, working with large volumes of connected data was the province of leading-edge firms with highly trained R&D teams. Today, this technique, which uncovers connections in data through graph algorithms and embeddings, is available to any organisation serious about extracting value from data.
Graph-based data work is fast becoming mainstream in business, and graph data science looks set to be a central part of the enterprise data scientist’s toolbox over the next decade. In its June report, Top 10 Data and Analytics Technology Trends for 2020, IT analyst group Gartner states that “Finding relationships in combinations of diverse data, using graph techniques at scale, will form the foundation of modern data and analytics.”
Gartner has also polled companies about their use of AI and machine learning techniques, and a remarkably high 92% of respondents said they plan to employ graph technology within five years. Academic research in the field is growing too, with more than 28,000 peer-reviewed scientific papers on graph-powered data science published in recent years.
Leveraging connections for more accurate and interpretable predictions
The pace of graph data science adoption in business is accelerating. Graph data science uses graph algorithms to reason about the ‘shape’ of the connected context around each piece of data.
Why do developers want this? Graph data science enables richer machine learning predictions. By leveraging the connections between data points, it is revolutionising how enterprises make predictions in diverse scenarios, from fraud detection to tracking a customer or patient journey, and the resulting predictions are both more accurate and more interpretable.
In a drug discovery use case, this means identifying possible new associations between genes, diseases, drugs and proteins, while providing immediate context to assess the relevance or validity of any such discovery. For customer recommendations, it means learning from user journeys to recommend future purchases accurately, and showing the relevant items from a customer’s previous buying history to build confidence in each suggestion.
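Neither pipeline is specified here, but both amount to proposing links that are missing from a graph. As a minimal sketch, assuming a simple undirected graph, the Adamic-Adar link-prediction score ranks unconnected pairs by the neighbours they share, and those shared neighbours double as the ‘immediate context’ for judging a suggestion. All entity names are hypothetical placeholders.

```python
import math
from itertools import combinations

# Hypothetical toy knowledge graph; each edge means "is associated with".
EDGES = [
    ("DRUG_1", "PROTEIN_P"),
    ("DRUG_1", "GENE_A"),
    ("DRUG_1", "GENE_B"),
    ("DISEASE_X", "GENE_A"),
    ("DISEASE_X", "GENE_B"),
    ("DISEASE_X", "PROTEIN_P"),
    ("DRUG_2", "GENE_C"),
    ("DISEASE_Y", "GENE_C"),
]

def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over shared neighbours; rarer hubs count more.
    A shared neighbour touches both u and v, so its degree is at least 2."""
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])

def predict_links(adj, top_k=3):
    """Score every unconnected pair and return the top_k candidates as
    (score, node_u, node_v, shared_neighbours_as_evidence)."""
    candidates = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        score = adamic_adar(adj, u, v)
        if score > 0:
            candidates.append((score, u, v, sorted(adj[u] & adj[v])))
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return candidates[:top_k]
```

Here `predict_links(build_adjacency(EDGES))` ranks the unlinked pair DISEASE_X and DRUG_1 first, and returns GENE_A, GENE_B and PROTEIN_P as the shared neighbours that explain the suggestion.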
The ability to rapidly ‘learn’ generalised, predictive features from data will take organisations to the next level with machine learning. While some teams are still learning how to leverage connected data in their existing machine learning workflows, the number of real-world examples is rapidly growing.
Adopters are finding that graph technology unlocks their best work, from queries that help domain experts uncover patterns to identifying high-value features for training machine learning models.
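The article does not name specific features, so the following is only an illustration of turning each node’s connected context into model inputs. Assuming a hypothetical directed edge list (for example, payments between accounts), it computes in-degree, out-degree and PageRank per node; the resulting rows could then be joined onto an existing feature table.

```python
# Hypothetical directed interaction graph, e.g. "account x paid account y".
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c"), ("e", "c")]

def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank held by dangling nodes (no out-links)
    is spread uniformly so the scores keep summing to 1."""
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def graph_features(edges):
    """One feature row per node: in-degree, out-degree and PageRank."""
    nodes = sorted({node for edge in edges for node in edge})
    in_deg = {node: 0 for node in nodes}
    out_deg = {node: 0 for node in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    pr = pagerank(nodes, edges)
    return {
        node: {"in_degree": in_deg[node], "out_degree": out_deg[node], "pagerank": pr[node]}
        for node in nodes
    }
```

In this toy graph, node `c` receives three of the five edges and ends up with the highest PageRank, a structural signal that a flat table of per-account attributes would not contain.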
Emerging graph tech success stories
Let’s look at some examples of these trends. Graph data science is being used at the centre of the British government, where data scientists are deploying their first machine learning model built with the help of graph technology. The system automatically recommends content from GOV.UK, the central government website, based on the page a user is visiting. It learns continuous feature representations (embeddings) for the nodes in the graph, which can then be used for various machine learning tasks, such as recommending content.
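The GOV.UK pipeline is described only as learning continuous node representations; methods such as node2vec train dense vectors from sampled random walks. As a simplified, dependency-free stand-in, the sketch below represents each page by the exact expected visit counts of a short random walk starting from it, then recommends the pages whose profiles are most similar by cosine similarity. The page names and links are hypothetical.

```python
import math

# Hypothetical page graph: an undirected edge means two pages link to each other.
LINKS = [
    ("apply-passport", "renew-passport"),
    ("apply-passport", "passport-fees"),
    ("renew-passport", "passport-fees"),
    ("passport-fees", "pay-tax"),  # the only bridge between the two topics
    ("tax-return", "self-assessment"),
    ("tax-return", "pay-tax"),
    ("tax-return", "tax-deadlines"),
    ("self-assessment", "pay-tax"),
    ("self-assessment", "tax-deadlines"),
    ("pay-tax", "tax-deadlines"),
]

def neighbours(links):
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def walk_profiles(adj, walk_length=3):
    """Expected visit counts of a uniform random walk of walk_length steps
    from each page, computed exactly instead of by sampling walks."""
    profiles = {}
    for start in adj:
        dist = {start: 1.0}  # probability of being on each page after k steps
        visits = {}
        for _ in range(walk_length):
            nxt = {}
            for page, p in dist.items():
                share = p / len(adj[page])
                for nb in adj[page]:
                    nxt[nb] = nxt.get(nb, 0.0) + share
            for page, p in nxt.items():
                visits[page] = visits.get(page, 0.0) + p
            dist = nxt
        profiles[start] = visits
    return profiles

def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(links, page, n=2):
    """Return the n pages whose walk profiles are most similar to page's."""
    profiles = walk_profiles(neighbours(links))
    scored = [(cosine(profiles[page], prof), other)
              for other, prof in profiles.items() if other != page]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scored[:n]]
```

For example, `recommend(LINKS, "renew-passport")` returns `['apply-passport', 'passport-fees']`: pages in the same densely linked neighbourhood outrank the tax pages, even though `passport-fees` links directly to `pay-tax`. A learned embedding compresses the same kind of walk statistics into a short dense vector, which scales to far larger graphs.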
The government data scientists noted: ‘Through this process, we learned that creating the necessary data infrastructure which underpins the training and deployment of a model is the most time-consuming part’.
Elsewhere in the graph database ecosystem, a senior data scientist at Meredith, a leading media and marketing services company, reports that graph algorithms are turning billions of page views into millions of pseudonymous identifiers with rich browsing profiles: ‘Providing relevant content to online users, even those who don’t authenticate, is essential to our business... Instead of “advertising in the dark” we now better understand our customers, which translates into significant revenue gains and better-served consumers.’
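Meredith’s pipeline is not described beyond ‘graph algorithms’. One common way to collapse many page views into fewer pseudonymous identifiers is weakly connected components: treat cookies and the signals they share as nodes, and every group of transitively linked cookies becomes one profile. The sketch below assumes that approach and uses a union-find structure; the events and signal names are hypothetical.

```python
# Hypothetical page-view events as (cookie_id, shared_signal) pairs, where a
# signal is anything two cookies might have in common (e.g. a newsletter link).
PAGE_VIEWS = [
    ("cookie-1", "newsletter:abc"),
    ("cookie-2", "newsletter:abc"),
    ("cookie-2", "device:xyz"),
    ("cookie-3", "device:xyz"),
    ("cookie-4", "newsletter:def"),
]

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def resolve_identities(views):
    """Group cookies into pseudonymous profiles: cookies linked through any
    shared signal, directly or transitively, land in the same component."""
    uf = UnionFind()
    for cookie, signal in views:
        uf.union(cookie, signal)
    profiles = {}
    for cookie, _ in views:
        profiles.setdefault(uf.find(cookie), set()).add(cookie)
    return sorted(sorted(group) for group in profiles.values())
```

Here `cookie-1` and `cookie-3` never share a signal directly, but both connect through `cookie-2`, so the three collapse into one profile while `cookie-4` stays separate.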
Supporting supply chains
Graph data science is also supporting the medical supply chain. Global medical device manufacturer Boston Scientific is using it to identify the causes of product failures. Multiple teams, often in different countries, work on the same problems in parallel, but its engineers had been resorting to analysing their data in spreadsheets, which led to inconsistencies and made it difficult to find the root causes of defects. Boston Scientific says the switch to graph technology has given it a more effective way to analyse, coordinate and improve its manufacturing processes across all its locations.
Enhancing search times
Boston Scientific’s users can now run meaningful, data science-enhanced searches. Analytical query times have dropped from two minutes to between 10 and 55 seconds, which increases overall efficiency and streamlines the entire analytical process. The company can also identify specific components that are more likely to fail. Another benefit is that the graph data model is simple enough to be easily communicated to others.
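The article does not describe Boston Scientific’s actual queries. As a hypothetical illustration of flagging components that are more likely to fail, the sketch walks assembly-to-component edges and computes, for each component, the share of assemblies containing it that failed inspection. Plain Python dictionaries stand in for the graph, and every identifier is made up.

```python
# Hypothetical bill-of-materials edges (assembly -> components) and failures.
BOM = {
    "unit-01": ["valve-A", "seal-S", "board-1"],
    "unit-02": ["valve-A", "seal-T", "board-2"],
    "unit-03": ["valve-B", "seal-S", "board-1"],
    "unit-04": ["valve-A", "seal-T", "board-1"],
    "unit-05": ["valve-B", "seal-T", "board-2"],
}
FAILED = {"unit-01", "unit-02", "unit-04"}

def component_failure_rates(bom, failed):
    """For each component, the fraction of assemblies using it that failed,
    sorted from highest to lowest rate (ties broken by component name)."""
    used, failures = {}, {}
    for unit, parts in bom.items():
        for part in parts:
            used[part] = used.get(part, 0) + 1
            if unit in failed:
                failures[part] = failures.get(part, 0) + 1
    rates = {part: failures.get(part, 0) / used[part] for part in used}
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

With this data, `valve-A` appears in every failed unit and no passing one, so it tops the list at a rate of 1.0. A high rate marks a component for investigation; it does not by itself prove a root cause.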
‘Everyone involved with the project, from business stakeholders to technical implementers, is able to understand one another because they’re all speaking a common language,’ says Eric Wespi, Data Scientist at Boston Scientific. The company generates further business value by using natural language processing to analyse the raw text describing inspection failures, extracting and correlating topics to investigate the root causes of those failures.
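The NLP tooling behind this is not specified. A minimal sketch of the ‘extract and correlate topics’ step, assuming simple keyword matching in place of a real NLP model: match each free-text failure note against a hypothetical topic vocabulary, then count how often topics appear together, giving weighted topic-to-topic edges that could sit in the graph alongside the components.

```python
from collections import Counter
from itertools import combinations

# Hypothetical topic vocabulary and free-text inspection failure notes.
TOPICS = {"crack", "seal", "leak", "solder", "corrosion"}
REPORTS = [
    "Hairline crack near seal, minor leak observed",
    "Seal leak after pressure test",
    "Cold solder joint on board",
    "Corrosion at seal housing caused leak",
]

def extract_topics(text):
    """Keyword matching as a stand-in for a proper NLP topic extractor."""
    words = {w.strip(",.").lower() for w in text.split()}
    return sorted(words & TOPICS)

def topic_cooccurrence(reports):
    """Weighted edges between topics that appear in the same report."""
    edges = Counter()
    for report in reports:
        for a, b in combinations(extract_topics(report), 2):
            edges[(a, b)] += 1
    return edges
```

In this sample, `leak` and `seal` co-occur in three of the four notes, the kind of correlation an engineer might then trace back to specific components.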
At international manufacturing leader Caterpillar, graph data science is making natural language processing of a large repository of technical repair documents more effective. Recognising that valuable data was captured but inaccessible in more than 27 million documents, the company set out to build a processing tool to uncover these hidden connections and trends.
The resulting graph-based machine learning classification tool learns from the portion of the data already tagged with terms such as ‘cause’ or ‘complaint’ and applies those tags to the rest. It parses the text on its own, quickly finds patterns and connections, builds hierarchies and adds ontologies.
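Caterpillar’s exact classifier is not described. One standard graph technique for spreading a small set of tags to untagged records is label propagation, sketched here under that assumption: documents and the terms they contain form a graph, tagged documents keep their known label, and every other node repeatedly takes the average label scores of its neighbours. The documents, terms and labels are illustrative.

```python
# Hypothetical repair notes mapped to the terms they mention.
DOC_TERMS = {
    "doc1": ["hydraulic", "pressure", "drop"],
    "doc2": ["operator", "reports", "noise"],
    "doc3": ["pressure", "drop", "seal"],
    "doc4": ["operator", "noise", "cab"],
    "doc5": ["hydraulic", "seal"],
}
SEED_LABELS = {"doc1": "cause", "doc2": "complaint"}  # the already-tagged portion
LABELS = ("cause", "complaint")

def propagate_labels(doc_terms, seeds, labels, iterations=20):
    # Build an undirected document-term graph.
    adj = {}
    for doc, terms in doc_terms.items():
        for term in terms:
            adj.setdefault(doc, set()).add(term)
            adj.setdefault(term, set()).add(doc)
    # Every node holds a score per label; seed documents start one-hot.
    scores = {node: {label: 0.0 for label in labels} for node in adj}
    for doc, label in seeds.items():
        scores[doc][label] = 1.0
    for _ in range(iterations):
        new_scores = {}
        for node, nbs in adj.items():
            if node in seeds:
                new_scores[node] = scores[node]  # clamp known labels
            else:
                new_scores[node] = {
                    label: sum(scores[nb][label] for nb in nbs) / len(nbs)
                    for label in labels
                }
        scores = new_scores
    # Assign each untagged document its highest-scoring label.
    return {
        doc: max(labels, key=lambda label: scores[doc][label])
        for doc in doc_terms if doc not in seeds
    }
```

Calling `propagate_labels(DOC_TERMS, SEED_LABELS, LABELS)` tags `doc3` and `doc5` as ‘cause’, because they reach `doc1` through shared terms, and `doc4` as ‘complaint’ via `doc2`.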
Another example of graph data science in action is in the healthcare sector. New York-Presbyterian Hospital's analytics team uses graph technology to track infections and take strategic action to contain them.
Its developers found that graph data science offered them a flexible way to connect all the dimensions of an event: the “what, when and where” of its occurrence. With this insight, the team built a “time” tree and a “space” tree, the latter modelling every room on site in which patients could be treated.
This initial model revealed a large number of inter-relationships, but that alone did not meet the project’s goals, so the team added an event entity to connect the time and location trees. With the resulting data model, the analytics team can analyse everything that happens in its facilities and proactively identify and contain infections before they spread.
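The hospital’s schema is only outlined above. The sketch below assumes a minimal version of it: a location tree (room, floor, building), a year, month and day time tree, and event entities that tie a patient to a day and a room. Looking up everyone who shares an infected patient’s time and location nodes gives a first-pass exposure list. Patient IDs, rooms and dates are invented, and a production system would express this as a graph query rather than in Python.

```python
from collections import defaultdict
from datetime import date

# Hypothetical location tree: room -> (floor, building).
ROOMS = {
    "4E-401": ("floor-4", "main"),
    "4E-402": ("floor-4", "main"),
    "2W-210": ("floor-2", "west"),
}

# Event entities connect a patient, a day in the time tree and a room.
EVENTS = [
    ("patient-1", date(2020, 3, 2), "4E-401"),
    ("patient-2", date(2020, 3, 2), "4E-401"),
    ("patient-3", date(2020, 3, 2), "4E-402"),
    ("patient-4", date(2020, 3, 3), "4E-401"),
    ("patient-5", date(2020, 3, 2), "2W-210"),
]

def time_path(day):
    """Position of a day in a year -> month -> day time tree."""
    return (day.year, day.month, day.day)

def exposed_patients(events, rooms, infected, level="room"):
    """Patients who shared a day and a location with the infected patient.
    level="room" matches the exact room; level="floor" moves one step up the
    location tree, so other rooms on the same floor also count."""
    def location_key(room):
        floor, building = rooms[room]
        return room if level == "room" else (building, floor)

    index = defaultdict(set)  # (time-tree path, location node) -> patients
    for patient, day, room in events:
        index[(time_path(day), location_key(room))].add(patient)

    exposed = set()
    for patient, day, room in events:
        if patient == infected:
            exposed |= index[(time_path(day), location_key(room))]
    exposed.discard(infected)
    return sorted(exposed)
```

For `patient-1`, the room-level lookup returns only `patient-2`; widening to the floor adds `patient-3`, while `patient-4` (a different day) and `patient-5` (a different building) stay out. Moving up either tree is how the event model lets the team trade precision for coverage.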
Here to stay
It’s indisputable: graph-enabled data science is set to become a key part of business analytics, delivering valuable business insights in 2021 and beyond. Gartner’s data industry team predicts that a quarter of global Fortune 1000 companies will be leveraging graph technologies as part of advanced data and analytics initiatives within three years. Graph data science has definitely moved out of the 1700s and into business.
About the author
Alicia Frame is currently the Lead Product Manager and Data Scientist at Neo4j, where she works on the company’s Product Management team to set the roadmap and strategy for developing graph-based machine learning tools.