Databases help us make sense of a complicated world. Chandra Rangan, Chief Marketing Officer at Neo4j, takes Johanna Hamilton AMBCS away from the flat earth of legacy databases and into the metaverse.

Moore’s Law observes that the number of transistors on a chip – and with it computing power – doubles roughly every two years, so it’s perhaps no surprise that databases are due not just an evolution but a revolution in storage and use.

The first computerised database was created by Charles Bachman back in the early 1960s. From there, the evolution ran through relational databases and SQL in the 1970s and 1980s to NoSQL in the noughties. Taking another leap forward, the advent of the cloud has created an eye-popping amount of data, placing new demands on how information is catalogued, searched and stored. Gartner predicts that ‘70% of organisations will shift their focus from big to small and wide data by 2025’. The challenge is how companies will deal with the sheer volume of data while retaining the agility needed to extract information from it.

A new era: The graph database

A major contender here is graph technology, a market led by Neo4j. The ethos behind Neo4j’s database was to change how data is mapped and searched, and to surface hidden connections that data scientists couldn’t readily see. In short, the graph database management system is all about relationships. Chandra Rangan, Chief Marketing Officer at Neo4j, explains: ‘Intuitively, we think of the world in terms of entities and relationships. And the relationships are extraordinarily important. So, when you take entities and relationships you actually draw what is called an entity relationship diagram. In school, you’re taught how to put this relationship into a table. However, Neo4j takes the intuitive representation of the world that we have, and represents it the same intuitive way – which means relationships are not an afterthought, they are first-class citizens in how they are stored.’
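
By way of illustration – this is a sketch, not code from Neo4j itself – the picture Rangan describes maps directly onto a few lines of Python using the official Neo4j driver. The labels, property names and connection details below are assumptions made for the example.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Person X, the city and the LIVES_IN relationship between them are all
# stored explicitly - the relationship is not reconstructed from a foreign
# key at query time.
create_person_and_city = """
MERGE (p:Person {name: $name})
MERGE (c:City {name: $city})
SET c.population = $population
MERGE (p)-[:LIVES_IN]->(c)
"""

with driver.session() as session:
    session.run(create_person_and_city, name="Person X", city="London", population=8_900_000)

driver.close()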

So, when we think of storage, we often think of tables with joins: each piece of information has a join to the next, and to the next. The recording of data feels very linear. Rangan continues: ‘So, relational databases have rows and columns – Oracle is a great example, it’s a phenomenal database. You design the database and you have to decide in advance what the data model is. For example, you might want to store Person X’s data: their name, their city, their address... Or I can break that into two tables and say Person X lives in the city, and then there’s a separate table for a city that has the characteristics of the city. So now, if I need to understand who lives in a city that has more than a million people, that ends up being a complex thing to do.

‘In a graph database that complexity doesn’t exist anymore, because the relationship between you and the city is a first-class citizen. So, I can actually look up cities that are bigger than a million and then, almost instantly, it points me to all the residents in those cities. So, for a whole set of queries – show me all people, all folks, all entities, all nodes that have these characteristics – it’s very, very fast; not because we are faster, but because we are architected and designed differently.’
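
The ‘city of more than a million people’ question gives a concrete feel for the difference. The sketch below, again using the official Neo4j Python driver and the hypothetical model from the earlier example, puts the relational and graph versions side by side; the SQL in the comment and the connection details are assumptions.

# Relational version (two tables, one join) - illustrative SQL:
#   SELECT p.name
#   FROM person p JOIN city c ON p.city_id = c.id
#   WHERE c.population > 1000000;
#
# Graph version: follow the stored LIVES_IN relationship directly.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

big_city_residents = """
MATCH (p:Person)-[:LIVES_IN]->(c:City)
WHERE c.population > 1000000
RETURN p.name AS resident, c.name AS city
"""

with driver.session() as session:
    for record in session.run(big_city_residents):
        print(record["resident"], "lives in", record["city"])

driver.close()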

Rangan adds: ‘If you build a relational database and suddenly there’s a new question – one you didn’t design the database for – then you have to redesign the whole thing. In graph databases we offer a flexible schema, or no schema at all. What that means is you don’t have to worry about the schema in advance, because you have nodes and relationships. So, create the nodes, create the relationships and if you have to add more later, that’s fine. You can make that decision on the fly and you can change it on the fly, as opposed to traditional databases where you would have to start again.’
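
The flexible-schema point can be sketched the same way: a new label and relationship type can be attached to data that is already loaded, with no migration step. The Company label and WORKS_AT relationship below are hypothetical.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# No ALTER TABLE and no redesign: the new Company node and WORKS_AT
# relationship simply join the existing graph.
add_employer = """
MATCH (p:Person {name: $name})
MERGE (co:Company {name: $company})
MERGE (p)-[:WORKS_AT]->(co)
"""

with driver.session() as session:
    session.run(add_employer, name="Person X", company="Acme Ltd")

driver.close()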

Changing the language of databases

Putting the database together is just part of the conundrum – being able to query it and get the right information out is obviously key. Part of the database revolution has been replacing the recognised language of databases – SQL, standardised in the 1980s – with a simpler, more intuitive language called Cypher. Not only has this simplified the way the database is queried, it has also democratised its use, enabling the ‘citizen data scientist’ to step up.

Rangan continues: ‘The language that we developed to query the data is one of the most intuitive languages around. It’s called Cypher and everyone at Neo4j understands it, whether they’re in IT, marketing, HR or sales. We have a visual browser that makes it super intuitive to use.

‘When it comes to data science and AI/ML, that’s a whole new area which is super interesting because, unlike in databases where you have DBAs and developers, now the data scientists are the developers, building AI/ML models. The exciting thing is that this is accessible to the majority of people who haven’t trained as data scientists.’

Taking the science out of data scientist

A simplified language and a simplified way of connecting information together are making the role of the data scientist not only easier, but more accessible. It’s taking the need to be a true scientist out of the data analysis role. Neo4j’s graph databases are also designed to work with the most common demands of the user. Rangan continues: ‘We built 60+ ML algorithms into the product, so the common questions that you may want to ask are translated with those algorithms to develop the model we have.’
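
As a hedged sketch of what calling one of those built-in algorithms looks like in practice, the snippet below runs PageRank through the Graph Data Science library’s Cypher procedures. It assumes the GDS plugin is installed; the graph name, labels and relationship type are illustrative, and procedure signatures can vary between GDS versions.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Project an in-memory graph of Person nodes and KNOWS relationships,
# then stream PageRank scores back through Cypher.
project = "CALL gds.graph.project('people', 'Person', 'KNOWS')"
pagerank = """
CALL gds.pageRank.stream('people')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC LIMIT 10
"""

with driver.session() as session:
    session.run(project)
    for record in session.run(pagerank):
        print(record["name"], round(record["score"], 3))

driver.close()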

Maximising the database through ML

While we welcome innovation, is there a fear that using machine learning to manipulate data may amplify any bias in the machine? Should we be worried about “inference” in collating the data?

Rangan responds: ‘When you load data onto the database for the first time there could be gaps, especially when you take table-based data, where relationships are inferred, and then try to put those relationships into a graph database. Then there is a moment of, “hey, do I really know this relationship? Should I infer something about it?” So, there’s a trade-off between “do I make sure the loading is perfect” and accepting that it’s not.

‘The moment we hit a gap in the data we call it a “hanging relationship” and we flag that relationship back to the user and say, “how do you want to deal with it?” We avoid inferring a relationship because our point of view is very straightforward: an inferred relationship with low data is corrupt data.’
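
A minimal sketch of that principle – illustrative only, not Neo4j’s actual loading code – might look like this: where a source row has a gap, the loader flags it back to the user instead of inferring a relationship.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Hypothetical source rows; the second has no city, i.e. a gap in the data.
rows = [
    {"name": "Person X", "city": "London"},
    {"name": "Person Y", "city": None},
]

link_person_to_city = """
MATCH (p:Person {name: $name}), (c:City {name: $city})
MERGE (p)-[:LIVES_IN]->(c)
"""

with driver.session() as session:
    for row in rows:
        if row["city"] is None:
            # A 'hanging relationship': surface it to the user, don't guess.
            print(f"Flagged: no city known for {row['name']} - how should this be handled?")
            continue
        session.run(link_person_to_city, name=row["name"], city=row["city"])

driver.close()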

A tool for data and forecasting

While there may be reassurance that the data isn’t invented or misinterpreted, there is room for hypothesis within the set-up, and for the database to be not just a cataloguing tool but also the basis of an efficient forecasting tool. Rangan explains: ‘The thing to understand is the graph database itself does not create synthetic data and we do not do best guesses – that is a technique used by the developer or the data scientist to create a hypothesis.

‘If you look at digital twinning, it’s an alternative way to model outcomes. If you have a digital model of the physical network, you can take the digital model and do “what if” analyses. So, you can explore: what if the data looked like this? Which is really “what if the world looked like this? What could happen?” So, it gives you predictive power, based on “what if” scenarios. But it’s entirely up to the practitioner what those “what if” scenarios are, and therefore, if those scenarios happen, how could the network interact? Will it break down? At what point does it break down? How can I fix it and how can I plan for it? So, it becomes a very powerful planning tool.’
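
To make the ‘what if’ idea concrete, the hedged sketch below asks whether two sites in a hypothetical network model would still reach each other if a particular router were taken out. The Site and Router names, the LINKED_TO relationship and the connection details are all assumptions.

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Is there still a path from Site A to Site B that avoids the removed node?
what_if = """
MATCH (a:Site {name: $start}), (b:Site {name: $end})
MATCH p = shortestPath((a)-[:LINKED_TO*]-(b))
WHERE NONE(n IN nodes(p) WHERE n.name = $removed)
RETURN length(p) AS hops
"""

with driver.session() as session:
    records = list(session.run(what_if, start="Site A", end="Site B", removed="Router 7"))
    if not records:
        print("No route survives without Router 7 - something to plan for.")
    else:
        print(f"Still reachable in {records[0]['hops']} hops without Router 7.")

driver.close()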

Is there space for Neo4j to jump into the metaverse?

The graph database seems to live in a 3D space – is there a natural progression for Neo4j to move into the metaverse? And what might that evolution look like? Rangan thinks so: ‘It could underpin the types of things that you could do in a metaverse. We have a customer in Asia who actually uses VR glasses to do visualisation and data exploration. So that is definitely a fascinating direction to take. It presents a whole bunch of possibilities, again back to the notion of intuitive mental models that we have about the world and how we are able to represent it.’

For Rangan, the future of this Moore’s Law-style database evolution is very bright – especially in the world of AI. ‘Gartner predicts that in three or four years, 80% of ML models will be based on graph technology. And the reason is that a deep learning neural network is fundamentally a graph. But rather than having table-based data that gets translated into a graph, graph databases are natively graph, so it’s a much faster process.’