The relational database has grown up to become the ubiquitous model for database management. But the computing world in which it resides has been completely transformed over its forty years; isn’t it time to seek new ideas, to expand the variety of data storage and modelling approaches to suit the internet-based environment? Shouldn’t we be looking for a new generation of database management to suit the age of global and social computing?
The relational model was a response to the problems of data management in the 1970s. Database models such as the hierarchical model were tied to technical implementations. The structure of the data models was bound to the physical structure of the data, and was too complex for non-experts to use.
The relational model offered mathematical foundations in relational algebra and independence of implementation. It offered a model based on the concept of tables, which was easy for users to understand and was supported by the Structured Query Language (SQL).
This flexibility challenged the brittle hierarchical and network approaches, despite initial concerns about the performance of relational systems in applications with high transaction rates.
But it is precisely these solutions to yesterday’s problems that create new problems today. Our practice of database design and development is dominated by structures and assumptions fixed in a stellar array of textbooks. Attributes, tables, views, persistence and concurrency remain unchanged, fixed in the academic landscape.
However, the concepts of relational databases and their implementation derived from a historical context which has largely been superseded. Multiple non-standard operating environments, hardware structures and file structures have been replaced by globally standard infrastructures.
Command line input into dumb terminals seems prehistoric compared with graphical and image-based interfaces. And yet business data is still tabular, SQL the lingua franca and interfaces form-based.
Indeed, data structures jemmied into a relational form and disguised as object-oriented approaches provide only a veneer, hiding the rigid relational database behind a smokescreen of new data types, encapsulation and inheritance.
Do we need attributes? Why are we still writing SQL? Why are we still filling in reams of electronic forms? There should be no hiding place for the shibboleths of database implementation.
The nature of the applications which databases support is changing. Data is more heterogeneous in form and distribution. Social computing creates data structures that are bottom-up, emergent and socially constructed rather than institutionalised.
Hence it’s time to consider why particular concepts are adopted and held to be right, and to consider whether alternatives may exist. An environment in which many data models are explored and developed in response to a diversity of requirements may provide the nursery for new approaches, which may in turn offer new directions in data modelling, representation and implementation. Nothing should be assumed or taken for granted.
A focus on the relational model, which acts as a mathematical straitjacket; an emphasis on persistence, drawn from the past paucity of memory; and the constraints demanded by concurrency: all contribute to our living in the past, in a golden age of database research which does not fit the dynamic nature of a 21st-century computing environment.
The development of new approaches to data design requires us not just to address modularity, configurability, malleability, granularity and flexibility. It requires us to move from central control to distributed autonomy. New data modelling approaches may be bottom-up.
Understanding of data structures should emerge from the communities that need them. Relationships may be developed by proximity. Information entanglement occurs at the grassroots level as connections made locally are preserved and propagated globally.
New database models may behave very differently to the established relational database model. Emerging structure may result from the tasks carried out, from the services delivered. Database structuring should be user-centred, emergent, self-organising.
Back to the future and back to nature
If new approaches to databases are to be developed which challenge convention, then different solution spaces need to be visited. Two areas stand out. Firstly, database history should be revisited; secondly, nature should be observed.
In making new discoveries it is wise to consult history. Database history before the relational model was broad and varied. In the mid-1970s, IBM’s Norman Winterbottom designed a network database model which was implemented and applied to some business applications.
Geoff Sharman developed a line-based approach to populating and querying the database called update-by-dialogue. The update-by-dialogue model focused on both the needs of the user and the semantic representation of information. The database was structured to support a simple line-based dialogue (as demanded by the IBM 3270 dumb terminal interfaces of the time).
It was concise, understandable and easy to use. The database structure consisted of object sets and mappings that were represented in graphs. Nodes represented object sets, and arcs represented the mappings to attributes of those objects and to relationships. The update-by-dialogue approach married the need for interfaces that were comprehensible to the user with the need for data integrity and consistency.
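To make the idea of object sets and mappings concrete, here is a minimal sketch in Python. It is not a reconstruction of the Winterbottom and Sharman system: the class name, method names and sample data are all hypothetical, and the only thing taken from the description above is the shape of the structure, with nodes as object sets and labelled arcs as mappings between them.

```python
from collections import defaultdict


class ObjectSetGraph:
    """Toy graph store: nodes are object sets, labelled arcs are mappings.

    Hypothetical illustration only; not the historical implementation.
    """

    def __init__(self):
        # object set name -> members seen so far
        self.object_sets = defaultdict(set)
        # mapping name -> (source object set, target object set)
        self.mappings = {}
        # mapping name -> {source member -> set of target members}
        self.links = defaultdict(lambda: defaultdict(set))

    def add_mapping(self, name, source, target):
        """Declare an arc from one object set to another."""
        self.mappings[name] = (source, target)

    def assert_fact(self, mapping, source_member, target_member):
        """Record that source_member maps to target_member along an arc."""
        source, target = self.mappings[mapping]  # KeyError if undeclared
        self.object_sets[source].add(source_member)
        self.object_sets[target].add(target_member)
        self.links[mapping][source_member].add(target_member)

    def follow(self, mapping, source_member):
        """Return a copy of the targets reached from source_member."""
        if mapping not in self.mappings:
            raise KeyError(mapping)
        return set(self.links[mapping].get(source_member, set()))


# Hypothetical sample data: an attribute and a relationship are both arcs.
graph = ObjectSetGraph()
graph.add_mapping("has_name", "Employee", "Name")
graph.add_mapping("works_in", "Employee", "Department")
graph.assert_fact("has_name", "e1", "Ada")
graph.assert_fact("works_in", "e1", "Sales")
graph.assert_fact("works_in", "e1", "Research")
```

In this sketch an attribute ("has_name") and a relationship ("works_in") are stored in exactly the same way, which is the point of treating both as mappings rather than as columns and foreign keys.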
This was one of many graph-based approaches which emerged, but were then ignored. Graph-based approaches can be used for classical database applications and for applications involving complex networked information which relational databases can’t cope with.
It’s time to renew studies of these models, particularly with reference to the development of self-organising approaches to data modelling and database management. The mathematical basis of these models is attractive, although the temptation to force graph-based models to look like relational models should be resisted. Graph-based models may be a catalyst for a different way of thinking.
Hierarchical data models restricted data modelling to the parent/child relationship. Network databases allowed easier modelling of more than one parent per child. But both lacked implementation independence. A revisiting of such models may reveal forgotten features which could influence future approaches to data modelling.
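The difference between the two models can be shown with a toy illustration in Python. It assumes nothing about how any historical system stored its records; the function names and the order-line example are hypothetical and chosen only to show one parent per child against many.

```python
# Hierarchical model: each child record has exactly one parent.
hierarchical_parent = {}


def set_parent(child, parent):
    """Attach a child to its single parent, replacing any earlier one."""
    hierarchical_parent[child] = parent


# Network model: a child record may have several parents.
network_parents = {}


def add_parent(child, parent):
    """Attach a child to one more parent, keeping the existing ones."""
    network_parents.setdefault(child, set()).add(parent)


# Hypothetical example: an order line belongs to an order and to a product.
set_parent("line-1", "order-42")
set_parent("line-1", "product-7")   # overwrites the link to order-42

add_parent("line-1", "order-42")
add_parent("line-1", "product-7")   # both links are kept
```

In the hierarchical sketch the second relationship can only be recorded by losing the first or by duplicating the child; the network sketch keeps both.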
Hierarchical databases may have lacked flexibility, but their structure was robust enough to support many major transaction systems until the efficiency of relational databases such as DB2 for high-volume transaction systems was proved. The large range of abandoned approaches to data modelling, which fell by the wayside in the face of the relentless march of the relational, needs revisiting. Who knows what gems of ideas are lying abandoned?
As well as looking back in history, the pursuit of new ideas must involve drawing on other disciplines. Ideas about information storage in the natural sciences may yield new data modelling approaches. For example, quantum physics is very information-oriented.
Considering the ideas of quantum theory suggests a focus on identity and relationship in developing fine-granularity data models. Ideas such as superposition, entanglement and the observer effect may have something to contribute to data modelling. Data structures could be envisaged which focus on connections and encode the strength of those connections.
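One very simple reading of a structure that focuses on connections and encodes their strength is sketched below in Python. It makes no claim to model superposition or entanglement: it is plain co-occurrence counting, and the class name, method names and sample data are hypothetical. Items carry no attributes at all; the only thing stored is how often each pair has been observed together.

```python
from collections import Counter
from itertools import combinations


class ConnectionStore:
    """Stores only pairwise connections and a strength for each one."""

    def __init__(self):
        # frozenset({a, b}) -> number of times a and b were seen together
        self.strength = Counter()

    def observe(self, items):
        """Strengthen the connection between every pair in one observation."""
        for a, b in combinations(sorted(set(items)), 2):
            self.strength[frozenset((a, b))] += 1

    def neighbours(self, item):
        """Return (other, strength) pairs, strongest connection first."""
        found = []
        for pair, weight in self.strength.items():
            if item in pair:
                (other,) = pair - {item}
                found.append((other, weight))
        return sorted(found, key=lambda entry: (-entry[1], entry[0]))


# Hypothetical observations, e.g. people who appear in the same thread.
store = ConnectionStore()
store.observe(["alice", "bob", "carol"])
store.observe(["alice", "bob"])
```

Here structure is not declared in advance: it accumulates from local observations, and a query simply ranks what an item is most strongly connected to.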
Information storage and retrieval is a key function of biological systems. But the structure and function of the information stored are different from those in computer systems, and much more diverse. Not all information in an organism is stored in the DNA. And only a fraction of the DNA contains information about proteins.
Thus information in the genome is about relationship, connection and dynamic interaction more than static physical structure. An understanding of how nature stores and uses information may lead to entirely new paradigms of data management.
The time is ripe for the development of entirely new ideas in data modelling and management. The technical environment needs new approaches, as do the commercial and application environments. But to produce new approaches we must break current modes of thinking and question every aspect of database development.
We must examine the foundations, revisit the past and pursue new vistas by looking at disciplines and frameworks which may be completely outside the forms and models with which Codd and Date worked.
After forty years at work many would just retire. There is no sign of the relational database retiring. Neither is there much indication that the new database managers are being prepared for the future.