So... that software you are currently using: what do you know about it? asks Brian Gooch, an independent database engineer specialising in designing and creating structured computer software systems.

Tell me about your software: is it easy to follow? Easy to design? Easy to construct? And easy to modify? In other words, do you know the nature of the structure of the program and the data it manipulates? Okay... so here’s an easier question: the building you are in, do you know the type of construction used?

It could comprise bonded brickwork, blockwork, or crosswall construction; or, if it is a multi-storey block, it will most likely comprise a framed structure, either in reinforced concrete or steel. You may not know exactly which type, but that does not matter to most people, as long as the building is structurally stable and does not fall down around their ears. The construction will have complied with proven engineering principles and with building regulations.

However, with software it is not easy to answer the question. Its structure is not readily visible, and yes, sometimes it does fall down. Then, when failure does occur, how does the investigator know what type of structure they are dealing with? Unless the software and data have been designed and constructed by a firm's known proprietary system, there is probably very little, if anything, to assist the investigator.

Background

So, if the software comes from a proprietary system, then there will be those practitioners who have been trained in it; they will have become specialists in what is probably a narrow field. If the user finds anything wrong, then it is back to the proprietor and the specialist to resolve the problem.

Let us now consider a software system that is non-proprietary. Apart from the team who designed and created the system, how does another engineer or practitioner know what type of program and data structure has been used? Generally, with documentation being a rarity, the only way seems to be a reverse-engineering approach: examining the programs, their actions and the data they involve. This is a time-consuming task at the best of times, even with the creators' code available. And once the source of the problem has been identified, there remains the question of how it will be corrected.

Current approaches

Briefly, there are a few well-known approaches to designing and creating the system software and its concomitant data facilities. Here are some: agile methodology, the Dynamic Systems Development Method, rapid application development, the Unified Modelling Language, the Vienna Development Method... Ignoring the arguments about waterfall v agile, which of these (or any other approaches) actually specifies a structure relevant to the program and data? The data could be stored within the program or in separate files, and may be arranged as flat files, hierarchically, networked, or relationally; this list is illustrative, not exhaustive. The programming language used and the operating system involved add further variables.

What this is leading to is that, with all these tools, approaches and languages available, there appears to be no clear indication of the type of program and data structure. Referring back to our building at the beginning, it is clear that the type of structure is decided at the outset before any work starts on the design and construction. The structure is a fundamental pre-requisite. This, however, does not seem to be true in the field of computing, in the design and construction of programs and data. So, here the question arises: if you don’t know what the structure is before you start, how do you know whether the design and construction will provide you with the facilities and results you want?

Program structure

Has the original question been answered? No, far from it. There are two distinct aspects to deal with: the program structure and the data structure. Let us take the program first; this is the part of the software which everybody uses to access the facilities and the data, be it a graphical user interface or otherwise.

Borrowing an approach from the electronics industry, you can construct programs as if designing a simple circuit board with one entry point and one exit point. You can envisage this as the power plug being attached to the board where the positive pin is the entry point and the negative pin the exit. The current flows through the circuit and passes through components, each of which has only one entry point and one exit, with the current eventually arriving at the negative exit.

In computing terms, this means that the main program starts at the positive pin and has a row of components finishing at the exit pin. Each component is a routine with one entry point and one exit point. Additional sub-circuitry emanates from each of these components (routines) and invokes sub-components (sub-routines), each of which has only one entry point and one exit. The return from the last sub-routine in each sub-circuit is back to the main program. There is no cutting corners by whipping across from one sub-circuit to another: it is always back to the main program first. The flow from one component (routine) to another is directed by the user. This is the stable simple circuit structure.
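The simple circuit idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not a definitive implementation: the routine names (`enter_sale`, `print_invoice`, `validate_order`, `price_order`) and the `ROUTINES` table are hypothetical, and a list of choices stands in for the user directing the flow.

```python
from typing import Callable, Dict, Iterable, List


def validate_order(log: List[str]) -> None:
    """Sub-routine: one entry (the call) and one exit (the end of the body)."""
    log.append("validate_order")


def price_order(log: List[str]) -> None:
    """Sub-routine: one entry and one exit, never jumps to another circuit."""
    log.append("price_order")


def enter_sale(log: List[str]) -> None:
    """Routine with its own sub-circuit; control returns to main afterwards."""
    log.append("enter_sale")
    validate_order(log)
    price_order(log)


def print_invoice(log: List[str]) -> None:
    """Routine with no sub-circuit."""
    log.append("print_invoice")


# Hypothetical menu: the user's choice selects which routine runs next.
ROUTINES: Dict[str, Callable[[List[str]], None]] = {
    "sale": enter_sale,
    "invoice": print_invoice,
}


def main(choices: Iterable[str]) -> List[str]:
    """Main program: single entry at the top, single exit at the bottom."""
    log: List[str] = ["main:start"]
    for choice in choices:
        routine = ROUTINES.get(choice)
        if routine is not None:
            routine(log)                # flow leaves main for one routine...
            log.append("main:resumed")  # ...and always comes back to main
    log.append("main:exit")
    return log


trace = main(["sale", "invoice"])
```

The trace records that every routine hands control back to the main program before the next one starts; no routine calls across into another routine's sub-circuit.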

Data structure

The program accesses existing stored data or adds more, i.e. it works with a database. Over the years, as a result of my expert witness investigations, I have come to regard relational databases as the appropriate means of storing and manipulating the data. This is mainly because they already have strict rules arising from two proven branches of mathematics, set theory and relational algebra, and are further constrained by E. F. Codd's rules and by normalisation.

Data represents information; if that data does not have 100% integrity, it could be worse than useless. Thus, the relational database must be 100% reliable at all times. Data is held in identifiable collections, each of which is called a relation. Relations are linked to other relations via relationships, whereby the unique identifier (the primary key) of one relation is linked to an attribute in another relation (the foreign key). This <primary key / foreign key> mechanism forms the basis of the relational database structure. It is also imperative, in order to maintain this structure, that the foreign key can never be null.
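A minimal sketch of the <primary key / foreign key> mechanism, using Python's built-in sqlite3 module. The `customer` and `sale` tables and the helper names are hypothetical, chosen only for illustration. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3
from typing import Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one relationship (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE sale (
            sale_id     INTEGER PRIMARY KEY,
            -- the foreign key may never be null and must match a customer
            customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
        );
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: Tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
customer_ok = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (1, "Ada"))
sale_ok = try_insert(conn, "INSERT INTO sale VALUES (?, ?)", (100, 1))
null_fk_ok = try_insert(conn, "INSERT INTO sale VALUES (?, ?)", (101, None))
orphan_ok = try_insert(conn, "INSERT INTO sale VALUES (?, ?)", (102, 99))
```

In this sketch the sale that points at an existing customer is accepted, while the sale with a null foreign key and the sale pointing at a non-existent customer are both rejected by the database itself rather than by program code.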

Just as we need a three-dimensional x, y, z co-ordinate system for locating an aircraft in flight, we need a simple means of fixing the location of a relation within the database structure, i.e. triangulation. Thus, every relation must be fixed such that it sits in a structure with three axes. To achieve this, we turn to structural engineering and divide the relations into four types:

  • Anchors - which represent the real world, e.g. people, products, money;
  • Tasks - which represent an activity, e.g. sales, invoices, purchase orders, payments;
  • Intersects - which resolve each many-to-many relationship into one-to-many relationships;
  • Lookups - which add descriptive details, e.g. types, kinds, statuses, groups, ranges, colours, sizes.
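As an illustration of an intersect, the sketch below resolves a many-to-many relationship between sales and products into two one-to-many relationships. It uses Python's built-in sqlite3 module; the table names (`product`, `sale`, `sale_line`), the sample rows and the helper functions are all hypothetical.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sale    (sale_id    INTEGER PRIMARY KEY, sold_on TEXT NOT NULL);

    -- Intersect: one sale has many lines, one product appears on many lines.
    CREATE TABLE sale_line (
        sale_id    INTEGER NOT NULL REFERENCES sale (sale_id),
        product_id INTEGER NOT NULL REFERENCES product (product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (sale_id, product_id)
    );

    INSERT INTO product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO sale VALUES (10, '2024-01-05'), (11, '2024-01-06');
    INSERT INTO sale_line VALUES (10, 1, 3), (10, 2, 1), (11, 1, 2);
    """
)


def products_in_sale(sale_id: int) -> List[str]:
    """Follow the intersect from a sale to the products it contains."""
    rows = conn.execute(
        "SELECT p.name FROM sale_line AS sl "
        "JOIN product AS p ON p.product_id = sl.product_id "
        "WHERE sl.sale_id = ? ORDER BY p.name",
        (sale_id,),
    )
    return [name for (name,) in rows]


def sales_of_product(product_id: int) -> List[int]:
    """Follow the intersect from a product to the sales it appears in."""
    rows = conn.execute(
        "SELECT sale_id FROM sale_line WHERE product_id = ? ORDER BY sale_id",
        (product_id,),
    )
    return [sale_id for (sale_id,) in rows]
```

Here `sale` plays the role of a task, `product` an anchor and `sale_line` the intersect; both of its foreign keys are declared NOT NULL, in keeping with the rule that a foreign key can never be null.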

Every relationship is a link with two ends, and each of the tasks and intersects must have three connections with anchors, tasks or intersects, such that the number of connection ends N equals three times the number of relations r, i.e. the rule is N = 3r. This results in a three-dimensional structure comprising tetrahedra: a statically determinate structure in structural engineering. To extend the database, just add relations and links complying with the above rule. There is only a real-world limit on the number of connections from lookups.
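The N = 3r rule lends itself to a mechanical check. The sketch below rests on one reading of the rule, which is an assumption: r counts the anchors, tasks and intersects (lookups are left out, since their connections are not limited), and N counts both ends of every link between those relations. The relation names and the function `satisfies_rule` are hypothetical, and the four-relation example is simply a single tetrahedron chosen for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Link = Tuple[str, str]

# Hypothetical schema: one anchor, three tasks and one lookup.
KINDS: Dict[str, str] = {
    "Person": "anchor",
    "Sale": "task",
    "Invoice": "task",
    "Payment": "task",
    "SaleStatus": "lookup",
}

LINKS: List[Link] = [
    ("SaleStatus", "Sale"),  # lookup link: ignored by the rule (assumption)
    ("Person", "Sale"),
    ("Person", "Invoice"),
    ("Person", "Payment"),
    ("Sale", "Invoice"),
    ("Invoice", "Payment"),
    ("Sale", "Payment"),
]


def satisfies_rule(kinds: Dict[str, str], links: List[Link]) -> bool:
    """Check N = 3r, and three connections for every task and intersect."""
    structural = {name for name, kind in kinds.items() if kind != "lookup"}
    ends: Counter = Counter()
    for a, b in links:
        if a in structural and b in structural:
            ends[a] += 1  # each link contributes two ends
            ends[b] += 1
    n = sum(ends.values())
    r = len(structural)
    each_has_three = all(
        ends[name] == 3
        for name in structural
        if kinds[name] in ("task", "intersect")
    )
    return n == 3 * r and each_has_three


complete = satisfies_rule(KINDS, LINKS)
missing_link = satisfies_rule(KINDS, LINKS[:-1])
```

With all six structural links present there are twelve link ends for four relations, so the check passes; dropping the last link leaves ten ends and the check fails. A different reading of which relations count towards r would change the function but not the approach.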

Conclusion

The foregoing provides a software system whereby the program is designed and created using the simple circuit structure and the data using the statically determinate structure. It is easy to follow, easy to design, easy to construct and easy to modify.