If you look at any indicator of what the data management world is interested in right now, blockchain will be at the top of the list, writes Ron Ballard FBCS, database consultant and author.

Internet searches, pronouncements by CEOs, newly published books, information technology magazines, podcasts, major software companies and industry analysts all seem to be in a blockchain frenzy...

How does blockchain work?

First, let’s look at how blockchains work. This is a quick summary; Satoshi Nakamoto’s original paper gives much more detail.

Bitcoin is what made blockchain famous. It is not the only blockchain application, but it is the model for the others and by far the biggest, so you can assume Bitcoin in this description unless we say otherwise.

Blockchain combines two mature technologies: the chain and the hash.

Simple chain structure

The chain structure has been used for at least 50 years in data management software. The simple chain shown here is not a ‘blockchain’; it is just a chain of blocks.

A simple chain

Each block is linked to the previous one by having a pointer value that matches the block ID of the previous block. If we want to change a simple chain, we can: we just add the new block and adjust the pointers.
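As a minimal sketch of this idea, the Python below models a simple chain as blocks that each hold the ID of the previous block. The class name `Block`, the helper `walk_back`, the inserted block ‘M’ and the transaction values ‘006’ and ‘099’ are assumptions made for illustration; they are not taken from any real product.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    block_id: str
    data: List[str]
    pointer: Optional[str]  # block ID of the previous block; None for the first block


def walk_back(blocks: Dict[str, Block], start_id: str) -> List[str]:
    """Follow the pointers from start_id back to the first block."""
    order = []
    current = start_id
    while current is not None:
        order.append(current)
        current = blocks[current].pointer
    return order


blocks = {
    "A": Block("A", ["001", "002", "003"], None),
    "Q": Block("Q", ["004", "005"], "A"),
    "R": Block("R", ["006"], "Q"),
}
print(walk_back(blocks, "R"))  # ['R', 'Q', 'A']

# Changing a simple chain is cheap: add the new block and adjust one pointer.
blocks["M"] = Block("M", ["099"], "A")
blocks["Q"].pointer = "M"
print(walk_back(blocks, "R"))  # ['R', 'Q', 'M', 'A']
```

Nothing in this structure records that the chain was changed, which is exactly the property that blockchain sets out to remove.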

But blockchain wants to make the chain ‘immutable’ so that once a block has been added to the chain it cannot be changed. Blockchain achieves this by making it very expensive to change the chain, as we shall see.

Hashing

The hash is also a very old computer software concept, going back at least to 1953.

A hash is a value that is calculated by applying some function to the series of bytes that make up a field, a block or even a whole file. The hash value is usually a number. With a particular hash function, the same input will always give the same output.

Let's start with a very simple example:

We could take the UTF-8 value of each character in the input, add them all up, divide the result by 256 and take the remainder as our hash. This is what we get for a few different strings:

| Input | UTF-8 values | Sum of UTF-8 values | Remainder (our hash) |
|---|---|---|---|
| Ron | 82, 111, 110 | 303 | 47 |
| Relational Databases For Agile Developers | 82, 101, 108, 97, 116, 105, 111, 110, 97, 108, 32, 68, 97, 116, 97, 98, 97, 115, 101, 115, 32, 70, 111, 114, 32, 65, 103, 105, 108, 101, 32, 68, 101, 118, 101, 108, 111, 112, 101, 114, 115 | 3893 | 53 |
| Netezza | 78, 101, 116, 101, 122, 122, 97 | 737 | 225 |
| NonStop SQL | 78, 111, 110, 83, 116, 111, 112, 32, 83, 81, 76 | 993 | 225 |

This hash function is very simplistic and can produce only 256 values, so we do get some collisions, as in the last two rows of this table.
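The simple hash described above can be sketched in a few lines of Python. The function name `simple_hash` is an assumption for illustration. Note that the byte values in the table correspond to the book title written with initial capitals, so that is the spelling used here.

```python
def simple_hash(text: str) -> int:
    """Add up the UTF-8 byte values and keep the remainder after dividing by 256."""
    return sum(text.encode("utf-8")) % 256


inputs = [
    "Ron",
    "Relational Databases For Agile Developers",
    "Netezza",
    "NonStop SQL",
]
for text in inputs:
    total = sum(text.encode("utf-8"))
    print(f"{text}: sum={total}, hash={simple_hash(text)}")

# The last two inputs collide: both give 225.
print(simple_hash("Netezza") == simple_hash("NonStop SQL"))  # True
```

With only 256 possible outputs, any set of more than 256 different inputs is guaranteed to contain at least one collision.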

The hash function used in blockchains is usually SHA-256. In this case ‘256’ refers to the number of bits in the hash, so we get 2²⁵⁶ (roughly 10⁷⁷) possible values and the chance of a collision is unimaginably small.
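As a quick check of those numbers, this sketch uses `hashlib` from the Python standard library to confirm the size of a SHA-256 hash and to count the decimal digits in the number of possible values. The variable names are illustrative only.

```python
import hashlib

digest = hashlib.sha256("Ron".encode("utf-8")).digest()
print(len(digest) * 8)        # 256 bits (32 bytes)

possible_values = 2 ** 256
print(len(str(possible_values)))  # 78 decimal digits, so roughly 10 to the power 77

# The two strings that collided under the simple hash get different SHA-256 hashes.
netezza = hashlib.sha256("Netezza".encode("utf-8")).hexdigest()
nonstop = hashlib.sha256("NonStop SQL".encode("utf-8")).hexdigest()
print(netezza != nonstop)     # True
```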

A blockchain

Now we can combine the simple chain and the hash to make a blockchain.

This example is simplified but it still shows the main feature that makes blockchains ‘immutable’.

The first step is to make our block ID ‘A’ the ‘genesis block’. We make a SHA-256 hash of the contents of the block:

‘Block ID|A|Transaction|001|Transaction|002|Transaction|003|Pointer||’

Which gives us the SHA-256 hash:
dfdba2bdb97e68127d8175ea200be502c192f94cd251e5a5024aed96fb72874e

We now use this as the identifier of the block.

We use the block hash of the genesis block as the pointer in the next block (our block ID ‘Q’).

So now we make a SHA-256 of block ID ‘Q’ including the pointer to the genesis block:

‘Block ID|Q|Transaction|004|Transaction|005|Pointer|dfdba2bdb97e68127d8175ea200be502c192f94cd251e5a5024aed96fb72874e|’

Which gives us the SHA-256 hash:
3c4374099d09d10d36545c5bf10db1eb2dbe36b936312b95ce9803c923d82c60

We use the hash of block ID ‘Q’ as the pointer in block ID ‘R’ and include the pointer to block ‘Q’ in the block hash of block ‘R’. We continue up the chain like this:

A Blockchain
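The construction just described can be sketched in Python using `hashlib` from the standard library. The helper names `block_text`, `block_hash` and `build_chain`, and the transactions ‘006’ and ‘007’ in blocks ‘R’ and ‘Z’, are assumptions for illustration. A digest depends on the exact bytes that are hashed, so the values printed here will only match the ones quoted above if the input strings are byte-for-byte identical to those the author used.

```python
import hashlib


def block_text(block_id, transactions, pointer):
    """Lay out a block's contents as one '|'-separated string, as in the examples above."""
    fields = ["Block ID", block_id]
    for transaction in transactions:
        fields += ["Transaction", transaction]
    fields += ["Pointer", pointer]
    return "|".join(fields) + "|"


def block_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_chain(blocks):
    """blocks is a list of (block_id, transactions) pairs, genesis block first."""
    chain = []
    pointer = ""  # the genesis block has an empty pointer
    for block_id, transactions in blocks:
        digest = block_hash(block_text(block_id, transactions, pointer))
        chain.append({"id": block_id, "pointer": pointer, "hash": digest})
        pointer = digest  # the next block points at this block's hash
    return chain


chain = build_chain([
    ("A", ["001", "002", "003"]),
    ("Q", ["004", "005"]),
    ("R", ["006"]),
    ("Z", ["007"]),
])
for block in chain:
    print(block["id"], "pointer:", block["pointer"][:12] or "(none)", "hash:", block["hash"][:12])
```

Each block's hash covers its pointer, so every hash depends indirectly on the contents of all the blocks before it.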

If we change just one byte in the data of block ID ‘Q’, then the hash of block ID ‘Q’ will have to change to make this block valid.

Now block ID ‘Q’ has a different block hash. So, to make block ID ‘R’ valid, we have to change its pointer so that it points to our new version of block ID ‘Q’. We have to calculate a new block hash for block ID ‘R’, which means we have to change the pointer in block ID ‘Z’ and so on until the end of the chain.

So you can change a blockchain, but if you do, then you have to change every block that follows the one that you want to change and that costs as much as creating all the blocks in the first place.
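To make this cascade concrete, here is a small self-contained Python sketch. It tampers with block ‘Q’, shows that the chain no longer validates, and counts how many blocks have to be re-hashed to repair it. The helper names (`build`, `first_invalid`, `rehash_from`) and the transaction values ‘006’, ‘007’ and ‘904’ are hypothetical, and the block layout is the simplified one used in this article, not Bitcoin's real format.

```python
import hashlib


def block_hash(block_id, txs, pointer):
    fields = ["Block ID", block_id]
    for tx in txs:
        fields += ["Transaction", tx]
    fields += ["Pointer", pointer]
    return hashlib.sha256(("|".join(fields) + "|").encode("utf-8")).hexdigest()


def build(blocks):
    chain, pointer = [], ""
    for block_id, txs in blocks:
        digest = block_hash(block_id, txs, pointer)
        chain.append({"id": block_id, "txs": list(txs), "pointer": pointer, "hash": digest})
        pointer = digest
    return chain


def first_invalid(chain):
    """Return the ID of the first block with a wrong pointer or hash, or None if all are valid."""
    pointer = ""
    for block in chain:
        expected = block_hash(block["id"], block["txs"], block["pointer"])
        if block["pointer"] != pointer or block["hash"] != expected:
            return block["id"]
        pointer = block["hash"]
    return None


def rehash_from(chain, index):
    """Recompute pointers and hashes from position index to the end; return how many blocks changed hands."""
    pointer = chain[index - 1]["hash"] if index > 0 else ""
    count = 0
    for block in chain[index:]:
        block["pointer"] = pointer
        block["hash"] = block_hash(block["id"], block["txs"], pointer)
        pointer = block["hash"]
        count += 1
    return count


chain = build([("A", ["001", "002", "003"]), ("Q", ["004", "005"]), ("R", ["006"]), ("Z", ["007"])])
print(first_invalid(chain))   # None: the untouched chain is valid

chain[1]["txs"][0] = "904"    # tamper with one transaction in block Q
print(first_invalid(chain))   # Q

# Repairing only Q's own hash just moves the problem to the next block.
chain[1]["hash"] = block_hash("Q", chain[1]["txs"], chain[1]["pointer"])
print(first_invalid(chain))   # R

print(rehash_from(chain, 1))  # 3: blocks Q, R and Z all have to be redone
print(first_invalid(chain))   # None
```

In this toy version each re-hash is a single cheap calculation; the cost only becomes prohibitive once every block hash also has to satisfy a proof of work.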

This would be very expensive, but since a significant selling point of blockchain is that it is immutable, there are some other requirements that make it even more difficult to change. These include ‘proof of work’ and a ‘peer-to-peer network’ to validate the blockchain.

The ‘proof of work’ is an arbitrary calculation that must be done for every block that is added. The calculation is what produces the block hash, and it is more complicated than the example shown above. In fact, thousands of hash calculations are required to produce the block hash. Bitcoin adds other complications to the ‘proof of work’ so that it really is very expensive to carry out. The first peer to do this successfully gets to create the block and claim the reward.
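A toy version of proof of work can be sketched as follows. It assumes a simplified rule: keep adding a different trial number (a ‘nonce’) to the block contents until the SHA-256 hash starts with a chosen number of zero hex digits. The function name `mine`, the ‘Nonce’ field and the difficulty value are assumptions for illustration; Bitcoin's real block format and difficulty rule are more involved than this.

```python
import hashlib
from itertools import count


def mine(block_text, difficulty):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in count():
        candidate = f"{block_text}Nonce|{nonce}|"
        digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


text = "Block ID|A|Transaction|001|Transaction|002|Transaction|003|Pointer||"
nonce, digest = mine(text, 3)
print("hash calculations needed:", nonce + 1)
print("block hash:", digest)
```

With three leading zero hex digits, about 16 × 16 × 16 = 4,096 attempts are needed on average, and each extra digit multiplies the expected work by 16. Checking the result, by contrast, takes a single hash calculation, which is what lets the other peers validate a block cheaply.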