Our social, political, and economic infrastructure is almost entirely dependent on microchip-driven electronic devices. However, this all-pervasiveness and our dependence on them also bring serious challenges, according to Abusaleh Jabir from Advanced Reliable Computer Systems (ARCoS), Oxford Brookes University and Eugene Sweeney, MBCS CITP, Iambic Innovation.

Whilst cyber security has received much attention at the software level, driven by the growth of the now-ubiquitous internet, there is an increasing threat of attacks on the underlying hardware infrastructure.

There is a growing use of hardware-based cryptography for on-the-fly encryption, since the required (and growing) throughput is not achievable in software (e.g. in wearable and mobile electronic devices, network switches and routers, smart cards, etc.). Owing to the potentially devastating nature of attacks on infrastructure, designing attack-tolerant hardware has now become a priority.

Modern digital circuits can be large and complex, and detection and/or correction of errors in such circuits, resulting from manufacturing defects, naturally occurring radiation-induced faults, and deliberate attacks, is becoming increasingly important.

Although many approaches to designing error/fault/attack-tolerant hardware have been proposed, most of them are impractical due to various design drawbacks, particularly as circuits become more complex and as size, power and delay constraints become more pressing.

Radiation-induced faults in memories, such as single event upsets (SEUs) and multiple event upsets (MEUs), are well researched; however, as a result of technology scaling, logic blocks are also vulnerable to malfunction when deployed in radiation-prone environments or subjected to deliberate malicious radiation attacks. Deliberate industry-scale attacks on electronic hardware to gain information about its IP and other secrets are well known, e.g. the case of counterfeit Cisco routers [5].

These attacks usually exploit radiation-induced weaknesses in hardware, or other side channels resulting from chip-testing requirements and dissipated power signatures, as well as maliciously implanted hardware Trojans or time bombs, i.e. small, difficult-to-detect hardware maliciously added to the original design to leak secret information or cause a chip to fail [4,6].

A new approach to designing and testing attack tolerant digital circuits has been developed in the Advanced Reliable Computer Systems Group (ARCoS) at Oxford Brookes University, and a patent application has been filed.

Whilst a major application is for targeted systems, such as in cyber security and the protection of critical infrastructure, the technique is also applicable to systems prone to manufacturing defects (to enhance yield) or to systems subjected to higher levels of radiation.

For example, for applications where access is difficult or the environment severe, such as in space, undersea, in remote locations, or those in extreme conditions, error tolerant systems are needed.

As components continue to reduce in size, manufacturing techniques themselves pose challenges; for example, manufacturing digital circuits for nanotechnology-based systems (e.g. with carbon nanotubes) is prone to manufacturing faults. Digital circuits therefore need to be fault tolerant. All these areas present new high-growth opportunities for digital circuit designers.

The benefits compared to current approaches, such as Hamming codes or triple modular redundancy (TMR), are:

  • the resulting circuits are dynamic in nature, only kicking in when an error is detected (so no delay overhead when there is no error);
  • they are less hardware- and power-intensive (so smaller and lower cost);
  • they can correct multiple errors (so they address more practical applications);
  • the hardware can be customised to the level of error tolerance required (i.e. the number of errors to correct);
  • the entire circuit is protected, unlike in TMR (or N-modular redundancy), where the voter remains susceptible to deliberate attacks, faults or errors.
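For comparison, the behaviour of a conventional TMR voter, the baseline referred to above, can be sketched as follows. This is an illustrative toy, not the patented technique; it shows both why TMR works and why the voter itself is the weak point.

```python
# Illustrative sketch of conventional TMR: three replicas of a functional
# block feed a bitwise majority voter. A fault in any single replica is
# outvoted, but the voter itself remains a single point of failure.

def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit is 1 if at least two inputs agree."""
    return (a & b) | (a & c) | (b & c)

correct = 0b1100
faulty = correct ^ 0b0100        # single bit flip injected in one replica
assert tmr_vote(correct, correct, faulty) == correct
```

Note that a fault in the voter logic itself would corrupt the output regardless of the three replicas, which is precisely the susceptibility the last bullet point refers to.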

The new techniques have been validated using digital circuits that perform finite field arithmetic, which is used in numerous applications including cryptography.
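For readers unfamiliar with it, multiplication in GF(2^m), the operation the validated multiplier circuits compute in hardware, can be sketched in software as follows. The GF(2^8) field and the AES polynomial are chosen here purely for illustration; the circuits described in this article target other field sizes.

```python
# Software sketch of GF(2^m) multiplication, illustrated for GF(2^8)
# with the AES irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).

def gf_mul(a: int, b: int, poly: int = 0x11B, m: int = 8) -> int:
    """Carry-less shift-and-add multiply, reduced modulo the field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR (no carries)
        b >>= 1
        a <<= 1
        if a >> m:           # degree reached m: reduce by the polynomial
            a ^= poly
    return result

# FIPS 197 worked example: {57} * {83} = {C1} in GF(2^8).
assert gf_mul(0x57, 0x83) == 0xC1
```

A hardware bit-parallel multiplier computes the same product in a single pass through combinational logic, which is why protecting that logic against injected faults matters for cryptographic applications.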

The results from using the technique to design dynamically error correctable bit-parallel Montgomery multipliers over GF(2^m) were published in 2011 [1,2]. The analysis and experimental results compare key factors such as area overhead, power and delay of the designs. They show that the new approach has lower complexity in terms of area and delay compared with the TMR-based approach.

Further work applied these novel techniques to the design of radiation-hardened and attack-tolerant systems over finite fields, i.e. correcting multiple errors, which makes the circuits robust against radiation-induced faults irrespective of the location of the errors.

These designs also incorporated a dynamically error correctable architecture, which reduced the critical path delay penalty by up to 50 per cent in the absence of any errors. This contributed to a significant performance enhancement in the error-free case, and the designs were able to tackle errors occurring both in the functional block and in the redundant-bit-generation blocks.

ASIC prototyping and silicon layout designs of the proposed architectures were completed in 180nm and 90nm CMOS technology. The experimental results show that this new approach has lower complexity in terms of area, delay and power compared with TMR-based techniques, and better error correction capability than other well known techniques such as Hamming codes and LDPC, with comparable area overheads. For example, the area overhead for 3-bit correction in a 45-bit multiplier is only 150 per cent, compared with 200 per cent for TMR, which is well within acceptable margins for hardware overhead but brings the benefit of enhanced capability.

A further element of the new technique is the incorporation of a new cross-parity scheme. This solution, presented in [3], has a much simplified decoder design. This radical design has significantly lower area and power overhead, but can correct errors only in certain combinations.

Whilst the first two designs can correct errors in any combination, at the cost of a complex decoder, the cross-parity approach corrects errors only in certain combinations but with a much simpler decoder design. Theoretical and simulation-based analyses have shown that the error coverage with cross-parity can be much wider than that of the other approaches with comparable overhead.
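The general idea behind cross-parity checking can be sketched generically as follows; this is a textbook illustration, not the specific decoder of [3]. Data bits are arranged in a grid with one parity bit per row and per column, so a single flipped bit violates exactly one row parity and one column parity, and intersecting them locates the error.

```python
# Generic cross-parity illustration: data bits sit in a grid; parities
# are kept per row and per column. A single bit flip fails exactly one
# row check and one column check, pinpointing the bit to correct.

def parities(grid):
    rows = [sum(r) % 2 for r in grid]
    cols = [sum(c) % 2 for c in zip(*grid)]
    return rows, cols

def correct_single_error(grid, stored_rows, stored_cols):
    rows, cols = parities(grid)
    bad_r = [i for i, (p, q) in enumerate(zip(stored_rows, rows)) if p != q]
    bad_c = [j for j, (p, q) in enumerate(zip(stored_cols, cols)) if p != q]
    if len(bad_r) == 1 and len(bad_c) == 1:   # single-bit error pinpointed
        grid[bad_r[0]][bad_c[0]] ^= 1         # flip it back
    return grid

data = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
r, c = parities(data)            # parities stored alongside the data
data[1][2] ^= 1                  # inject a single-bit fault
assert correct_single_error(data, r, c) == [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
```

As the article notes, which multi-bit error combinations such a scheme can handle depends on how the parities are arranged, and the patented approach widens that coverage while keeping the decoder simple.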

These techniques have been tested with large systems, e.g. 80-bit parallel multipliers and 163-bit FIPS/NIST standard digit-serial GF multipliers for cryptographic applications.

The suitability of the various approaches will depend on the target application: when a certain number of errors in all possible combinations must be corrected, the first two approaches are the better choice; when a much larger number of error conditions must be corrected (albeit not in all combinations), especially when hardware overhead is the major constraint, the third approach using cross-parity may offer a better alternative.

The proposed techniques can be used on their own or in conjunction with other techniques to provide enhanced assurance of security and reliability. For example, despite an overhead of more than 200 per cent, the reliability of TMR (or N-modular redundancy, with even higher overheads) depends largely on the reliability of the voter, which is itself prone to failures. The proposed techniques can be used in conjunction with the voter to provide enhanced assurance of reliability.

The developers are now interested in talking to industry, with a view to further validating the technique, and to potential cryptographic hardware designers to help bring the technology to market.