#### Underpinning mainstream engineering with mathematical semantics

#### Peter Sewell University of Cambridge

LMS/BCS-FACS Evening Seminar

18 November 2021

This work was partially supported by the UK Government Industrial Strategy Challenge Fund (ISCF) under the Digital Security by Design (DSbD) Programme, to deliver a DSbDtech enabled digital platform (grant 105694). ERC AdG 790108 ELVER, EPSRC [DAGK/0608524] REMS, Arm (CASE sawards, EPSRC IAA KTF funding), the Isaac Newton Trust, the UK Higher Education Innovation Fund (HEIF), Thales E-Security, Microsoft Research Cambridge, Arm Limited, Google, Google DeepMind, HP Enterprise, and the Gates Cambridge Trust. Approved for public release; distribution is unlimited. This work was supported by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-01-0-C0327 ("CTSRD"), FA8750-11-C-0249 ("MRC2"), HR0011-18-C-0016 ("ECATS"), and FA8650-18-C-7809 ("CTSR"), as part of the DARPA CRASH, MRC, and SSITH research programs. The views, opinions, and/or findings contained in this report are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

#### Mainstream computer engineering



massively successful

elegant stack of abstractions

Image by Saffron Blaze - Own work, CC BY-SA 3.0

#### ...but if we look straight



Image by Saffron Blaze - Own work, CC BY-SA 3.0

constant danger of collapse

huge problems from the legacy design choices

maintained only by constant security patching

# Foundations?

**Dynamic Analysis Formal Specification** Static Analysis Abstract Interpretation Model Checking Program Logics Concurrency Theory Proof Assistants SMT Process Calculi Type Systems Lambda Calculus Type Theory Denotational Semantics Computability Theory Operational Semantics <sup>5</sup> Domain TheoryComplexity Theory Category Theory Automata Theory **Discrete Mathematics** 

# Foundations?



Image by Saffron Blaze - Own work, CC BY-SA 3.0

# Foundations?



test-and-debug development

prose specifications (at best)

HTTPS, TCP/IP Linux, Windows JavaScript, Python, Java C, C++ Arm, x86

Image by Saffron Blaze - Own work, CC BY-SA 3.0





I don't mean to imply that our existing theory has *no* connection to practice – much does

But the fundamental abstractions that mainstream computing relies on, and the development processes most programmers use, remain mostly oblivious to the theory we have.

And, to a significant extent, vice versa

#### Main Question

How can we develop mathematical semantics for the actual mainstream artifacts we rely on (not just idealised or toy versions, or semantics just about how we think the world ought to be)

and apply it to improve the mainstream engineering of them?

Will look at this through the lens of several inter-related projects:

- network protocols
- relaxed-memory concurrency
- ► instruction-set architecture
- ► C
- CHERI

# Part 1: NetSem – semantics of TCP and Sockets API

#### NetSem: Semantics of network protocols - TCP/IP and the Sockets API

# Engineering with Logic: Rigorous Test-Oracle Specification and Validation for TCP/IP and the Sockets API

STEVE BISHOP, University of Cambridge<sup>1</sup> (<sup>1</sup> when this work was done), UK MATTHEW FAIRBAIRN, University of Cambridge<sup>1</sup>, UK HANNES MEHNERT, roburio, Center for the Cultivation of Technology, Germany MICHAEL NORRISH, Data61, CSIRO and Australian National University, Australia TOM RIDGE, University of Cambridge, UK PETER SEWELL, University of Cambridge, UK MICHAEL SMITH, University of Cambridge<sup>1</sup>, UK KEITH WANSBROUGH, University of Cambridge<sup>1</sup>, UK

Conventional computer engineering relies on test-and-debug development processes, with the behaviour of common interfaces described (at best) with prose specification documents. But prose specifications cannot be used in test-and-debug development in any automated way, and prose is a poor medium for expressing complex (and loose) specifications.

The TCP/IP protocols and Sockets API are a good example of this: they play a vital role in modern communication and computation, and interoperability between implementations is essential. But what exactly they are is surprisingly obscure: their original development focused on "rough consensus and running code", augmented by prose RFC specifications that do not precisely define what it means for an implementations, including, for example, the follow-2000 lines of the facto one of the common implementations, including, for example, the 15000–2000 lines of the BSD implementation — optimised and multithreaded C code, time-dependent, with asynchronous event handlers, intertwined with the operating system, and security-critical.

This paper reports on work done in the *Netsem* project to develop lightweight mathematically rigorous techniques that can be applied to such systems: to specify their behaviour precisely doubt be able to specify the second se

#### JACM 2018, FM 2008, ICNP 2006, POPL 2006, SIGCOMM 2005, SIGOPS EW 2002, ESOP 2002, TACS 2001

Motivation: existing protocol specifications (RFCs) are vague prose. Can we specify real protocols precisely with honest maths?

(Original: better failure semantics for process calculi)

#### NetSem: Semantics of network protocols - TCP/IP and the Sockets API

# Engineering with Logic: Rigorous Test-Oracle Specification and Validation for TCP/IP and the Sockets API

STEVE BISHOP, University of Cambridge<sup>1</sup> (<sup>4</sup> when this work was done), UK MATTHEW FAIRBAIRN, University of Cambridge<sup>1</sup>, UK HANNES MEHNERT, roburio, Center for the Cultivation of Technology, Germany MICHAEL NORRISH, Data61, CSIRO and Australian National University, Australia TOM RIDGE, University of Cambridge, UK MICHAEL SMITH, University of Cambridge<sup>1</sup>, UK KEITH WANSBROUGH, University of Cambridge<sup>1</sup>, UK

Conventional computer engineering relies on test-and-debug development processes, with the behaviour of common interfaces described (at best) with prose specification documents. But prose specifications cannot be used in test-and-debug development in any automated way, and prose is a poor medium for expressing complex (and loose) specifications.

The TCP/IP protocols and Sockets API are a good example of this: they play a vital role in modern communication and computation, and interpretability between implementations is essential. But what exactly they are is surprisingly obscure: their original development focused on "rough consensus and running code", augmented by prose RFC specifications that do not precisely define what it means for an implementations, including, for example, the actual standard is the de facto one of the common implementations, including, for example, the 15000–2000 lines of the BSD implementation — optimised and multithreaded C code, time-dependent, with asynchronous event handlers, intertwined with the operating system, and scentriy-critical.

This paper reports on work done in the *Netsem* project to develop lightweight mathematically rigorous techniques that can be applied to such systems: to specify their behaviour precisely doubter the such to such the such as the such

#### JACM 2018, FM 2008, ICNP 2006, POPL 2006, SIGCOMM 2005, SIGOPS EW 2002, ESOP 2002, TACS 2001

Success: defined in HOL4 an envelope of allowed behaviour for TCP/IP and Sockets, with clear and experimentally testable relationship to production impls.

...can cope with real artifacts, without idealisation or much restriction

...learned a lot about experimental semantics, testing, and working at scale

# NetSem: Approach

*Experimental* approach: validating semantics against existing implementations (the de facto standards)

Forms of semantics:

- paper supports hand proof
- mechanised
  - in a prover supports mechanised proof
  - executable can run as a (maybe slow) implementation
  - executable as a test oracle can decide whether some observable behaviour is allowed or not

# NetSem: Approach

*Experimental* approach: validating semantics against existing implementations (the de facto standards)

Forms of semantics:

- paper supports hand proof
- mechanised
  - in a prover supports mechanised proof
  - executable can run as a (maybe slow) implementation
  - executable as a test oracle can decide whether some observable behaviour is allowed or not

Need the last:

- to validate semantics against implementations (without full impl correctness proof)
- to test implementations against semantics to make directly usable for engineers in standard test-and-debug development

# NetSem: Approach

*Experimental* approach: validating semantics against existing implementations (the de facto standards)

Forms of semantics:

- paper supports hand proof
- mechanised
  - in a prover supports mechanised proof
  - executable can run as a (maybe slow) implementation
  - executable as a test oracle can decide whether some observable behaviour is allowed or not

Need the last:

- to validate semantics against implementations (without full impl correctness proof)
- to test implementations against semantics to make directly usable for engineers in standard test-and-debug development

Key technical challenge: achieving this in the face of specification looseness/nondeterminism For TCP/IP: did so with custom symbolic evaluation in HOL4 Part 1: NetSem - semantics of TCP and Sockets API

#### NetSem: Reflections

Working at scale: 9k LoS, 9 person-years. More like code than classic maths.

Have to deal both with

- "fundamental" systems complexity it's a subtle protocol, for good engineering reasons
- contingent complexity, from historical accidents and mistakes

To be useful, need both! Not just some idealisation enough for a single paper.

#### NetSem: Reflections

Failure? Stopped too soon:

...didn't produce turnkey system, usable by practitioners

...used fancy tools, making that hard (in hindsight, could be much simpler)

...didn't influence standards

...didn't get to point of proving substantial theorems about the protocol

...problem wasn't sufficiently widely appreciated?

# Part 2: Relaxed-memory concurrency

# Relaxed-memory concurrency

2007: Susmit Sarkar arrives as new postdoc in Cambridge

We want to work on hypervisor verification (Xen)

...but what's the underlying hardware programming model?



# Relaxed-memory concurrency

2007: Susmit Sarkar arrives as new postdoc in Cambridge

We want to work on hypervisor verification (Xen)

...but what's the underlying hardware programming model?

...impossible to tell



# What is relaxed-memory concurrency?

Naively, shared-memory concurrency has *sequentially consistent* semantics:

Threads are interleaved in some arbitrary sequential order, consistent with program order within each thread, with each read reading from the most recent write to the same location.



Most theory has assumed that.

### What is relaxed-memory concurrency?

In reality, mainstream architectures and programming languages give much weaker guarantees for low-level concurrent code. Consider e.g.





Observable (and allowed) on ARMv7, Armv8-A, IBM Power, and RISC-V (not on x86) So you can't reason in terms of a simple global-time model

# What is relaxed-memory concurrency?

In reality, mainstream architectures and programming languages give much weaker guarantees for low-level concurrent code. Consider e.g.





Observable (and allowed) on ARMv7, Armv8-A, IBM Power, and RISC-V (not on x86) So you can't reason in terms of a simple global-time model

Why? Observable effects of microarchitecture and compiler optimisations:

- pipeline: out-of-order and speculative execution
- storage subsystem: write propagation in either order
- compiler: common subexpression elimination

Approach #1: examine vendor architecture documentation of the time

Approach #1: examine vendor architecture documentation of the time

"Principle 5. Intel 64 memory ordering ensures transitive visibility of stores i.e. stores that are causally related appear to execute in an order consistent with the causal relation" [Intel White Paper, 2007]

Approach #1: examine vendor architecture documentation of the time

IBM Power barriers (ARMv7 text similar):

"For each applicable pair  $a_i, b_j$  of storage accesses such that  $a_i$  is in A and  $b_j$  is in B, the memory barrier ensures that  $a_i$  will be performed with respect to any processor or mechanism, to the extent required by the associated Memory Coherence Required attributes, before  $b_j$  is performed with respect to that processor or mechanism.

- A includes all applicable storage accesses by any such processor or mechanism that have been performed with respect to P1 before the memory barrier is created.
- B includes all applicable storage accesses by any such processor or mechanism that are performed after a Load instruction executed by that processor or mechanism has returned the value stored by a store that is in B."

Approach #1: examine vendor architecture documentation of the time

IBM Power barriers (ARMv7 text similar):



Approach #1: examine vendor architecture documentation of the time

presumes the documentation expresses a coherent model

#### Approach #2: look at previous research, 1978–2008

- L. M. Censier and P. Feautrier. A new solution to coherence problems in multicache systems. IEEE Trans. Comput., 27(12):1112-1118, December 1978.
- . L. Lamport. How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Trans. Comput., C-28(9):690-691, 1979.
- · William W. Collier. Principles of architecture for systems of parallel processes. Technical Report TR 00.3100, IBM Poughkeepsie, 1981.
- . Michel Dubois, Christoph Scheurich, and Faye A. Briggs. Memory access buffering in multiprocessors. In Proc. ISCA '86, pages 434-442, 1986.
- J. Misra. Axioms for memory access in asynchronous hardware systems. ACM Trans. Program. Lang. Syst., 8(1):142-153, 1986.
- Dennis Shasha and Marc Snir. Efficient and correct execution of parallel programs that share memory. ACM Trans. Program. Lang. Syst., 10(2):282–312, April 1988.
- James R. Goodman. Cache consistency and sequential consistency. Technical Report Technical Report 61, IEEE Scalable Coherent Interface (SCI) Working Group, March 1989.
- Sarita V. Adve and Mark D. Hill. Weak ordering a new definition. In Proc. ISCA '90, pages 2-14. ACM, 1990.
- Kourosh Charachorloo, Daniel Lenoski, James Laudon, Phillip Gibbons, Anoop Gupta, and John Hennessy. Memory consistency and event ordering in scalable shared-memory multiprocessors. In Proc. ISCA '90, pages 15-26. ACM, 1990.
- · William W. Collier. Reasoning About Parallel Architectures. Prentice-Hall, Inc., 1992.
- Pradeep S. Sindhu, Jean-Marc Frailong, and Michel Cekleov. Formal Specification of Memory Models, pages 25–41. Springer US, 1992.
- · Prince Kohli, Gil Neiger, and Mustaque Ahamad. A characterization of scalable shared memories. In ICPP: International Conference on Parallel Processing, pages 332-335, 1993.
- . F. Corella, J. M. Stone, and C. M. Barton. A formal specification of the PowerPC shared memory architecture. Technical Report RC18638, IBM, 1993.
- David L Dill, Seungjoon Park, and Andreas G. Nowatzyk. Formal specification of abstract memory models. In Proceedings of the 1993 Symposium on Research on Integrated Systems, pages 38-52. MIT Press, 1993.
- The SPARC Architecture Manual, Version 9. SPARC Int., Inc., 1994.
- · Hagit Attiya and Roy Friedman. Programming DEC-Alpha based multiprocessors the easy way (extended abstract). In Proc. SPAA, pages 157-166, New York, NY, USA, 1994. ACM.
- José M. Bernabéu-Aubán and Vicente Cholvi-juan. Formalizing memory coherency models. Journal of Computing and Information, 1:653–672, 1994.
- . K. Gharachorloo. Memory consistency models for shared-memory multiprocessors. WRL Research Report, 95(9), 1995.
- . Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and Phillip W. Hutto. Causal memory: definitions, implementation, and programming. Distributed Computing, 9(1):37-49, 1995.
- . Lisa Higham, Jalal Kawash, and Nathaly Verwaal. Weak memory consistency models. Part I: Definitions and comparisons. Technical report, Department of Computer Science, University of Calgary, 1998.
- Prosenjit Chatterjee and Ganesh Gopalakrishnan. Towards a formal model of shared memory consistency for Intel Itanium<sup>tm</sup>. In 19th International Conference on Computer Design (ICCD 2001), September 2001, Austin, TX, USA, pages 515–518, 2001.
- Intel. A formal specification of Intel Itanium processor family memory ordering, 2002. http://download.intel.com/design/Itanium/Downloads/25142901.pdf.
- A. Adir, H. Attiya, and G. Shurek. Information-flow models for shared memory with an application to the PowerPC architecture. IEEE Trans. Parallel Distrib. Syst., 14(5):502-515, 2003.
- Yue Yang, Ganesh Gopalarishnan, Gary Lindstrom, and Konrad Slind. Nemos: A framework for axiomatic and executable specifications of memory consistency models. In 18th International Parallel and Distributed Processing Symposium (IPDPS). Starts Fe. New Mexico, USA, 2004.
- Lias Higham, LillAnne Jackson, and Jalal Kawash. Programmer-centric conditions for Itanium memory consistency. In Proceedings of the 8th International Conference on Distributed Computing and Networking. ICDCN'06, pages 58-60. Springer-Verlag. 2006.
- Arvind Arvind and Jan-Willem Maessen. Memory model = instruction reordering + store atomicity. In Proc. ISCA '06, pages 29-40. IEEE Computer Society, 2006.
- N. Chong and S. Ishtiaq. Reasoning about the ARM weakly consistent memory model. In MSPC, 2008.

...and language-level work, especially around JMM and early work towards C/C++11 (Manson, Pugh, Boehm, Adve)

#### Part 2: Relaxed-memory concurrency

#### Approach #2: look at previous work, 1978–2008

| x86 (Intel, AMD) | ?            |                  |
|------------------|--------------|------------------|
| IBM Power        | ?            |                  |
| ARMv7            | ?            |                  |
| ARMv8            |              | didn't exist yet |
| RISC-V           |              | didn't exist yet |
| SPARC TSO        | $\checkmark$ | -                |
| SPARC RMO, PSO   | $\checkmark$ | not used?        |
| ltanium          | $\checkmark$ |                  |
| MIPS             | ?            |                  |
| Alpha            |              | gone             |

experiment (following TCP ideas)

- experiment (following TCP ideas)
  - build hardware test harness, litmus

(Maranget; early versions by Braibant and Zappa Nardelli, and by Sarkar)

- make models executable as test oracles in various tools
- hand-write and auto-generate libraries of interesting tests (auto-generation with diy, Alglave & Maranget)
- find various hardware bugs in production silicon along the way

- experiment (following TCP ideas)
  - build hardware test harness, litmus

(Maranget; early versions by Braibant and Zappa Nardelli, and by Sarkar)

- make models executable as test oracles in various tools
- hand-write and auto-generate libraries of interesting tests (auto-generation with diy, Alglave & Maranget)
- find various hardware bugs in production silicon along the way
- talk with vendors (mainly IBM + Arm) to find out what they intend
  - hard to find right people and get their time
  - presumes they know what they mean (for architectural envelope, not just what they build)
    - prose specs didn't give them the tools to think about that clearly
  - but their architectural intent is primary

- experiment (following TCP ideas)
  - build hardware test harness, litmus

(Maranget; early versions by Braibant and Zappa Nardelli, and by Sarkar)

- make models executable as test oracles in various tools
- hand-write and auto-generate libraries of interesting tests (auto-generation with diy, Alglave & Maranget)
- find various hardware bugs in production silicon along the way
- ▶ talk with vendors (mainly IBM + Arm) to find out what they intend
  - hard to find right people and get their time
  - presumes they know what they mean (for architectural envelope, not just what they build)
    - $-\ensuremath{\mathsf{prose}}$  specs didn't give them the tools to think about that clearly
  - but their architectural intent is primary
- iterate: compare models and hardware, relate to language models, prove things, discuss with vendors, work with RISC-V TG, refine models
## Approach #3

- experiment (following TCP ideas)
  - build hardware test harness, litmus

(Maranget; early versions by Braibant and Zappa Nardelli, and by Sarkar)

- make models executable as test oracles in various tools
- hand-write and auto-generate libraries of interesting tests (auto-generation with diy, Alglave & Maranget)
- find various hardware bugs in production silicon along the way
- ▶ talk with vendors (mainly IBM + Arm) to find out what they intend
  - hard to find right people and get their time
  - presumes they know what they mean (for architectural envelope, not just what they build)

37

- prose specs didn't give them the tools to think about that clearly
- but their architectural intent is primary
- iterate: compare models and hardware, relate to language models, prove things, discuss with vendors, work with RISC-V TG, refine models

hence (working with IBM, Arm, RISC-V) create precise architectural abstractions

[POPL 2009, DAMP 2009, TPHOLs 2009, CACM 2010, CAV 2010, ECOOP 2010, PLDI 2011, TACAS 2011, POPL 2012, PLDI 2012, CAV 2012, TOPLAS 2014 (A,N,T), MICRO 2015, POPL 2016, POPL 2017, POPL 2018, POPL 2019, ESOP 2020, Arm ARM, RISC-V spec] Part 2: Relaxed-memory concurrency

### Architectural relaxed-memory concurrency, 2021

| x86 (Intel, AMD)<br>IBM Power<br>ARMv8-new<br>RISC-V<br>SPARC TSO |                          | user: de facto<br>user: de facto<br>user: official<br>user: official | system: | in progress |
|-------------------------------------------------------------------|--------------------------|----------------------------------------------------------------------|---------|-------------|
| ARMv8-old<br>ARMv7<br>SPARC RMO, PSO<br>Itanium<br>Alpha<br>MIPS  | √ ish<br>√ ish<br>√<br>√ | not used<br>going<br>gone                                            |         |             |

#### ▶ 1. Abstract-microarchitectural operational models

abstract machines that capture the hardware intuition, to explain what's going on, with explicit out-of-order and speculative execution, but without the complexity of real hardware. Incrementally executable

### Thread semantics: out-of-order, speculative execution abstractly

Our thread semantics has to account for out-of-order and speculative execution.



- instructions can be fetched before predecessors finished
- instructions independently make progress
- branch speculation allows fetching successors of branches
- multiple potential successors can be explored

NB actual hardware implementations can and do speculate even more, e.g. beyond strong barriers, so long as it is not observable

Contents 4.4 Armv8-A, IBM Power, and RISC-V: Armv8-A/RISC-V operational model

### Operational model

- each thread has a tree of instruction instances;
- no register state;
- threads execute in parallel above a flat memory state: mapping from addresses to write requests
- ▶ for Power: need more complicated memory state to handle non-MCA



(For now: plain memory reads, writes, strong barriers. All memory accesses same size.)

Contents 4.4 Armv8-A, IBM Power, and RISC-V: Armv8-A/RISC-V operational model

#### Part 2: Relaxed-memory concurrency

#### **Commit Barrier**

#### **Condition:**

A barrier instruction *i* in state Plain (Barrier(*barrier\_kind*, *next\_state*')) can be committed if:

- 1. all po-previous conditional branch instructions are finished;
- 2. (BO) if *i* is a dmb sy instruction, all po-previous memory access instructions and barriers are finished.

4.4 Armv8-A, IBM Power, and RISC-V: Armv8-A/RISC-V operational model

360

#### Really: functional code that computes allowed transitions



Part 2: Relaxed-memory concurrency



Part 2: Relaxed-memory concurrency



Part 2: Relaxed-memory concurrency



Part 2: Relaxed-memory concurrency

| 👎 RMEM rcf4d957e6b279: ×                                                                                                                                                                                                                                 | - #:                                                                                                                                       |                                                                                    |                                                  |                              |                                        |                                                                                                                                           |              |                                                   |                   |                   |         |     |      |   |          |      |   |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|--------------------------------------------------|------------------------------|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------|--------------|---------------------------------------------------|-------------------|-------------------|---------|-----|------|---|----------|------|---|
| $\leftrightarrow \rightarrow \circ \circ$                                                                                                                                                                                                                | O & http://ww                                                                                                                              | w.el.cam.ac.uk/~sFSC                                                               | 2/regression                                     | u/imeiny# *te                | et*%34/AArch51.)                       | NPN-20pon-20ctrf                                                                                                                          | /\*PadWW RFe | DpCirldR Frei                                     | "\nCycle%3DRfeDoc | trida Fre PodwW// | H 133%  | \$2 | _    | 8 | <u>.</u> | μη = | Ī |
| RMEM AArch64 MP+pa+ctrl                                                                                                                                                                                                                                  | Load Itmus Lo                                                                                                                              | ad ELF Model                                                                       | Next B                                           | ack Resta                    | rt Search •                            | Execution *                                                                                                                               | Interface *  | Graph *                                           | Link to this stat | e • Help          |         |     |      |   |          |      | l |
| State 👻                                                                                                                                                                                                                                                  |                                                                                                                                            | - 100%                                                                             |                                                  |                              | Graph ~                                |                                                                                                                                           |              |                                                   | Refresh           | Download .dot     | centre  |     | 100% |   |          |      | l |
| Storage subsystem state (ff<br>Memory = [(0:4:0):W(                                                                                                                                                                                                      | lat):<br>0×1090 (y)/4=1,                                                                                                                   | (0:2:0):W 0x11                                                                     | 90 (x)/4-                                        | 4                            |                                        | 15                                                                                                                                        |              |                                                   |                   |                   |         |     |      |   |          |      |   |
| <pre>lcaches = [Thread 0: [], Buffer = [] IC waiting = []</pre>                                                                                                                                                                                          | Thread 1: [[]                                                                                                                              |                                                                                    |                                                  |                              | 80:W 0x1100 (x)/4<br>81:W 0x1005 (y)/4 | 0 Thread 0                                                                                                                                | The          | nd 1                                              |                   |                   |         |     |      |   |          |      |   |
| Thread 0 state:           read issue order:         11           0.1         0x85000         fet:           8x=0x.53         0x00000000000100           10.72         0x85000         fet:           10.1         0x700000000000000000000000000000000000 | ched MOV W0,#1<br>n<br>ched STR W0,[X1<br>S3'00000000000<br>n<br>ched STR W2,#1<br>h<br>ched STR W2,[X3<br>S3'00000000000<br>from 0:4<br>n | reg writes:<br>1000 (x) from 1<br>reg writes:<br>1 men writes:<br>10000 (y) from 1 | (0:2:0):W<br>mitialsta<br>(0:4:0):W<br>mitialsta | 0×1100<br>e,<br>0×1000<br>e, | a a a a a a a a a a a a a a a a a a a  | 0.1 MCV W0.41<br>0.2 STR W0_1X1<br>0.2 STR W0_1X1<br>0.2 STR W0_1X1<br>0.2 MCV W2,42<br>0.3 MCV W2,42<br>0.4 STR W2_1X2<br>0.4 STR W2_1X2 |              | W0[X1]<br>8 [9]4 - 1<br>W0LC00<br>c<br>0 (X)4 - 0 |                   |                   |         |     |      |   |          |      |   |
| Console ><br>Step 3 13/7 TINISHEB, 42 to<br>Step 4 (4/7 finished, 45 to<br>Step 3 13/7 TINISHED, 42 to<br>Step 3 12/7 finished, 42 to<br>Step 1 (2/7 finished, 8 tr<br>Warning:<br>can't go back from the dail<br>Step 1 (2/7 finished, 8 tr             | rns) Choose [1]<br>rns) Choose [2]<br>rns) Choose [1]<br>rns) Choose [0]<br>ns) Choose [3]:<br>tial state<br>ns) Choose [3]:               | - 100%                                                                             | +   •                                            | • • •                        |                                        | Teel M <sup>III</sup> ypo-                                                                                                                | -1<br>ctri   |                                                   |                   |                   |         |     |      |   |          |      |   |
| Step 2 (2/7 finished, 39 tr<br>Step 3 (3/7 finished, 42 tr<br>Step 4 (4/7 finished, 45 tr<br>Step 5 (7/7 finished, 139 t                                                                                                                                 | rns) Choose [0]<br>rns) Choose [1]<br>rns) Choose [2]<br>trns):                                                                            |                                                                                    |                                                  |                              | Sources V<br>AArch64 M                 | P+po+ciri<br>MP+po+ctrl                                                                                                                   | Feel         |                                                   |                   | Downka            | id Edit | -   | 100% | + | •        | 0 *  |   |

Part 2: Relaxed-memory concurrency

> 2. Axiomatic: predicates on candidate complete execution graphs

typically forbidding certain cycles using various relations over those graphs. More concise, further from hardware intuition, not incrementally executable

#### Example: speculative execution





Allowed. The edges form a cycle, but ctrl; [R] to read events is not in ob

Contents

4.5 Armv8-A. IBM Power, and RISC-V: Armv8-A/RISC-V axiomatic model



### Write forwarding from an unknown-address write





**Forbidden.** ob includes addr; rfi: forwarding is only possible when the address is determined

Part 2: Relaxed-memory concurrency

### Write forwarding on a speculative path



acyclic pos | fr | co | rf **let** obs = rfe | fre | coe **let** dob = addr | data ctrl; [W] addr; po; [W] (ctrl | data): coi (addr | data); rfi . . . let bob = po; [dmb.sv]; po . . . **let** ob = obs | dob | aob | bob acyclic ob

Allowed. Forwarding is allowed: rfi (and ctrl;rfi and rfi;addr) not in ob (compare x86-TSO)

Contents 4.5 Armv8-A, IBM Power, and RISC-V: Armv8-A/RISC-V axiomatic model

 3. Promising-Arm Operational (more abstract, not much like hardware – but still incrementally executable) [Pulte, Pichon-Pharabod, Kang, Lee, Hur]

All useful, for different purposes

Proved equivalent:

- Abstract-microarchitectural and axiomatic [Pulte]
- Axiomatic and Promising-Arm operational [P,P-P,K,L,H]

### How do we make the semantics executable as test oracles?

#### **Operational:**

rmem tool [Flur, Pulte, French, Gray, Sarkar, Sewell, ...]

exhaustively find all abstract-machine executions (optimising to avoid exploring trivial interleavings)

#### Axiomatic:

herd tool [Alglave, Maranget]

exhaustively find all candidate execution graphs (optimised), and check axiomatic-model predicate for each

isla-axiomatic [Armstrong, Simner, Campbell]

use SMT solver on combination of axiomatic model and Isla symbolic execution of ISA

### Arm systems concurrency: ifetch, IC, DC, etc. in isla-axiomatic

| ISU * SM / ESO(2020 * DURAS HE * MERCAY INDEX * SAF BY HEROLE UN                         | CONTRACT REPORT OF STATES                                    | n Familian A                                                                                                                             |
|------------------------------------------------------------------------------------------|--------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------|
| 15 Ithread 81                                                                            | 1 TANNA AAVONA PROP 102PT                                    | (a) Q M. Cl. Berlins. + 141 →                                                                                                            |
| 10 isit = { X2 = "x", X4 = "f:", X0 = "Dx14000001" }<br>17 cb4e = """<br>10 STR W0, [X4] | include "cos.cat"                                            | Locial Sue                                                                                                                               |
| 20 CBZ W2, 1,31                                                                          | <pre>i lut strlinb = (strl &amp; (_ * ISB)); po</pre>        |                                                                                                                                          |
| 11 L:<br>22 150                                                                          | ] lot ifr = (irt*-1; col 6 loc                               | Thread #1 Thread #0                                                                                                                      |
| 12 DL F<br>In Hev W1, W1                                                                 | 5 let iseq = DH1: (wco & scU1: DC1: Jwco & scU1: IJC1        | 107 AL<br>IF #6410000 (4z: -700514272)<br>IF #640000 (4z: -700514272)                                                                    |
| 276 B Leat<br>276 T<br>377 B 10                                                          | 11 lut obs = rfw   fr   wcu<br>12   irf   (ifr; iseq)        |                                                                                                                                          |
| 10 11:<br>10 M69 W10,#2                                                                  | 14 let fob - [IF]; fpo; [IF]<br>15   [IF]; fo                |                                                                                                                                          |
| 31 10                                                                                    | 10   [ISB]; fm <sup>-1</sup> ; fps                           | 4514000001<br>IF 8540001c (4): 8514000001 32<br>W #540001c (4): 332544321                                                                |
| LI RET                                                                                   | <pre>18 let dob = addr   data 10   strl; (w)</pre>           |                                                                                                                                          |
| In Contra                                                                                | 20   (ctrl   (addr, pol), [I58]<br>31   addr, po. (W)        |                                                                                                                                          |
| 17 Ithread. 11                                                                           | 27 [ (addr ] data); rf1                                      | mit will, #lt2 / #2<br>be seen a user should be strengthered and a second |
| 10 cade -                                                                                | <pre>20 1 trange(rmw)1; rf1; IA   01</pre>                   |                                                                                                                                          |
| 41 HOV WD. W18                                                                           | 27 Tet Bob = ER   WI; pp; IOH8.5Y]                           |                                                                                                                                          |
| DER SY                                                                                   | 29   [DHB.SY]; po; [R   W]<br>29   [L]; po; [A]              | 10 W2 (0)                                                                                                                                |
| 200 Rej 1923                                                                             | 30   [R]; poi [099.L0]<br>31.   [098.L0]; po: (n   w)        | IF #4000(24.(4) -096416102                                                                                                               |
| 42 [fisal]                                                                               | 33 [ (N), po; [540.57]                                       |                                                                                                                                          |
| <pre>#II assertion = "(end != (register X0 1) 2) (= (register X2 0) 1)</pre>             | 35 1 (B   W) pol (L)                                         |                                                                                                                                          |
| Condering   Content of B                                                                 | 11 1058 ISH1; pc; 18 1 W 1 F 1 C1                            | HIGH W2, 410<br>IF 85410004 (42: 705294424 IF 8540008 (42: 872415266 III 1                                                               |
| Allowed: 1 out of 8 allowed                                                              | 10 (OMB.SY); po; (DC)                                        |                                                                                                                                          |
|                                                                                          | in Tart cob = IM   W1; two 6 scl1; IOCI                      | (en.)                                                                                                                                    |
|                                                                                          |                                                              |                                                                                                                                          |
|                                                                                          | <pre>is let ob = (obs   fob   dob   aob   bob   cob)^+</pre> | 82 Prost. 51<br>19 454210038 (42) -7236664407<br>19 554200038 (42) -721207329 (45)                                                       |
|                                                                                          | Ar (* Internal visibility reprirement *)                     |                                                                                                                                          |
|                                                                                          | 10 (7 Esternal visibility semicroset 1)                      |                                                                                                                                          |
|                                                                                          | "i irreflexive ob ac external                                |                                                                                                                                          |

### Arm systems concurrency: relaxed virtual memory

TLB caching needs explicit synchronisation: on Armv8-A, break-before-make. This is the 'break' side of break-before-make, but without an ISB at the end on the same thread, so it is not guaranteed that the po-later translations for this core are restarted.





### Architectural relaxed-memory concurrency, people



Susmit Sarkar



Derek Williams



Kathy Gray



Scott Owens



Richard Grisenthwaite (Orm)



Jon French



Francesco Zappa Nardelli



Shaked Flur

Ben Simner



Jade Alglave



Christopher Pulte





Luc Maranget



Will Deacon (**CITM**)



Alasdair Armstrong

Jean Pichon-Pharabod

Major contributors for our joint x86, IBM Power, ARM h/w models, chronologically. See also much other work by Alglave et al., by Lustig and other RISC-V TG members, and by others for GPUs, transactions, persistence Part 2: Relaxed-memory concurrency

### Programming-language relaxed-memory concurrency

Similar but harder problem: union of hardware models, plus compiler optimisations

Worked with ISO C++ WG21 Concurrency Group to formalise and partially fix the design for C/C++11 concurrency

[Mark Batty, Scott Owens, Susmit Sarkar, Tjark Weber, Peter Sewell; and later by others]

Axiomatic model, based on DRF-SC

Executable-as-test oracle semantics in cppmem and Cerberus-BMC

Experiment much harder here – but proof of implementation schemes (not full impls!) more feasible

Success:

- helped clarify what behaviour these major industry abstractions allow (x86, Armv8-A, IBM Power, RISC-V, C, C++; JavaScript, WebAssembly)
- revealing implicit complexity contributed to substantial revisions of some of these
- mathematical relaxed-memory semantics and tools based on it now routinely used in industry
- theory and verification researchers can build on these models (and fix them where needed), with program logics, model-checking, models for other features, ...

> addressing an area where industry *knew* it had problems

- ▶ addressing an area where industry *knew* it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs

- addressing an area where industry knew it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication

- addressing an area where industry knew it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics

- ▶ addressing an area where industry *knew* it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics
- emphasis on understanding what the architectural abstraction *is*, not on traditional formal verification of implementations, or on bug-finding (though we did find some h/w bugs, and prove correctness of implementation schemes)

- ▶ addressing an area where industry *knew* it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics
- emphasis on understanding what the architectural abstraction *is*, not on traditional formal verification of implementations, or on bug-finding
  - (though we did find some h/w bugs, and prove correctness of implementation schemes)
- as for TCP, had to take seriously both fundamental and contingent complexity of existing artifacts

- addressing an area where industry knew it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics
- emphasis on understanding what the architectural abstraction *is*, not on traditional formal verification of implementations, or on bug-finding
  - (though we did find some h/w bugs, and prove correctness of implementation schemes)
- as for TCP, had to take seriously both fundamental and contingent complexity of existing artifacts
- blending science and engineering

- addressing an area where industry knew it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics
- emphasis on understanding what the architectural abstraction *is*, not on traditional formal verification of implementations, or on bug-finding
  - (though we did find some h/w bugs, and prove correctness of implementation schemes)
- as for TCP, had to take seriously both fundamental and contingent complexity of existing artifacts
- blending science and engineering
- had to make semantics and tools that are mature and useful, not just enough for a paper - in it for the long haul

- addressing an area where industry knew it had problems
- established credibility through finding early specification ambiguities (asking hard to-the-point questions) and finding hardware bugs
- had to take seriously industry incentives and concerns, balancing with academic publication
- invented semantics as appropriate turned out quite unlike traditional concurrency semantics
- emphasis on understanding what the architectural abstraction *is*, not on traditional formal verification of implementations, or on bug-finding
  - (though we did find some h/w bugs, and prove correctness of implementation schemes)
- as for TCP, had to take seriously both fundamental and contingent complexity of existing artifacts
- blending science and engineering
- had to make semantics and tools that are mature and useful, not just enough for a paper - in it for the long haul
- invest research and engineering effort in proportion to the problem size

Part 2: Relaxed-memory concurrency

# Part 3: Instruction-set architecture semantics

### Instruction-set architecture (ISA) semantics

Architecture semantics

- pprox Concurrency model
  - + Instruction-set-architecture (ISA) semantics
  - + other system-on-chip (SoC) semantics

subtle but small, as above more straightforward but large ...ignore still

### Instruction-set architecture (ISA) semantics

ISA semantics traditionally paper pseudocode (at best), in vendor manuals

#### Armv8-A

- ► Alastair Reid et al. made the Arm-internal ASL mechanised, within Arm
- ▶ we translate it into our custom ISA definition language, Sail
- ▶ and thence to provers and C, validating against AVS
- 100k LoS

#### RISC-V

- we hand-wrote an ISA semantics in Sail
- ▶ adopted by RISC-V International as their official formal spec
- 10k LoS

Both complete enough to boot an OS or hypervisor, and pretty authoritative

Part 3: Instruction-set architecture semantics



Part 3: Instruction-set architecture semantics
## ISA Semantics: People

Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, and Peter Sewell.

## ISA Semantics: Reflections

- Different kind of problem: not asking what the semantics is
- ▶ ...but rather, how can we make it practical to work with these large formal definitions
- ...not just for researchers, but also for conventional engineers
- Solution: Sail language design and engineering
- not much nondetermism in sequential spec ...so can make executable as test oracle "just" with translations to C and OCaml

# Part 4: C Semantics

## C Semantics

#### C semantics

Sequential thread-local semantics
 Memory object model
 Relaxed memory model
 subtle and large, but largely in ISO
 subtle and small; unknown
 subtle and small; see above

Cerberus C semantics: Kayvan Memarian, Victor Gomes, Stella Lau, Kyndylan Nienhuis, Justus Matthiesen, Peter Sewell

More complex interaction with industry, via ISO C/C++ committees and Clang/GCC communities

# Part 5: CHERI

Computers are still terrible (even if we can define them more precisely)

Those legacy design choices:

- ▶ systems languages that don't enforce protection (C/C++)
- hardware that only enforces coarse-grain protection, using virtual memory

Baked in to the critical systems codebase and the entire industry.

Result, in today's adversarial environment:

- > programming errors can often lead to exploitable vulnerabilities
- ▶ 50%+ (?) of security problems involve a memory safety violation

| Cerberus 👻                                                                                                                    | example.c 👻                                                                                     | File 👻 M         | odel 👻 | Views 👻 | Step: 0 | Forward | Back | Restart | Sear | :h - 🌶 |
|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|------------------|--------|---------|---------|---------|------|---------|------|--------|
| example.c                                                                                                                     |                                                                                                 |                  |        | Memory  | ×       |         | 1    |         |      |        |
| <pre>1 #includ<br/>2 int x=1<br/>3 int sec<br/>4 int mai<br/>5 int *<br/>6 p = p<br/>7 int 1<br/>8 print<br/>9 }<br/>10</pre> | <pre>le <stdio.h> ; ret_key = 40 n() { p = &amp;x ++1; { eak = *p; f("leak: %d\</stdio.h></pre> | 91;<br>n",leak); | }      | Zoom I  | n Zoom  | Out —   | •    |         | Fit  | Reset  |
| Console x                                                                                                                     |                                                                                                 |                  |        |         |         |         |      |         |      |        |
| ı                                                                                                                             |                                                                                                 |                  |        |         |         |         |      |         |      |        |

Cerberus C semantics:

Exploring C Semantics and Pointer Provenance. Memarian, Gomes, Davis, Kell, Watson, Sewell. In POPL 2019









So what happens if we compile and run it?

# \$ gcc -std=c11 -00 -Wall -pedantic 35c3.c \$ ./a.out leak: 4091

35c3.output All L7 (Shell:no process +4)

How can we (practically) make things better?

How can we use semantics to increase assurance and ease engineering, at design time rather than post facto?

## CHERI Security

CHERI: architecture extensions to support *hardware-enforced* 

- fine-grain memory protection
- secure encapsulation

using unforgeable capabilities

Hardware/software co-design since 2010: Robert N. M. Watson, Simon W. Moore, Peter G. Neumann, ...

Hardware/software/semantics co-design since 2014

Main CHERI page: www.cheri-cpu.org

An Introduction to CHERI. Watson, Moore, Sewell, Neumann. UCAM-CL-TR-941, 2019. Part 5: CHERI

### CHERI basic idea: add hardware support for capabilities ISO C CHERI C #include <stdio.h> x: signed int [@3, 0x14] int x=1:



## CHERI basic idea: add hardware support for capabilities ISO C CHERI C



#### CHERI basic idea: add hardware support for capabilities ISO C CHERI C #include <stdio.h> x: signed int [@3, 0x14] x: signed int [@3, 0x14] int x=1; int secret\_key = 4091 int main() { **int** \*p = &x;JB/Exploit Irap p = p+1;**int** y = \*p; printf("%d\n 0x18 address 0x18 base 0x14 length 0x4 R/W perms tag 1

## CHERI architecture key design points

- encoding allocation data and permissions within capability permits fast checking at access-time, without a lookup or TLB pressure
- ▶ ISA design lets code shrink capabilities, but never grow them
- non-addressable tags prevent forging (one bit per capability-sized/aligned unit of memory, cleared by any non-capability write, and one bit per register)
- compressed 128-bit encoding reduces extra memory cost
- ▶ can use capabilities either for all pointers ("pure cap"), or just when desired
- co-exists nicely with existing C and C++, existing ISAs, existing virtual memory (when desired)
- additional sealed capabilities for secure encapsulation (skip today)
- initial focus was on *spatial* memory safety, but CHERI also supports various *temporal* memory safety approaches (skip today)

## CHERI C/C++ fine-grained pure-capability protection

This is using capabilities instead of integer pointers throughout an address space, protecting from exploitable coding errors

Initial power-on universal capability successively refined by the kernel, run-time linker, compiler-generated code, heap allocator, ...

Spatial protection automatically applied at two levels:

**Language-level pointers** to stack and heap allocations, global variables, TLS variables, subobjects

**Language implementation pointers** stack pointers, return addresses, C++ vtable pointers, GOT pointers, PLT entry pointers, vararg array pointers, ELF aux arg pointers, ...

'nice' C/C++ gets all this just from a recompile  $_{\mathsf{Part} \ 5: \ \mathsf{CHERI}}$ 

## CHERI C/C++ secure encapsulation

How about interaction between mutually untrusting components?

Classic virtual memory protection ok for processes, but doesn't scale well, e.g. for browser tabs, mail-readers, and server-side code handling untrusted data, with controlled sharing

Use CHERI capabilities (including the sealing mechanism and/or default capabilities) for this, e.g. to encapsulate instances of untrusted libraries, and for co-processes – a different way of using CHERI (and can combine with fine-grain protection)

No need to update MMU mappings, so very low domain-crossing cost

## Does it work?

Software porting cost: ranging down to 0.04% LoC for well-behaved C

Vulnerabilities: MSRC estimate large fraction of their critical vulnerabilities would be deterministically mitigated

Performance: encouraging - but hard to know for sure from academic studies

## **CHERI** People

Robert N. M. Watson<sup>1</sup>, Simon W. Moore<sup>1</sup>, Peter Sewell<sup>1</sup>, Peter G. Neumann<sup>7</sup>

Arm: Graeme Barnes<sup>2</sup>, Richard Grisenthwaite<sup>2</sup>, Lee Eisen<sup>2</sup>, and many more

Hesham Almatary<sup>1</sup>, Jonathan Anderson<sup>1\*</sup>, Alasdair Armstrong<sup>1</sup>, Peter Blandford-Baker<sup>1</sup>, John Baldwin<sup>7</sup>, Hadrien Barrel<sup>1</sup>, Thomas Bauereiss<sup>1</sup>, Ruslan Bukin<sup>1</sup>, Brian Campbell<sup>3</sup>, David Chisnall<sup>1\*,11</sup>, Jessica Clarke<sup>1</sup>, Nirav Dave<sup>7\*</sup>, Brooks Davis<sup>7</sup>, Lawrence Esswood<sup>1</sup>, Nathaniel W. Filardo<sup>1\*,11</sup>, Franz Fuchs<sup>1</sup>, Khilan Gudka<sup>1\*</sup>, Brett Gutstein<sup>1</sup>, Alexandre Joannou<sup>1</sup>, Robert Kovacsics<sup>1\*</sup>, Ben Laurie<sup>5</sup>, A. Theo Markettos<sup>1</sup>, J. Edward Maste<sup>1\*</sup>, Alfredo Mazzinghi<sup>1</sup>, Alan Mujumdar<sup>1\*</sup>, Prashanth Mundkur<sup>7</sup>, Steven J. Murdoch<sup>1\*</sup>, Edward Napierala<sup>1</sup>, Kyndylan Nienhuis<sup>1</sup>, Robert Norton-Wright<sup>1\*,11</sup>, Philip Paeps<sup>1\*</sup>, Lucian Paul-Trifu<sup>1\*</sup>, Allison Randal<sup>1</sup>, Ivan Ribeiro<sup>1</sup>, Alex Richardson<sup>1\*,5</sup>, Michael Roe<sup>1</sup>, Colin Rothwell<sup>1\*</sup>, Peter Rugg<sup>1</sup>, Hassen Saidi<sup>7</sup>, Thomas Sewell<sup>1</sup>, Stacey Son<sup>1\*</sup>, Ian Stark<sup>3</sup>, Domagoj Stolfa<sup>1\*</sup>, Andrew Turner<sup>1</sup>, Munraj Vadera<sup>1\*</sup>, Jonathan Woodruff<sup>1</sup>, Hongyan Xia<sup>1\*</sup>, Vadim Zaliva<sup>1</sup>, Bjoern A. Zeeb<sup>1\*</sup>,

1 University of Cambridge, 2 Arm, 3 University of Edinburgh, 4 Seoul National University, 5 Google, 6 KAIST, 7 SRI International, 8 University of St. Andrews, 9 Inria Paris, 10 Aarhus University, 11 Microsoft Research, \* previously

## Academic CHERI

SRI + Cambridge over 11 years + 3 DARPA programs ( $\sim$ \$26M), EPSRC (£7.4M); Innovate UK (£2.7M); Google / DeepMind / Arm / HPE ... ( $\sim$ £1M)

Architecture design:

► CHERI-MIPS, CHERI-RISC-V

Hardware implementations (BSV/FPGA):

- CHERI-MIPS and extensions of BSV RISC-V cores (Piccolo, Flute, Toooba)
- …and Qemu emulator

Software stack:

▶ LLVM, linker, FreeBSD, FreeRTOS, temporal safety, ...

Semantics:

lightweight and heavyweight "rigorous engineering"

## "Rigorous Engineering"

#### Lightweight:

- use formal ISA semantics, in L3 and Sail, as central design documents (owned by CHERI researchers and engineers)
- use in architecture specification (readable)
- make executable as a test oracle, auto-translating Sail/L3 to C/OCaml/SML (~ 400KIPS, booting FreeBSD in 4 min)
  - use for testing hardware against
  - use for software bring-up (supporting existing engineering practice)
- use for fast exploration of design alternatives
- use for automatic test generation
- auto-translate to SMT and use to check properties

## What does CHERI guarantee?

Q. But how do we know the architecture *does* enforce the intended security protections?

Q. How can we even state them precisely?

## What does CHERI guarantee?

Q. But how do we know the architecture *does* enforce the intended security protections?

Q. How can we even *state* them precisely?

A. Use L3 and Sail infrastructure to enable *machine-checked mathematical proof* that specific precise properties always hold

## CHERI security properties

#### Theorem

For any intra-domain trace, the reachable capabilities from the final state are no greater than those of the initial state.

#### Theorem

Any trace within a properly set-up compartment cannot affect other memory, and can exit the compartment only in controlled ways.

Properties of arbitrary code above the CHERI-MIPS ISA.

Mechanised Isabelle proofs above L3 model

Rigorous engineering for hardware security: Formal modelling and proof in the CHERI design and implementation process. Nienhuis, Joannou, Bauereiss, Fox, Roe, Campbell, Naylor, Norton, Moore, Neumann, Stark, Watson, Sewell. In Security and Privacy 2020.

Adapted proofs as ISA evolved ("regression proof")

("Heavyweight Rigorous Engineering" – but not hw or sw verif)  $_{\mathsf{Part 5: CHERI}}$ 

## Industrial CHERI?

Central question for adoption: does CHERI provide good protection at acceptable performance and software-porting cost?

Academic evaluation very encouraging – Arm and others interested and involved – but it's not evaluated in a modern high-end superscalar core

Hard for industry to commit without that – but hard to get industry-scale evidence without major investment in demonstrator

## ISCF Digital Security by Design (UKRI)

5-year Digital Security by Design UKRI program: £70M UK gov. funding, £117M industrial match, to create CHERI-Arm ("Morello") prototype architecture, hardware implementation, demonstrator SoC + board, software, and proofs

Leap over that supply-chain gap that makes adopting new architecture difficult – validating concepts in microarchitecture, architecture, and software *at scale* 

Arm, UCam, U.Ed., Linaro, and additional industrial and academic R&D (EPSRC, ESRC, Innovate UK)

2020 emulation models; 2021/2022 Morello board delivery.



## ISCF Digital Security by Design (UKR

5-year Digital Security by Design UKRI program: £70M UK gov. funding, £117M industrial match, to create CHERI-Arm ("Morello") prototype architecture, hardware implementation, demonstrator SoC + board, software, and proofs

Leap over that supply-chain gap that makes adopting new architecture difficult – validating concepts in microarchitecture, architecture, and software *at scale* 

Arm, UCam, U.Ed., Linaro, and additional industrial and academic R&D (EPSRC, ESRC, Innovate UK)

2020 emulation models; 2021/2022 Morello board delivery. As of last month: pure-capability CheriBSD kernel boots on silicon!



(c) Arm 2021

## The Morello Board

- An Industrial Demonstrator of a Capability architecture
- Uses a prototype capability extension to the Arm Architecture
  - Prototype is a "superset" of what could be adopted into the Arm architecture
- Use of a superset of the architecture is very unusual
  - Also unrealistic as a commercial product there will be some frequency effects
  - · However, there are tight timescales so architecture is nearly complete now
- The superset of the architecture will allow a lot of software experimentation
  - · Various different mechanisms for compartmentalisation
  - Collection of features for which the justification is unclear
  - Techniques for holding the capability tag bit
- Architecture will have formally proved security properties (with UoC and UoE)
- Morello Board will be the ONLY physical implementation of this prototype architecture
  - · Learnings from these experiments will be adopted into a mainstream extension to the Arm architecture
  - NO COMMITMENT TO FULL BINARY COMPATIBILITY TO THE PROTOTYPE ARCHITECTURE
    - But successful concepts are expected to be carried forward into the architecture and can be reused there





# Morello Board overview (subject to change)



- Quad core bespoke high-end CPU with prototype capability extensions
  - Backwards compatibility with v8.2 AArch64-only
  - Based on Neoverse N1 core
    - Multi-issue out-of-order superscalar core with 3 levels of cache
  - Build in 7nm process
  - Targeting clock frequency around 2GHz
- Reasonable performance GPU and Display controller
  - Standard Mali architecture core not extended with capability
  - Supports Android
- PCIe and CCIx interfaces including to FPGA based accelerators
- FPGA for peripheral expansion
- SBSA compliant system
- 16GB of System Memory (expandable to 32GB tbc)

## Morello software stacks

Complete open-source CHERI-enabled software stack from bare metal up, to validate and evaluate design, and support future R&D:

| $\label{eq:open-source} \textbf{Open-source application suite} (WebKit, Python, OpenSSH, nginx, PostgresQL,)$               |               |  |  |  |  |  |  |
|-----------------------------------------------------------------------------------------------------------------------------|---------------|--|--|--|--|--|--|
| CheriBSD/Morello (SRI/Cambridge)                                                                                            | Android (Arm) |  |  |  |  |  |  |
| <ul> <li>FreeBSD kernel + userspace, application stack</li> <li>Kernel spatial and referential memory protection</li> </ul> |               |  |  |  |  |  |  |
| <ul> <li>Userspace spatial, referential, and temporal memory<br/>protection</li> </ul>                                      |               |  |  |  |  |  |  |
| Intra-process compartmentalization                                                                                          |               |  |  |  |  |  |  |
| Co-process IPC                                                                                                              |               |  |  |  |  |  |  |
| Armv8-A 64-bit binary compatibility for legacy binaries                                                                     |               |  |  |  |  |  |  |
| CHERI-extended Google Hafnium hypervisor (Morello only)                                                                     |               |  |  |  |  |  |  |

CHERI Clang/LLVM compiler suite, LLD, LLDB, GDB

## Morello ISA verification

Morello ISA designed by Arm, with detailed discussion. In ASL, extending Armv8-A

We [Bauereiss, Campbell, Thomas Sewell, Armstrong]:

- auto-translate that 62k LoS ASL into Sail (from weekly drops)
- use Sail to generate Isabelle (210k LoS) and SMT
- ▶ use Sail+SMT symbolic evaluation (Isla) to generate tests for h/w and QEMU
- use Isabelle for mechanised proof of general security properties

Found various security holes, including one not previously known

Machine-checked mathematical proofs of security properties of full-scale industry architecture

Verified security for the Morello capability-enhanced prototype Arm architecture. Bauereiss, Campbell, T.Sewell, Armstrong, Esswood, Stark, Barnes, Watson, Sewell. UCAM-CL-TR-959, Sept. 2021.

# Conclusion

## Conclusion: back to the main question

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

How?
(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

How?

(partly) exploit advances in computational power and proof tools

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

- (partly) exploit advances in computational power and proof tools
- (mostly) focus on interfaces

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

- (partly) exploit advances in computational power and proof tools
- (mostly) focus on interfaces
- (especially where there's a need for more clarity)

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

- (partly) exploit advances in computational power and proof tools
- (mostly) focus on interfaces
- (especially where there's a need for more clarity)
- …on executable-as-test-oracle semantics for them,

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

- (partly) exploit advances in computational power and proof tools
- (mostly) focus on interfaces
- (especially where there's a need for more clarity)
- …on executable-as-test-oracle semantics for them,
- …inventing new semantics as appropriate, and

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

- (partly) exploit advances in computational power and proof tools
- (mostly) focus on interfaces
- (especially where there's a need for more clarity)
- …on executable-as-test-oracle semantics for them,
- …inventing new semantics as appropriate, and
- ...on engaging with mainstream artifacts and engineering processes

(How) can we develop mathematical semantics for the actual mainstream artifacts we rely on, and apply it to improve the mainstream engineering of them?

Yes, we can

How?

- ▶ (partly) exploit advances in computational power and proof tools
- ► (mostly) focus on interfaces
- (especially where there's a need for more clarity)
- …on executable-as-test-oracle semantics for them,
- ...inventing new semantics as appropriate, and
- ...on engaging with mainstream artifacts and engineering processes

And with collaboration with many excellent colleagues

Conclusion