# arm

Errata implementation framework

Boyan Karatotev (boyan.karatotev@arm.com) 09.03.2023

## Motivation



## 1. Implementing an erratum today is verbose

The majority only set a bit at reset

```
        /*
         * Errata Workaround for Cortex A77 Errata #1925769.
         * This applies to revision <= r1p1 of Cortex A77.
         * x0: variant[4:7] and revision[0:3] of current cpu.
         */
func errata_a77_1925769_wa
        /* Compare x0 against revision <= r1p1 */
        mov     x17, x30
        bl      check_errata_1925769
        cbz     x0, 1f

        /* Set bit 8 in ECTLR_EL1 */
        mrs     x1, CORTEX_A77_CPUECTLR_EL1
        orr     x1, x1, #CORTEX_A77_CPUECTLR_EL1_BIT_8
        msr     CORTEX_A77_CPUECTLR_EL1, x1
        isb
1:
        ret     x17
endfunc errata_a77_1925769_wa

func check_errata_1925769
        /* Applies to everything <= r1p1 */
        mov     x1, #0x11
        b       cpu_rev_var_ls
endfunc check_errata_1925769
```

- Plus a make rule
- Plus a docs mention
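
The revision check in the listing reduces to a single comparison on the packed revision-variant byte. The following is a minimal C sketch of that idea, assuming the encoding stated in the listing's comment (variant in bits [7:4], revision in bits [3:0]). The names `cpu_rev_var` and `rev_var_ls` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack rNpM as variant N in bits [7:4], revision M in bits [3:0]. */
static inline uint8_t cpu_rev_var(uint8_t variant, uint8_t revision)
{
    return (uint8_t)(((variant & 0xfU) << 4) | (revision & 0xfU));
}

/* "Lower or same": true when the running CPU is at or below the limit. */
static inline bool rev_var_ls(uint8_t cpu, uint8_t limit)
{
    return cpu <= limit;
}
```

With this encoding r1p1 packs to `0x11`, which is the constant the checker loads before branching to the comparison helper.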



#### 2. The errata ABI

```
#if CORTEX_A77_H_INC
{
    .cpu_pn = CORTEX_A77_MIDR,
    .cpu_errata_list = {
        {1508412, 0x00, 0x10},
        {1791578, 0x00, 0x11},
        {1925769, 0x00, 0x11},
        {1946167, 0x00, 0x11},
        {2356587, 0x00, 0x11},
        {UINT_MAX}, {UINT_MAX}
    }
},
#endif /* CORTEX_A77_H_INC */
```

- One more place to edit
- Information is again redundant
- But not accessible
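
To make the redundancy concrete, here is a small C sketch of how such a table row can answer "is this revision affected?". It assumes the three fields in each row are the erratum ID, the lowest affected revision and the highest affected revision. The struct and function names are hypothetical, not taken from the errata ABI code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: erratum ID plus lowest and highest affected rNpM. */
struct abi_erratum {
    uint32_t id;
    uint8_t rev_lo;
    uint8_t rev_hi;
};

/* Two of the Cortex-A77 rows from the table above. */
static const struct abi_erratum a77_errata[] = {
    {1508412, 0x00, 0x10},
    {1925769, 0x00, 0x11},
};

/* Returns true when erratum `id` is listed and its range covers `rev`. */
static bool abi_erratum_affects(const struct abi_erratum *list, size_t n,
                                uint32_t id, uint8_t rev)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].id == id) {
            return rev >= list[i].rev_lo && rev <= list[i].rev_hi;
        }
    }
    return false;
}
```

The upper bound `0x11` for 1925769 repeats what the assembly checker already encodes, which is the duplication the slide points at.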



#### All the useful code

```
        /* This applies to revision <= r1p1 of Cortex A77. */
func errata_a77_1925769_wa
        /* Compare x0 against revision <= r1p1 */
        bl      check_errata_1925769
        /* Set bit 8 in ECTLR_EL1 */
endfunc errata_a77_1925769_wa
        /* Applies to everything <= r1p1 */
endfunc check_errata_1925769
```

The rest is boilerplate

And very annoying to get past review



## Of course, some are more involved

- Longer workaround sequence
- More involved rev check
- Not applied at reset



## Practically all errata can be pigeonholed to this template

With small provisions to account for variations



#### Proposal – AArch64 erratum implementation

- make rule
- docs mention
- workaround\_reset\_{start, end}: wrapper of the erratum workaround function
- workaround\_runtime\_{start, end}: same, but the workaround is applied manually
- sysreg\_bit\_set: reads back and asserts the bit is set when DEBUG=1
- check\_erratum\_{ls, hs, range}: checker helper
- A runtime
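
The read-back behaviour of the bit-setting helper can be modelled in plain C. This is only a sketch under stated assumptions: a memory word stands in for the system register (real code would use `mrs`/`msr`), and `sysreg_bit_set_model` and `fake_cpuectlr` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#ifndef DEBUG
#define DEBUG 1
#endif

/* Stand-in for a system register; an assumption made for this model only. */
static uint64_t fake_cpuectlr;

/* Set `bits` in the register, then read back and assert when DEBUG=1. */
static inline void sysreg_bit_set_model(uint64_t *reg, uint64_t bits)
{
    *reg |= bits;
#if DEBUG
    assert((*reg & bits) == bits);
#endif
}
```

For example, `sysreg_bit_set_model(&fake_cpuectlr, 1ULL << 8)` corresponds to the "set bit 8" workaround body shown earlier.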



#### The runtime

```
struct erratum_entry {
    uintptr_t (*wa_func)(uint64_t cpu_rev);
    uintptr_t (*check_func)(uint64_t cpu_rev);
    /* Will fit CVEs with up to 10 characters in the ID field */
    uint32_t id;
    /* we denote errata with 0, CVEs have their year here */
    uint16_t cve;
    uint8_t chosen;
    /* TODO(errata ABI): placeholder for the mitigated field */
    uint8_t _mitigated;
} __packed;
```

```
cpu_reset_func_start cortex_a77
cpu_reset_func_end cortex_a77

errata_report_shim cortex_a77
```

- workaround\_\*\_start registers an erratum entry
  - In a per-cpu errata\_entries section (like cpu\_ops)
- cpu\_reset\_func applies the selected ones from the list
  - Special cpu behaviour can happen after
- errata\_report\_shim does reporting when DEBUG=1
  - A common C function for all CPUs iterates the list


### This covers the majority of cases

Each macro can be incrementally unraveled to the old method for particularly nasty errata



#### The Procedure Call Standard

- Some of the cpu operations must obey the PCS
- Therefore, obey the PCS throughout
- Based on the following (simplified) interpretation

| Reg       | Use                                                                                               |
|-----------|---------------------------------------------------------------------------------------------------|
| r0 - r15  | Scratch registers. Anyone can use at any time                                                     |
| r16, r17  | Avoid using. Used by the linker. Any branch (with a relocation) may corrupt them                  |
| r18       | Avoid using. Scratch, but may be used by the platform for inter-procedure-call state. Is this us? |
| r19 - r28 | Callee saved                                                                                      |
| r29, r30  | FP, LR                                                                                            |



#### Mandated register assignments

- To avoid having to do register management
  - Will also simplify implementation
- Subset of the full PCS
  - To eliminate the problem

| function                  | register | treatment                                 |
|---------------------------|----------|-------------------------------------------|
| Any BL                    | r0-r4    | May clobber                               |
| Workaround implementation | r0-r7    | May clobber                               |
|                           | r0, r5   | Parameter to implementation - cpu_rev_var |
| Erratum checker function  | r0-r4    | May clobber                               |
| All other                 | r8 - r30 | Treat as callee saved                     |

- Runtime has similar assignments, documented in code



#### AArch32

add\_erratum\_entry cortex\_a57, ERRATUM(813420), ERRATA\_A57\_813420

- Implementation stays the same
- Only registered to the framework for debug and ABI reporting
- Removes some redundancy, but little benefit in doing it fully





## The implementation

Runtime part

### Cost – workaround/check functions

- Check function identical
- Workaround function practically identical
  - isb moved to reset\_func
  - extra mov for compatibility
  - ASSERT when DEBUG=1 (gone on release builds)



#### Cost – errata\_entries list

- Per-cpu list
- 24 bytes per entry, 1 entry per erratum
  - Some overlap with errata ABI. Designed to be reused
  - Minimal information to enable runtime and ABI reporting
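
The 24-byte figure can be checked against the struct shown on the runtime slide. The sketch below assumes 8-byte function pointers (a 64-bit target). The `OFF_*` names are hypothetical stand-ins for the offsets the assembly uses, such as `ERRATUM_WA_FUNC` and `ERRATUM_CHOSEN`.

```c
#include <stddef.h>
#include <stdint.h>

struct erratum_entry {
    uintptr_t (*wa_func)(uint64_t cpu_rev);
    uintptr_t (*check_func)(uint64_t cpu_rev);
    uint32_t id;
    uint16_t cve;
    uint8_t chosen;
    uint8_t _mitigated;
} __attribute__((packed));

/* 8 + 8 + 4 + 2 + 1 + 1 = 24 bytes when function pointers are 8 bytes wide. */
_Static_assert(sizeof(struct erratum_entry) == 24, "unexpected entry size");

/* Byte offsets of the fields the reset loop reads. */
enum {
    OFF_WA_FUNC = offsetof(struct erratum_entry, wa_func),
    OFF_CHOSEN = offsetof(struct erratum_entry, chosen),
};
```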



## Cost – reset\_func

- Fixed size of 19 instructions (76 bytes)
  - Previously 5 fixed + 2 per erratum
- Loop with 8 instructions per erratum
  - Runs even if disabled (previously compiled out)
  - Previously only 2
- Space saving when > 7 errata per cpu
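
The break-even point follows directly from the instruction counts above. Counting only the code size of the reset function for a CPU with $n$ errata:

$$
\underbrace{5 + 2n}_{\text{old}} > \underbrace{19}_{\text{new}} \iff 2n > 14 \iff n > 7
$$

This comparison covers the reset function alone. The per-erratum list entries are a separate cost.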

```
func \_cpu\()_reset_func
        bl      cpu_get_rev_var
        /* short circuit the location to avoid searching the list */
        adrp    x12, \_cpu\()_errata_list_start
        add     x12, x12, :lo12:\_cpu\()_errata_list_start
        adrp    x13, \_cpu\()_errata_list_end
        add     x13, x13, :lo12:\_cpu\()_errata_list_end
errata_begin:
        /* if head catches up with end of list, exit */
        ldr     x10, [x12, #ERRATUM_WA_FUNC]
        /* TODO(errata ABI): check mitigated and checker function fields */
        ldrb    w11, [x12, #ERRATUM_CHOSEN]
        /* put cpu revision in x0 and call workaround */
```

Disabled errata are not left out of the list because of the errata ABI



### Cost – errata reporting

- A debug feature. Compiled out on release builds
  - Optimality superseded by ease of use
- Common print function in C
  - Around 250 instructions
- Per-cpu shim
  - 5 instructions
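
One way the common C function could classify each entry is sketched below. This is a guess at the shape of the logic, not the actual reporting code: the status names, `classify_erratum` and `demo_check_ls_r1p1` are all hypothetical, and a non-zero checker result is assumed to mean "this revision is affected".

```c
#include <stddef.h>
#include <stdint.h>

struct erratum_entry {
    uintptr_t (*wa_func)(uint64_t cpu_rev);
    uintptr_t (*check_func)(uint64_t cpu_rev);
    uint32_t id;
    uint16_t cve;
    uint8_t chosen;
    uint8_t _mitigated;
} __attribute__((packed));

/* Hypothetical status values for the report. */
enum erratum_status {
    ERRATUM_NOT_APPLICABLE,
    ERRATUM_APPLIED,
    ERRATUM_MISSING,
};

/* Affected and built in, affected but not built in, or not affected. */
static enum erratum_status classify_erratum(const struct erratum_entry *e,
                                            uint64_t cpu_rev)
{
    if (e->check_func == NULL || e->check_func(cpu_rev) == 0) {
        return ERRATUM_NOT_APPLICABLE;
    }
    return e->chosen ? ERRATUM_APPLIED : ERRATUM_MISSING;
}

/* Demo checker: applies to everything <= r1p1. */
static uintptr_t demo_check_ls_r1p1(uint64_t cpu_rev)
{
    return cpu_rev <= 0x11;
}
```

A per-cpu shim would then only need to pass its list bounds to a common function that loops over the entries and prints one line per status.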



## Going forward



## The migration

- Old and new style interoperable
  - Old style is inaccessible to the framework
- Migrate every AArch64 cpu: 3 patches each
- Fill in every AArch32 cpu: 1 patch each
- Converge with errata ABI
- Gradually submit to LTS



#### AArch64 correctness

- Patch 1: reorder only
  - To enforce reporting and binary search requirements
- Patch 2: remove boilerplate and register to framework
  - Retain git blame of actual workaround
- Patch 3: move to bit setting helpers
  - Readability and consistency benefit
  - Strictly speaking optional
- Script to verify identical binary result
  - Within established tolerances, e.g. missing isb
- Manual debugger run
  - A few will need genuine refactors
  - i.e. the usual errata testing process



#### Correctness contd.

- Open question: building workarounds for CI runs
  - No elegant solution was apparent
  - Only done for Juno (hard-coded)
- Open question: where to add asserts
  - Currently sysreg\_bit\_set reads the bit back
  - Assert workaround ran? How?
  - Assert get\_cpu\_var ran? How?

AArch32 avoids the correctness question since there is no refactoring there



#### Downstream errata

- Please submit upstream
  - We will do migration work for each CPU
- Changes easy to implement
  - But we will assist
- Platform errata unaffected, but inaccessible to framework and ABI



#### LTS

- Patches can be submitted to LTS
- LTS identical to master, no changes required
  - For errata, at least



## Errata ABI convergence





#### Code

https://review.trustedfirmware.org/q/topic:%22bk%252Ferrata\_refactor%22+



