# Understanding Shared Memory Bank Access Interference in Multi-Core Avionics

Andreas Löfwenmark, Simin Nadjm-Tehrani (Linköping University, Sweden), 2016-07-05



### Motivation

- Future safety-critical avionic systems will use multi-core processors
  - More complex systems require more computational capacity
  - Single-core processors are becoming less available
- Predictability on multi-core has not yet been demonstrated



# **Multi-Core Challenges**

- Achieving Temporal Partitioning
  - Multiple cores can access a shared resource simultaneously
- Worst-Case Execution Time (WCET) and Worst-Case Response Time (WCRT) analysis
  - Pessimism could negate the added processing capacity



#### **Shared Resources**










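- DRAM commands used to serve a memory request:
  - Activate (ACT): opens a row into the bank's row buffer
  - Read/Write (RD/WR): accesses a column of the open row
  - Precharge (PRE): closes the open row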
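Which of these commands a request needs depends on the state of the bank's row buffer. The sketch below is a simplified model that assumes an open-page policy (the row stays open after an access) and uses made-up latencies in `LATENCY`; the `Bank` class and `cost` helper are hypothetical names chosen for illustration, not part of any DRAM controller API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical command latencies in DRAM clock cycles (illustrative only).
LATENCY = {"PRE": 14, "ACT": 14, "RD/WR": 10}


@dataclass
class Bank:
    """One DRAM bank with a single row buffer; open_row is None when precharged."""
    open_row: Optional[int] = None

    def access(self, row: int) -> List[str]:
        """Return the commands needed to read or write `row` (open-page policy assumed)."""
        if self.open_row == row:
            return ["RD/WR"]                # row hit: the row is already open
        # Row conflict: the currently open row must be closed first.
        cmds = ["PRE"] if self.open_row is not None else []
        cmds += ["ACT", "RD/WR"]            # open the requested row, then access it
        self.open_row = row                 # the row stays open after the access
        return cmds


def cost(cmds: List[str]) -> int:
    """Sum the (hypothetical) latencies of a command sequence."""
    return sum(LATENCY[c] for c in cmds)


bank = Bank()
for row in (3, 3, 7):
    cmds = bank.access(row)
    print(row, cmds, cost(cmds))
```

With these placeholder latencies the three accesses cost 24, 10 and 38 cycles. A row conflict pays PRE + ACT + RD/WR, which is the same per-request combination $L^{PRE} + L^{ACT} + L^{RW}$ that appears in the inter-bank interference delay.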





#### Contributions

- Single Core Equivalence (SCE)-based WCRT estimation (Mancuso et al., 2015) combined with a shared-bank interference delay bound (Kim et al., 2014)
- Validation of WCRT estimates
  - Understanding (shared) bank interference
  - Comparison of estimates with measurements
- Adaptation of an avionics RTOS



# SCE (Mancuso et al., ECRTS 2015)

- Colored Lockdown
  - Colors memory pages and locks them in the cache
- MemGuard
  - Limits the number of memory accesses per core in each regulation period
- PALLOC
  - Allocates each core's memory in specified DRAM banks
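MemGuard-style regulation can be pictured as a per-core access budget that is refilled at the start of every regulation period; a core that exhausts its budget is throttled until the next period begins. The following is a simplified software model of that idea only, not MemGuard's actual kernel implementation; the `BudgetRegulator` class, the 1000 µs period and the budget values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BudgetRegulator:
    """Hypothetical per-core memory-access budget, replenished every period."""
    budgets: Dict[int, int]                  # core id -> accesses allowed per period
    period_us: int = 1000                    # assumed regulation period
    used: Dict[int, int] = field(default_factory=dict)
    period_start_us: int = 0

    def _maybe_replenish(self, now_us: int) -> None:
        """Start a new period (and reset all counters) once the current one has elapsed."""
        elapsed = now_us - self.period_start_us
        if elapsed >= self.period_us:
            self.period_start_us = now_us - elapsed % self.period_us
            self.used.clear()

    def request(self, core: int, now_us: int) -> bool:
        """Return True if `core` may issue a memory access at time `now_us`."""
        self._maybe_replenish(now_us)
        if self.used.get(core, 0) >= self.budgets[core]:
            return False                     # budget exhausted: throttled until next period
        self.used[core] = self.used.get(core, 0) + 1
        return True


reg = BudgetRegulator(budgets={0: 2, 1: 5})
print([reg.request(0, t) for t in (10, 20, 30)])  # [True, True, False]
print(reg.request(0, 1005))                        # True: a new period has started
```

Bounding each core's accesses per period is what lets the analysis cap how many interfering requests other cores can issue during a task's execution.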



# Adapting WCRT Analysis

- Why?
  - Private banks are not always feasible
  - Private banks are not viable on our RTOS
- Change the estimation equations to allow shared banks



### WCRT Estimation (Kim et al., RTAS 2014)

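- The classical response-time test is extended with memory-interference terms

$$R_i^{k+1} = C_i + \sum_{\tau_j \in hp(\tau_i)} \left\lceil \frac{R_i^k}{T_j} \right\rceil \cdot C_j + H_i \cdot RD_p + \sum_{\tau_j \in hp(\tau_i)} \left\lceil \frac{R_i^k}{T_j} \right\rceil \cdot H_j \cdot RD_p$$

- $H_i$: number of memory requests of task $\tau_i$
- $RD_p$: worst-case interference delay per memory request from core $p$
- $C_i$: WCET of $\tau_i$ in isolation; $T_j$: period of $\tau_j$; $hp(\tau_i)$: tasks with higher priority than $\tau_i$ on the same core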
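The recurrence is solved by fixed-point iteration: start from $R_i^0 = C_i + H_i \cdot RD_p$, recompute until $R_i^{k+1} = R_i^k$, and give up once $R$ exceeds the deadline. A minimal sketch, assuming fixed-priority scheduling on one core, tasks listed in decreasing priority (so $hp(\tau_i)$ is every task before index $i$), implicit deadlines ($D = T$), and a single $RD_p$ computed elsewhere; `Task` and `wcrt` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    C: float  # WCET in isolation (us)
    T: float  # period (us); deadline assumed equal to the period
    H: int    # number of memory requests per job


def wcrt(i: int, tasks: List[Task], rd_p: float, max_iter: int = 1000) -> Optional[float]:
    """Iterate the extended response-time equation for tasks[i].

    tasks must be sorted by decreasing priority. The two hp sums of the
    equation are folded into one term per higher-priority task, since both
    share the same ceiling factor. Returns None if R exceeds the period.
    """
    ti = tasks[i]
    r = ti.C + ti.H * rd_p                                # initial value R^0
    for _ in range(max_iter):
        interference = sum(
            math.ceil(r / tj.T) * (tj.C + tj.H * rd_p) for tj in tasks[:i]
        )
        r_next = ti.C + ti.H * rd_p + interference
        if r_next == r:
            return r                                      # fixed point reached
        if r_next > ti.T:
            return None                                   # deadline exceeded
        r = r_next
    return None


tasks = [Task(C=1, T=5, H=4), Task(C=2, T=12, H=4)]
print(wcrt(1, tasks, rd_p=0.25))  # 5.0
```

Because each iterate depends only on the ceiling values, the sequence is non-decreasing and either stabilises or crosses the deadline.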






#### **Interference Delay**

$$RD_p = RD_p^{\text{inter}} + RD_p^{\text{intra}}$$








$$RD_{p}^{\text{inter}} = \sum_{\substack{q \neq p \\ \text{shares no} \\ \text{bank with p}}} (L^{PRE} + L^{ACT} + L^{RW})$$






$$RD_p^{\text{intra}} = reorder(p) + \sum_{\substack{q \neq p \\ \text{shares bank} \\ \text{with p}}} (L_{conf} + RD_q^{\text{inter}})$$
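The two terms can be evaluated for a given core-to-bank mapping. In this sketch, cores "share a bank" when their bank sets overlap; the latencies are placeholder integers in controller cycles (not values from the paper), and $reorder(p)$ is passed in as a plain number because its derivation in Kim et al. is not reproduced here. All function names and the example mapping are hypothetical.

```python
from typing import Dict, Set

# Placeholder DRAM latencies in controller cycles (assumptions, not measured values).
L_PRE, L_ACT, L_RW = 14, 14, 10
L_CONF = 28  # assumed L_conf for a conflicting request in a shared bank


def shares_bank(banks: Dict[int, Set[int]], p: int, q: int) -> bool:
    """True if cores p and q have at least one DRAM bank in common."""
    return bool(banks[p] & banks[q])


def rd_inter(banks: Dict[int, Set[int]], p: int) -> int:
    """RD_p^inter: one PRE + ACT + RD/WR for every other core sharing no bank with p."""
    return sum(L_PRE + L_ACT + L_RW
               for q in banks if q != p and not shares_bank(banks, p, q))


def rd_intra(banks: Dict[int, Set[int]], p: int, reorder_p: int = 0) -> int:
    """RD_p^intra: reorder(p) plus L_conf + RD_q^inter for every core q sharing a bank with p."""
    return reorder_p + sum(L_CONF + rd_inter(banks, q)
                           for q in banks if q != p and shares_bank(banks, p, q))


def rd(banks: Dict[int, Set[int]], p: int, reorder_p: int = 0) -> int:
    """RD_p = RD_p^inter + RD_p^intra."""
    return rd_inter(banks, p) + rd_intra(banks, p, reorder_p)


# Hypothetical mapping: cores 2 and 3 share bank 3; cores 0 and 1 have private banks.
banks = {0: {0}, 1: {1}, 2: {2, 3}, 3: {3}}
print(rd(banks, 0), rd(banks, 3))  # 114 180
```

Core 0, with a private bank, only pays the inter-bank term (3 × 38 cycles). Core 3 pays less inter-bank delay (76) but adds $L_{conf}$ plus core 2's own inter-bank delay, ending at 180 cycles per request under these placeholder values.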










# Validation of the Adapted Model

- Calculate WCRT
  - Estimate single-core WCET in isolation (C)
    - Measurement-based
  - Estimate number of memory requests (H)
- Perform multi-core experiments to measure WCRT



#### **Single-Core Estimates**

- Estimate single-core WCET in isolation (C)
- Count number of memory requests (H)

| Partition | WCET C (µs) | Memory requests H (partition) | Memory requests H (RTOS) |
|-----------|-------------|-------------------------------|--------------------------|
| Nav       | 14          | 93                            | 54                       |
| Mult      | 16615       | 21740                         | 160                      |
| Cubic     | 9345        | 45                            | 38                       |
| Image     | 4391        | 560                           | 40                       |



### Comparison of Calculated and Measured WCRT

- Calculate WCRT
- Measure WCRT

| Partition | Core | Period (µs) | Estimated response time R (µs) | Measured response time R (µs) |
|-----------|------|-------------|--------------------------------|-------------------------------|
| Nav       | 0    | 16667       | 45                             | 14                            |
| Mult      | 1    | 16667       | 21192                          | 16620                         |
| Cubic     | 2    | 16667       | 9362                           | 9345                          |
| Image     | 3    | 16667       | 4516                           | 4391                          |



# Focusing on processes with tight margins

- Methodology
  - Measure the WCRT while a memory-intensive synthetic application runs on cores 1-3
  - With and without memory regulation

| Partition | Core | Period (µs) | Response time R, no regulation (µs) | Response time R, with regulation (µs) |
|-----------|------|-------------|-------------------------------------|---------------------------------------|
| Mult      | 0    | 16667       | 17075                               | 16654                                 |

- The "right" memory access restriction found here can also serve as a bound when the partition runs alongside other applications
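How tight the margin is can be read off as slack against the period. The slides do not state deadlines, so this sketch assumes an implicit deadline equal to Mult's 16667 µs period; `slack_us` is a hypothetical helper and the response times are the measured values from the table.

```python
PERIOD_US = 16667  # Mult's period; the deadline is assumed to equal the period

measured_us = {"no regulation": 17075, "regulation": 16654}


def slack_us(response_us: int, period_us: int = PERIOD_US) -> int:
    """Time left before the (assumed) implicit deadline; negative means a miss."""
    return period_us - response_us


for setting, r in measured_us.items():
    s = slack_us(r)
    print(f"{setting}: R = {r} us, slack = {s} us, {'meets' if s >= 0 else 'misses'} deadline")
```

Under that assumption the unregulated run overshoots the period by 408 µs, while the regulated run finishes with only 13 µs to spare.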



### Conclusions

- The adapted SCE framework and avionics RTOS do support running critical avionic applications on multi-core processors
- As expected, the bound on interference delay with shared DRAM banks is a pessimistic upper bound
- Insight: the worst-case interference need not arise in the scenario with the maximum number of cores




