summaryrefslogtreecommitdiff
path: root/docs/firmware-design.md
diff options
context:
space:
mode:
authorAndrew Thoelke <andrew.thoelke@arm.com>2015-09-10 11:39:36 +0100
committerVikram Kanigiri <vikram.kanigiri@arm.com>2015-09-11 16:19:21 +0100
commitee7b35c4e1169ec68df016bfb244a5281440119d (patch)
tree542012cfa197c96d89a42bbd5e5e0b4a33897fe4 /docs/firmware-design.md
parent604d5da6f2aa837326d5eda7735c8dac9a127ccd (diff)
Re-design bakery lock memory allocation and algorithm
This patch unifies the bakery lock api's across coherent and normal memory implementation of locks by using same data type `bakery_lock_t` and similar arguments to functions. A separate section `bakery_lock` has been created and used to allocate memory for bakery locks using `DEFINE_BAKERY_LOCK`. When locks are allocated in normal memory, each lock for a core has to spread across multiple cache lines. By using the total size allocated in a separate cache line for a single core at compile time, the memory for other core locks is allocated at link time by multiplying the single core locks size with (PLATFORM_CORE_COUNT - 1). The normal memory lock algorithm now uses lock address instead of the `id` in the per_cpu_data. For locks allocated in coherent memory, it moves locks from tzfw_coherent_memory to bakery_lock section. The bakery locks are allocated as part of bss or in coherent memory depending on usage of coherent memory. Both these regions are initialised to zero as part of run_time_init before locks are used. Hence, bakery_lock_init() is made an empty function as the lock memory is already initialised to zero. The above design lead to the removal of psci bakery locks from non_cpu_power_pd_node to psci_locks. NOTE: THE BAKERY LOCK API WHEN USE_COHERENT_MEM IS NOT SET HAS CHANGED. THIS IS A BREAKING CHANGE FOR ALL PLATFORM PORTS THAT ALLOCATE BAKERY LOCKS IN NORMAL MEMORY. Change-Id: Ic3751c0066b8032dcbf9d88f1d4dc73d15f61d8b
Diffstat (limited to 'docs/firmware-design.md')
-rw-r--r--docs/firmware-design.md137
1 files changed, 77 insertions, 60 deletions
diff --git a/docs/firmware-design.md b/docs/firmware-design.md
index 18f634f4..41fb7c0d 100644
--- a/docs/firmware-design.md
+++ b/docs/firmware-design.md
@@ -1523,38 +1523,52 @@ approach described above.
The below sections analyze the data structures allocated in the coherent memory
region and the changes required to allocate them in normal memory.
-### PSCI Affinity map nodes
-
-The `psci_aff_map` data structure stores the hierarchial node information for
-each affinity level in the system including the PSCI states associated with them.
-By default, this data structure is allocated in the coherent memory region in
-the Trusted Firmware because it can be accessed by multiple CPUs, either with
-their caches enabled or disabled.
-
- typedef struct aff_map_node {
- unsigned long mpidr;
- unsigned char ref_count;
- unsigned char state;
- unsigned char level;
- #if USE_COHERENT_MEM
- bakery_lock_t lock;
- #else
- unsigned char aff_map_index;
- #endif
- } aff_map_node_t;
+### Coherent memory usage in PSCI implementation
+
+The `psci_non_cpu_pd_nodes` data structure stores the platform's power domain
+tree information for state management of power domains. By default, this data
+structure is allocated in the coherent memory region in the Trusted Firmware
+because it can be accessed by multple CPUs, either with caches enabled or
+disabled.
+
+typedef struct non_cpu_pwr_domain_node {
+ /*
+ * Index of the first CPU power domain node level 0 which has this node
+ * as its parent.
+ */
+ unsigned int cpu_start_idx;
+
+ /*
+ * Number of CPU power domains which are siblings of the domain indexed
+ * by 'cpu_start_idx' i.e. all the domains in the range 'cpu_start_idx
+ * -> cpu_start_idx + ncpus' have this node as their parent.
+ */
+ unsigned int ncpus;
+
+ /*
+ * Index of the parent power domain node.
+ * TODO: Figure out whether to whether using pointer is more efficient.
+ */
+ unsigned int parent_node;
+
+ plat_local_state_t local_state;
+
+ unsigned char level;
+
+ /* For indexing the psci_lock array*/
+ unsigned char lock_index;
+} non_cpu_pd_node_t;
In order to move this data structure to normal memory, the use of each of its
-fields must be analyzed. Fields like `mpidr` and `level` are only written once
-during cold boot. Hence removing them from coherent memory involves only doing
-a clean and invalidate of the cache lines after these fields are written.
-
-The fields `state` and `ref_count` can be concurrently accessed by multiple
-CPUs in different cache states. A Lamport's Bakery lock is used to ensure mutual
-exlusion to these fields. As a result, it is possible to move these fields out
-of coherent memory by performing software cache maintenance on them. The field
-`lock` is the bakery lock data structure when `USE_COHERENT_MEM` is enabled.
-The `aff_map_index` is used to identify the bakery lock when `USE_COHERENT_MEM`
-is disabled.
+fields must be analyzed. Fields like `cpu_start_idx`, `ncpus`, `parent_node`
+`level` and `lock_index` are only written once during cold boot. Hence removing
+them from coherent memory involves only doing a clean and invalidate of the
+cache lines after these fields are written.
+
+The field `local_state` can be concurrently accessed by multiple CPUs in
+different cache states. A Lamport's Bakery lock `psci_locks` is used to ensure
+mutual exlusion to this field and a clean and invalidate is needed after it
+is written.
### Bakery lock data
@@ -1563,9 +1577,13 @@ and is accessed by multiple CPUs with mismatched attributes. `bakery_lock_t` is
defined as follows:
typedef struct bakery_lock {
- int owner;
- volatile char entering[BAKERY_LOCK_MAX_CPUS];
- volatile unsigned number[BAKERY_LOCK_MAX_CPUS];
+ /*
+ * The lock_data is a bit-field of 2 members:
+ * Bit[0] : choosing. This field is set when the CPU is
+ * choosing its bakery number.
+ * Bits[1 - 15] : number. This is the bakery number allocated.
+ */
+ volatile uint16_t lock_data[BAKERY_LOCK_MAX_CPUS];
} bakery_lock_t;
It is a characteristic of Lamport's Bakery algorithm that the volatile per-CPU
@@ -1589,17 +1607,14 @@ the update made by CPU0 as well.
To use bakery locks when `USE_COHERENT_MEM` is disabled, the lock data structure
has been redesigned. The changes utilise the characteristic of Lamport's Bakery
-algorithm mentioned earlier. The per-CPU fields of the new lock structure are
-aligned such that they are allocated on separate cache lines. The per-CPU data
-framework in Trusted Firmware is used to achieve this. This enables software to
+algorithm mentioned earlier. The bakery_lock structure only allocates the memory
+for a single CPU. The macro `DEFINE_BAKERY_LOCK` allocates all the bakery locks
+needed for a CPU into a section `bakery_lock`. The linker allocates the memory
+for other cores by using the total size allocated for the bakery_lock section
+and multiplying it with (PLATFORM_CORE_COUNT - 1). This enables software to
perform software cache maintenance on the lock data structure without running
into coherency issues associated with mismatched attributes.
-The per-CPU data framework enables consolidation of data structures on the
-fewest cache lines possible. This saves memory as compared to the scenario where
-each data structure is separately aligned to the cache line boundary to achieve
-the same effect.
-
The bakery lock data structure `bakery_info_t` is defined for use when
`USE_COHERENT_MEM` is disabled as follows:
@@ -1615,12 +1630,10 @@ The bakery lock data structure `bakery_info_t` is defined for use when
The `bakery_info_t` represents a single per-CPU field of one lock and
the combination of corresponding `bakery_info_t` structures for all CPUs in the
-system represents the complete bakery lock. It is embedded in the per-CPU
-data framework `cpu_data` as shown below:
+system represents the complete bakery lock. The view in memory for a system
+with n bakery locks are:
- CPU0 cpu_data
- ------------------
- | .... |
+ bakery_lock section start
|----------------|
| `bakery_info_t`| <-- Lock_0 per-CPU field
| Lock_0 | for CPU0
@@ -1633,12 +1646,11 @@ data framework `cpu_data` as shown below:
| `bakery_info_t`| <-- Lock_N per-CPU field
| Lock_N | for CPU0
------------------
-
-
- CPU1 cpu_data
+ | XXXXX |
+ | Padding to |
+ | next Cache WB | <--- Calculate PERCPU_BAKERY_LOCK_SIZE, allocate
+ | Granule | continuous memory for remaining CPUs.
------------------
- | .... |
- |----------------|
| `bakery_info_t`| <-- Lock_0 per-CPU field
| Lock_0 | for CPU1
|----------------|
@@ -1650,14 +1662,20 @@ data framework `cpu_data` as shown below:
| `bakery_info_t`| <-- Lock_N per-CPU field
| Lock_N | for CPU1
------------------
+ | XXXXX |
+ | Padding to |
+ | next Cache WB |
+ | Granule |
+ ------------------
-Consider a system of 2 CPUs with 'N' bakery locks as shown above. For an
+Consider a system of 2 CPUs with 'N' bakery locks as shown above. For an
operation on Lock_N, the corresponding `bakery_info_t` in both CPU0 and CPU1
-`cpu_data` need to be fetched and appropriate cache operations need to be
-performed for each access.
+`bakery_lock` section need to be fetched and appropriate cache operations need
+to be performed for each access.
+
+On ARM Platforms, bakery locks are used in psci (`psci_locks`) and power controller
+driver (`arm_lock`).
-For multiple bakery locks, an array of `bakery_info_t` is declared in `cpu_data`
-and each lock is given an `id` to identify it in the array.
### Non Functional Impact of removing coherent memory
@@ -1680,10 +1698,9 @@ Juno ARM development platform.
As mentioned earlier, almost a page of memory can be saved by disabling
`USE_COHERENT_MEM`. Each platform needs to consider these trade-offs to decide
whether coherent memory should be used. If a platform disables
-`USE_COHERENT_MEM` and needs to use bakery locks in the porting layer, it should
-reserve memory in `cpu_data` by defining the macro `PLAT_PCPU_DATA_SIZE` (see
-the [Porting Guide]). Refer to the reference platform code for examples.
-
+`USE_COHERENT_MEM` and needs to use bakery locks in the porting layer, it can
+optionally define macro `PLAT_PERCPU_BAKERY_LOCK_SIZE` (see the [Porting
+Guide]). Refer to the reference platform code for examples.
12. Code Structure
-------------------