3 files changed, 380 insertions, 79 deletions
diff --git a/docs/firmware-design.md b/docs/firmware-design.md
index be1f5f30..4a50a7b7 100644
--- a/docs/firmware-design.md
+++ b/docs/firmware-design.md
@@ -9,11 +9,13 @@ Contents :
 4.  [Power State Coordination Interface](#4--power-state-coordination-interface)
 5.  [Secure-EL1 Payloads and Dispatchers](#5--secure-el1-payloads-and-dispatchers)
 6.  [Crash Reporting in BL3-1](#6--crash-reporting-in-bl3-1)
-7.  [CPU specific operations framework](#7--cpu-specific-operations-framework)
-8.  [Memory layout of BL images](#8-memory-layout-of-bl-images)
-9.  [Firmware Image Package (FIP)](#9--firmware-image-package-fip)
-10. [Code Structure](#10--code-structure)
-11. [References](#11--references)
+7.  [Guidelines for Reset Handlers](#7--guidelines-for-reset-handlers)
+8.  [CPU specific operations framework](#8--cpu-specific-operations-framework)
+9.  [Memory layout of BL images](#9-memory-layout-of-bl-images)
+10. [Firmware Image Package (FIP)](#10--firmware-image-package-fip)
+11. [Use of coherent memory in Trusted Firmware](#11--use-of-coherent-memory-in-trusted-firmware)
+12. [Code Structure](#12--code-structure)
+13. [References](#13--references)
 
 
 1.  Introduction
@@ -368,10 +370,10 @@ level implementation of the generic timer through the memory mapped interface.
     `ON`; any other cluster is `OFF`. BL3-1 initializes the data structures that
     implement the state machine, including the locks that protect them. BL3-1
     accesses the state of a CPU or cluster immediately after reset and before
-    the MMU is enabled in the warm boot path. It is not currently possible to
-    use 'exclusive' based spinlocks, therefore BL3-1 uses locks based on
-    Lamport's Bakery algorithm instead. BL3-1 allocates these locks in device
-    memory. They are accessible irrespective of MMU state.
+    the data cache is enabled in the warm boot path. It is not currently
+    possible to use 'exclusive' based spinlocks, therefore BL3-1 uses locks
+    based on Lamport's Bakery algorithm instead. BL3-1 allocates these locks in
+    device memory by default.
 
 *   Runtime services initialization:
 
@@ -733,32 +735,43 @@ restoring the stack and CPU state and returning from the original SMC.
 
 TODO: Provide design walkthrough of PSCI implementation.
 
-The complete PSCI API is not yet implemented. The following functions are
-currently implemented:
-
--   `PSCI_VERSION`
--   `CPU_OFF`
--   `CPU_ON`
--   `CPU_SUSPEND`
--   `AFFINITY_INFO`
--   `SYSTEM_OFF`
--   `SYSTEM_RESET`
-
-The `CPU_ON`, `CPU_OFF` and `CPU_SUSPEND` functions implement the warm boot
-path in ARM Trusted Firmware. `CPU_ON` and `CPU_OFF` have undergone testing
-on all the supported FVPs. `CPU_SUSPEND` & `AFFINITY_INFO` have undergone
-testing only on the AEM v8 Base FVP. Support for `AFFINITY_INFO` is still
-experimental. Support for `CPU_SUSPEND` is stable for entry into power down
-states. Standby states are currently not supported. `PSCI_VERSION` is
-present but completely untested in this version of the software.
-
-The following unsupported functions return with a error code as documented in
-the [Power State Coordination Interface PDD] [PSCI].
-
--   `MIGRATE` : -1 (NOT_SUPPORTED)
--   `MIGRATE_INFO_TYPE` : 2 (Trusted OS is either not present or does not
-     require migration)
--   `MIGRATE_INFO_UP_CPU` : 0 (Return value is UNDEFINED)
+The PSCI v1.0 specification categorizes APIs as optional and mandatory. All the
+mandatory APIs in PSCI v1.0 and all the APIs in PSCI v0.2 draft specification
+[Power State Coordination Interface PDD] [PSCI] are implemented. The table lists
+the PSCI v1.0 APIs and their support in generic code.
+
+An API implementation might have a dependency on platform code e.g. CPU_SUSPEND
+requires the platform to export a part of the implementation. Hence the level
+of support of the mandatory APIs depends upon the support exported by the
+platform port as well. The Juno and FVP (all variants) platforms export all the
+required support.
+
+| PSCI v1.0 API         |Supported| Comments                                  |
+|:----------------------|:--------|:------------------------------------------|
+|`PSCI_VERSION`         | Yes     | The version returned is 1.0               |
+|`CPU_SUSPEND`          | Yes*    | The original `power_state` format is used |
+|`CPU_OFF`              | Yes*    |                                           |
+|`CPU_ON`               | Yes*    |                                           |
+|`AFFINITY_INFO`        | Yes     |                                           |
+|`MIGRATE`              | Yes**   |                                           |
+|`MIGRATE_INFO_TYPE`    | Yes**   |                                           |
+|`MIGRATE_INFO_CPU`     | Yes**   |                                           |
+|`SYSTEM_OFF`           | Yes*    |                                           |
+|`SYSTEM_RESET`         | Yes*    |                                           |
+|`PSCI_FEATURES`        | Yes     |                                           |
+|`CPU_FREEZE`           | No      |                                           |
+|`CPU_DEFAULT_SUSPEND`  | No      |                                           |
+|`CPU_HW_STATE`         | No      |                                           |
+|`SYSTEM_SUSPEND`       | No      |                                           |
+|`PSCI_SET_SUSPEND_MODE`| No      |                                           |
+|`PSCI_STAT_RESIDENCY`  | No      |                                           |
+|`PSCI_STAT_COUNT`      | No      |                                           |
+
+*Note : These PSCI APIs require platform power management hooks to be
+registered with the generic PSCI code to be supported.
+
+**Note : These PSCI APIs require appropriate Secure Payload Dispatcher
+hooks to be registered with the generic PSCI code to be supported.
 
 
 5.  Secure-EL1 Payloads and Dispatchers
@@ -948,8 +961,48 @@ The sample crash output is shown below.
     fpexc32_el2	:0x0000000004000700
     sp_el0	:0x0000000004010780
 
+7.  Guidelines for Reset Handlers
+---------------------------------
+
+Trusted Firmware implements a framework that allows CPU and platform ports to
+perform actions immediately after a CPU is released from reset in both the cold
+and warm boot paths. This is done by calling the `reset_handler()` function in
+both the BL1 and BL3-1 images. It in turn calls the platform and CPU specific
+reset handling functions.
+
+Details for implementing a CPU specific reset handler can be found in
+Section 8. Details for implementing a platform specific reset handler can be
+found in the [Porting Guide](see the `plat_reset_handler()` function).
+
+When adding functionality to a reset handler, the following points should be
+kept in mind.
+
+1.   The first reset handler in the system exists either in a ROM image
+     (e.g. BL1), or BL3-1 if `RESET_TO_BL31` is true. This may be detected at
+     compile time using the constant `FIRST_RESET_HANDLER_CALL`.
+
+2.   When considering ROM images, it's important to consider non TF-based ROMs
+     and ROMs based on previous versions of the TF code.
 
-7.  CPU specific operations framework
+3.   If the functionality should be applied to a ROM and there is no possibility
+     of a ROM being used that does not apply the functionality (or equivalent),
+     then the functionality should be applied within a `#if
+     FIRST_RESET_HANDLER_CALL` block.
+
+4.   If the functionality should execute in BL3-1 in order to override or
+     supplement a ROM version of the functionality, then the functionality
+     should be applied in the `#else` part of a `#if FIRST_RESET_HANDLER_CALL`
+     block.
+
+5.   If the functionality should be applied to a ROM but there is a possibility
+     of ROMs being used that do not apply the functionality, then the
+     functionality should be applied outside of a `FIRST_RESET_HANDLER_CALL`
+     block, so that BL3-1 has an opportunity to apply the functionality instead.
+     In this case, additional code may be needed to cope with different ROMs
+     that do or do not apply the functionality.
+
+
+8.  CPU specific operations framework
 -----------------------------
 
 Certain aspects of the ARMv8 architecture are implementation defined,
@@ -1014,6 +1067,9 @@ in midr are used to find the matching `cpu_ops` entry. The `reset_func()` in
 the returned `cpu_ops` is then invoked which executes the required reset
 handling for that CPU and also any errata workarounds enabled by the platform.
 
+Refer to Section "Guidelines for Reset Handlers" for general guidelines
+regarding placement of code in a reset handler.
+
 ### CPU specific power down sequence
 
 During the BL3-1 initialization sequence, the pointer to the matching `cpu_ops`
@@ -1044,7 +1100,7 @@ be reported and a pointer to the ASCII list of register names in a format
 expected by the crash reporting framework.
 
 
-8. Memory layout of BL images
+9. Memory layout of BL images
 -----------------------------
 
 Each bootloader image can be divided in 2 parts:
@@ -1127,9 +1183,10 @@ this purpose:
 * `__BSS_START__` This address must be aligned on a 16-byte boundary.
 * `__BSS_SIZE__`
 
-Similarly, the coherent memory section must be zero-initialised. Also, the MMU
-setup code needs to know the extents of this section to set the right memory
-attributes for it. The following linker symbols are defined for this purpose:
+Similarly, the coherent memory section (if enabled) must be zero-initialised.
+Also, the MMU setup code needs to know the extents of this section to set the
+right memory attributes for it. The following linker symbols are defined for
+this purpose:
 
 * `__COHERENT_RAM_START__` This address must be aligned on a page-size boundary.
 * `__COHERENT_RAM_END__` This address must be aligned on a page-size boundary.
@@ -1399,7 +1456,7 @@ Loading the BL3-2 image in DRAM doesn't change the memory layout of the other
 images in Trusted SRAM.
 
 
-9.  Firmware Image Package (FIP)
+10.  Firmware Image Package (FIP)
 ---------------------------------
 
 Using a Firmware Image Package (FIP) allows for packing bootloader images (and
@@ -1477,7 +1534,208 @@ Currently the FVP's policy only allows loading of a known set of images. The
 platform policy can be modified to allow additional images.
 
 
-10.  Code Structure
+11. Use of coherent memory in Trusted Firmware
+----------------------------------------------
+
+There might be loss of coherency when physical memory with mismatched
+shareability, cacheability and memory attributes is accessed by multiple CPUs
+(refer to section B2.9 of [ARM ARM] for more details). This possibility occurs
+in Trusted Firmware during power up/down sequences when coherency, MMU and
+caches are turned on/off incrementally.
+
+Trusted Firmware defines coherent memory as a region of memory with Device
+nGnRE attributes in the translation tables. The translation granule size in
+Trusted Firmware is 4KB. This is the smallest possible size of the coherent
+memory region.
+
+By default, all data structures which are susceptible to accesses with
+mismatched attributes from various CPUs are allocated in a coherent memory
+region (refer to section 2.1 of [Porting Guide]). The coherent memory region
+accesses are Outer Shareable, non-cacheable and they can be accessed
+with the Device nGnRE attributes when the MMU is turned on. Hence, at the
+expense of at least an extra page of memory, Trusted Firmware is able to work
+around coherency issues due to mismatched memory attributes.
+
+The alternative to the above approach is to allocate the susceptible data
+structures in Normal WriteBack WriteAllocate Inner shareable memory. This
+approach requires the data structures to be designed so that it is possible to
+work around the issue of mismatched memory attributes by performing software
+cache maintenance on them.
+
+### Disabling the use of coherent memory in Trusted Firmware
+
+It might be desirable to avoid the cost of allocating coherent memory on
+platforms which are memory constrained. Trusted Firmware enables inclusion of
+coherent memory in firmware images through the build flag `USE_COHERENT_MEM`.
+This flag is enabled by default. It can be disabled to choose the second
+approach described above.
+
+The below sections analyze the data structures allocated in the coherent memory
+region and the changes required to allocate them in normal memory.
+
+### PSCI Affinity map nodes
+
+The `psci_aff_map` data structure stores the hierarchial node information for
+each affinity level in the system including the PSCI states associated with them.
+By default, this data structure is allocated in the coherent memory region in
+the Trusted Firmware because it can be accessed by multiple CPUs, either with
+their caches enabled or disabled.
+
+	typedef struct aff_map_node {
+		unsigned long mpidr;
+		unsigned char ref_count;
+		unsigned char state;
+		unsigned char level;
+	#if USE_COHERENT_MEM
+		bakery_lock_t lock;
+	#else
+		unsigned char aff_map_index;
+	#endif
+	} aff_map_node_t;
+
+In order to move this data structure to normal memory, the use of each of its
+fields must be analyzed. Fields like `mpidr` and `level` are only written once
+during cold boot. Hence removing them from coherent memory involves only doing
+a clean and invalidate of the cache lines after these fields are written.
+
+The fields `state` and `ref_count` can be concurrently accessed by multiple
+CPUs in different cache states. A Lamport's Bakery lock is used to ensure mutual
+exlusion to these fields. As a result, it is possible to move these fields out
+of coherent memory by performing software cache maintenance on them. The field
+`lock` is the bakery lock data structure when `USE_COHERENT_MEM` is enabled.
+The `aff_map_index` is used to identify the bakery lock when `USE_COHERENT_MEM`
+is disabled.
+
+### Bakery lock data
+
+The bakery lock data structure `bakery_lock_t` is allocated in coherent memory
+and is accessed by multiple CPUs with mismatched attributes. `bakery_lock_t` is
+defined as follows:
+
+    typedef struct bakery_lock {
+        int owner;
+        volatile char entering[BAKERY_LOCK_MAX_CPUS];
+        volatile unsigned number[BAKERY_LOCK_MAX_CPUS];
+    } bakery_lock_t;
+
+It is a characteristic of Lamport's Bakery algorithm that the volatile per-CPU
+fields can be read by all CPUs but only written to by the owning CPU.
+
+Depending upon the data cache line size, the per-CPU fields of the
+`bakery_lock_t` structure for multiple CPUs may exist on a single cache line.
+These per-CPU fields can be read and written during lock contention by multiple
+CPUs with mismatched memory attributes. Since these fields are a part of the
+lock implementation, they do not have access to any other locking primitive to
+safeguard against the resulting coherency issues. As a result, simple software
+cache maintenance is not enough to allocate them in coherent memory. Consider
+the following example.
+
+CPU0 updates its per-CPU field with data cache enabled. This write updates a
+local cache line which contains a copy of the fields for other CPUs as well. Now
+CPU1 updates its per-CPU field of the `bakery_lock_t` structure with data cache
+disabled. CPU1 then issues a DCIVAC operation to invalidate any stale copies of
+its field in any other cache line in the system. This operation will invalidate
+the update made by CPU0 as well.
+
+To use bakery locks when `USE_COHERENT_MEM` is disabled, the lock data structure
+has been redesigned. The changes utilise the characteristic of Lamport's Bakery
+algorithm mentioned earlier. The per-CPU fields of the new lock structure are
+aligned such that they are allocated on separate cache lines. The per-CPU data
+framework in Trusted Firmware is used to achieve this. This enables software to
+perform software cache maintenance on the lock data structure without running
+into coherency issues associated with mismatched attributes.
+
+The per-CPU data framework enables consolidation of data structures on the
+fewest cache lines possible. This saves memory as compared to the scenario where
+each data structure is separately aligned to the cache line boundary to achieve
+the same effect.
+
+The bakery lock data structure `bakery_info_t` is defined for use when
+`USE_COHERENT_MEM` is disabled as follows:
+
+    typedef struct bakery_info {
+        /*
+         * The lock_data is a bit-field of 2 members:
+         * Bit[0]       : choosing. This field is set when the CPU is
+         *                choosing its bakery number.
+         * Bits[1 - 15] : number. This is the bakery number allocated.
+         */
+         volatile uint16_t lock_data;
+    } bakery_info_t;
+
+The `bakery_info_t` represents a single per-CPU field of one lock and
+the combination of corresponding `bakery_info_t` structures for all CPUs in the
+system represents the complete bakery lock. It is embedded in the per-CPU
+data framework `cpu_data` as shown below:
+
+      CPU0 cpu_data
+    ------------------
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_0 per-CPU field
+    |    Lock_0      |     for CPU0
+    |----------------|
+    | `bakery_info_t`| <-- Lock_1 per-CPU field
+    |    Lock_1      |     for CPU0
+    |----------------|
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_N per-CPU field
+    |    Lock_N      |     for CPU0
+    ------------------
+
+
+      CPU1 cpu_data
+    ------------------
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_0 per-CPU field
+    |    Lock_0      |     for CPU1
+    |----------------|
+    | `bakery_info_t`| <-- Lock_1 per-CPU field
+    |    Lock_1      |     for CPU1
+    |----------------|
+    | ....           |
+    |----------------|
+    | `bakery_info_t`| <-- Lock_N per-CPU field
+    |    Lock_N      |     for CPU1
+    ------------------
+
+Consider a system of 2 CPUs with 'N' bakery locks as shown above.  For an
+operation on Lock_N, the corresponding `bakery_info_t` in both CPU0 and CPU1
+`cpu_data` need to be fetched and appropriate cache operations need to be
+performed for each access.
+
+For multiple bakery locks, an array of `bakery_info_t` is declared in `cpu_data`
+and each lock is given an `id` to identify it in the array.
+
+### Non Functional Impact of removing coherent memory
+
+Removal of the coherent memory region leads to the additional software overhead
+of performing cache maintenance for the affected data structures. However, since
+the memory where the data structures are allocated is cacheable, the overhead is
+mostly mitigated by an increase in performance.
+
+There is however a performance impact for bakery locks, due to:
+*   Additional cache maintenance operations, and
+*   Multiple cache line reads for each lock operation, since the bakery locks
+    for each CPU are distributed across different cache lines.
+
+The implementation has been optimized to mimimize this additional overhead.
+Measurements indicate that when bakery locks are allocated in Normal memory, the
+minimum latency of acquiring a lock is on an average 3-4 micro seconds whereas
+in Device memory the same is 2 micro seconds. The measurements were done on the
+Juno ARM development platform.
+
+As mentioned earlier, almost a page of memory can be saved by disabling
+`USE_COHERENT_MEM`. Each platform needs to consider these trade-offs to decide
+whether coherent memory should be used. If a platform disables
+`USE_COHERENT_MEM` and needs to use bakery locks in the porting layer, it should
+reserve memory in `cpu_data` by defining the macro `PLAT_PCPU_DATA_SIZE` (see
+the [Porting Guide]). Refer to the reference platform code for examples.
+
+
+12.  Code Structure
 -------------------
 
 Trusted Firmware code is logically divided between the three boot loader
@@ -1522,7 +1780,7 @@ FDTs provide a description of the hardware platform and are used by the Linux
 kernel at boot time. These can be found in the `fdts` directory.
 
 
-11.  References
+13.  References
 ---------------
 
 1.  Trusted Board Boot Requirements CLIENT PDD (ARM DEN 0006B-5). Available
@@ -1538,7 +1796,7 @@ kernel at boot time. These can be found in the `fdts` directory.
 
 _Copyright (c) 2013-2014, ARM Limited and Contributors. All rights reserved._
 
-
+[ARM ARM]:          http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.e/index.html "ARMv8-A Reference Manual (ARM DDI0487A.E)"
 [PSCI]:             http://infocenter.arm.com/help/topic/com.arm.doc.den0022b/index.html "Power State Coordination Interface PDD (ARM DEN 0022B.b)"
 [SMCCC]:            http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html "SMC Calling Convention PDD (ARM DEN 0028A)"
 [UUID]:             https://tools.ietf.org/rfc/rfc4122.txt "A Universally Unique IDentifier (UUID) URN Namespace"
diff --git a/docs/porting-guide.md b/docs/porting-guide.md
index 3855ca7b..747cb005 100644
--- a/docs/porting-guide.md
+++ b/docs/porting-guide.md
@@ -63,11 +63,11 @@ mapped page tables, and enable both the instruction and data caches for each BL
 stage. In the ARM FVP port, each BL stage configures the MMU in its platform-
 specific architecture setup function, for example `blX_plat_arch_setup()`.
 
-Each platform must allocate a block of identity mapped secure memory with
-Device-nGnRE attributes aligned to page boundary (4K) for each BL stage. This
-memory is identified by the section name `tzfw_coherent_mem` so that its
-possible for the firmware to place variables in it using the following C code
-directive:
+If the build option `USE_COHERENT_MEM` is enabled, each platform must allocate a
+block of identity mapped secure memory with Device-nGnRE attributes aligned to
+page boundary (4K) for each BL stage. This memory is identified by the section
+name `tzfw_coherent_mem` so that its possible for the firmware to place
+variables in it using the following C code directive:
 
     __attribute__ ((section("tzfw_coherent_mem")))
 
@@ -246,6 +246,17 @@ must also be defined:
     entities than this value using `io_open()` will fail with
     IO_RESOURCES_EXHAUSTED.
 
+If the platform needs to allocate data within the per-cpu data framework in
+BL3-1, it should define the following macro. Currently this is only required if
+the platform decides not to use the coherent memory section by undefining the
+USE_COHERENT_MEM build flag. In this case, the framework allocates the required
+memory within the the per-cpu data to minimize wastage.
+
+*   **#define : PLAT_PCPU_DATA_SIZE**
+
+    Defines the memory (in bytes) to be reserved within the per-cpu data
+    structure for use by the platform layer.
+
 The following constants are optional. They should be defined when the platform
 memory layout implies some image overlaying like on FVP.
 
@@ -472,7 +483,9 @@ specific errata workarounds could also be implemented here. The api should
 preserve the value in x10 register as it is used by the caller to store the
 return address.
 
-The default implementation doesn't do anything.
+The default implementation doesn't do anything. If a platform needs to override
+the default implementation, refer to the [Firmware Design Guide] for general
+guidelines regarding placement of code in a reset handler.
 
 ### Function : plat_disable_acp()
 
@@ -1083,61 +1096,61 @@ the passed pointer with a pointer to BL3-1's private `plat_pm_ops` structure.
 
 A description of each member of this structure is given below. Please refer to
 the ARM FVP specific implementation of these handlers in [plat/fvp/fvp_pm.c]
-as an example. A platform port may choose not implement some of the power
-management operations.
+as an example. A platform port is expected to implement these handlers if the
+corresponding PSCI operation is to be supported and these handlers are expected
+to succeed if the return type is `void`.
 
 #### plat_pm_ops.affinst_standby()
 
 Perform the platform-specific setup to enter the standby state indicated by the
-passed argument.
+passed argument. The generic code expects the handler to succeed.
 
 #### plat_pm_ops.affinst_on()
 
 Perform the platform specific setup to power on an affinity instance, specified
-by the `MPIDR` (first argument) and `affinity level` (fourth argument). The
-`state` (fifth argument) contains the current state of that affinity instance
+by the `MPIDR` (first argument) and `affinity level` (third argument). The
+`state` (fourth argument) contains the current state of that affinity instance
 (ON or OFF). This is useful to determine whether any action must be taken. For
 example, while powering on a CPU, the cluster that contains this CPU might
 already be in the ON state. The platform decides what actions must be taken to
 transition from the current state to the target state (indicated by the power
-management operation).
+management operation). The generic code expects the platform to return
+E_SUCCESS on success or E_INTERN_FAIL for any failure.
 
 #### plat_pm_ops.affinst_off()
 
-Perform the platform specific setup to power off an affinity instance in the
-`MPIDR` of the calling CPU. It is called by the PSCI `CPU_OFF` API
-implementation.
+Perform the platform specific setup to power off an affinity instance of the
+calling CPU. It is called by the PSCI `CPU_OFF` API implementation.
 
-The `MPIDR` (first argument), `affinity level` (second argument) and `state`
-(third argument) have a similar meaning as described in the `affinst_on()`
-operation. They are used to identify the affinity instance on which the call
-is made and its current state. This gives the platform port an indication of the
+The `affinity level` (first argument) and `state` (second argument) have
+a similar meaning as described in the `affinst_on()` operation. They are
+used to identify the affinity instance on which the call is made and its
+current state. This gives the platform port an indication of the
 state transition it must make to perform the requested action. For example, if
 the calling CPU is the last powered on CPU in the cluster, after powering down
 affinity level 0 (CPU), the platform port should power down affinity level 1
-(the cluster) as well.
+(the cluster) as well. The generic code expects the handler to succeed.
 
 #### plat_pm_ops.affinst_suspend()
 
-Perform the platform specific setup to power off an affinity instance in the
-`MPIDR` of the calling CPU. It is called by the PSCI `CPU_SUSPEND` API
+Perform the platform specific setup to power off an affinity instance of the
+calling CPU. It is called by the PSCI `CPU_SUSPEND` API
 implementation.
 
-The `MPIDR` (first argument), `affinity level` (third argument) and `state`
-(fifth argument) have a similar meaning as described in the `affinst_on()`
-operation. They are used to identify the affinity instance on which the call
-is made and its current state. This gives the platform port an indication of the
-state transition it must make to perform the requested action. For example, if
-the calling CPU is the last powered on CPU in the cluster, after powering down
-affinity level 0 (CPU), the platform port should power down affinity level 1
-(the cluster) as well.
+The `affinity level` (second argument) and `state` (third argument) have a
+similar meaning as described in the `affinst_on()` operation. They are used to
+identify the affinity instance on which the call is made and its current state.
+This gives the platform port an indication of the state transition it must
+make to perform the requested action. For example, if the calling CPU is the
+last powered on CPU in the cluster, after powering down affinity level 0 (CPU),
+the platform port should power down affinity level 1 (the cluster) as well.
 
 The difference between turning an affinity instance off versus suspending it
 is that in the former case, the affinity instance is expected to re-initialize
 its state when its next powered on (see `affinst_on_finish()`). In the latter
 case, the affinity instance is expected to save enough state so that it can
 resume execution by restoring this state when its powered on (see
-`affinst_suspend_finish()`).
+`affinst_suspend_finish()`).The generic code expects the handler to succeed.
 
 #### plat_pm_ops.affinst_on_finish()
 
@@ -1147,8 +1160,9 @@ It performs the platform-specific setup required to initialize enough state for
 this CPU to enter the normal world and also provide secure runtime firmware
 services.
 
-The `MPIDR` (first argument), `affinity level` (second argument) and `state`
-(third argument) have a similar meaning as described in the previous operations.
+The `affinity level` (first argument) and `state` (second argument) have a
+similar meaning as described in the previous operations. The generic code
+expects the handler to succeed.
 
 #### plat_pm_ops.affinst_on_suspend()
 
@@ -1159,8 +1173,25 @@ event, for example a timer interrupt that was programmed by the CPU during the
 restore the saved state for this CPU to resume execution in the normal world
 and also provide secure runtime firmware services.
 
-The `MPIDR` (first argument), `affinity level` (second argument) and `state`
-(third argument) have a similar meaning as described in the previous operations.
+The `affinity level` (first argument) and `state` (second argument) have a
+similar meaning as described in the previous operations. The generic code
+expects the platform to succeed.
+
+#### plat_pm_ops.validate_power_state()
+
+This function is called by the PSCI implementation during the `CPU_SUSPEND`
+call to validate the `power_state` parameter of the PSCI API. If the
+`power_state` is known to be invalid, the platform must return
+PSCI_E_INVALID_PARAMS as error, which is propagated back to the normal
+world PSCI client.
+
+#### plat_pm_ops.validate_ns_entrypoint()
+
+This function is called by the PSCI implementation during the `CPU_SUSPEND`
+and `CPU_ON` calls to validate the non-secure `entry_point` parameter passed
+by the normal world. If the `entry_point` is known to be invalid, the platform
+must return PSCI_E_INVALID_PARAMS as error, which is propagated back to the
+normal world PSCI client.
 
 BL3-1 platform initialization code must also detect the system topology and
 the state of each affinity instance in the topology. This information is
@@ -1447,6 +1478,7 @@ _Copyright (c) 2013-2014, ARM Limited and Contributors. All rights reserved._
 [IMF Design Guide]:                   interrupt-framework-design.md
 [User Guide]:                         user-guide.md
 [FreeBSD]:                            http://www.freebsd.org
+[Firmware Design Guide]:              firmware-design.md
 
 [plat/common/aarch64/platform_mp_stack.S]: ../plat/common/aarch64/platform_mp_stack.S
 [plat/common/aarch64/platform_up_stack.S]: ../plat/common/aarch64/platform_up_stack.S
diff --git a/docs/user-guide.md b/docs/user-guide.md
index f5a79e4f..4209be7e 100644
--- a/docs/user-guide.md
+++ b/docs/user-guide.md
@@ -245,6 +245,17 @@ performed.
     synchronous method) or 1 (BL3-2 is initialized using asynchronous method).
     Default is 0.
 
+*   `USE_COHERENT_MEM`: This flag determines whether to include the coherent
+    memory region in the BL memory map or not (see "Use of Coherent memory in
+    Trusted Firmware" section in [Firmware Design]). It can take the value 1
+    (Coherent memory region is included) or 0 (Coherent memory region is
+    excluded). Default is 1.
+
+*   `TSPD_ROUTE_IRQ_TO_EL3`: A non zero value enables the routing model
+    for non-secure interrupts in which they are routed to EL3 (TSPD). The
+    default model (when the value is 0) is to route non-secure interrupts
+    to S-EL1 (TSP).
+
 #### FVP specific build options
 
 *   `FVP_TSP_RAM_LOCATION`: location of the TSP binary. Options: