summaryrefslogtreecommitdiff
path: root/Documentation/driver-api/cxl/linux
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/driver-api/cxl/linux')
-rw-r--r--Documentation/driver-api/cxl/linux/access-coordinates.rst178
-rw-r--r--Documentation/driver-api/cxl/linux/cxl-driver.rst630
-rw-r--r--Documentation/driver-api/cxl/linux/dax-driver.rst43
-rw-r--r--Documentation/driver-api/cxl/linux/early-boot.rst137
-rw-r--r--Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst314
-rw-r--r--Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst291
-rw-r--r--Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst401
-rw-r--r--Documentation/driver-api/cxl/linux/example-configurations/single-device.rst246
-rw-r--r--Documentation/driver-api/cxl/linux/memory-hotplug.rst78
-rw-r--r--Documentation/driver-api/cxl/linux/overview.rst103
10 files changed, 2421 insertions, 0 deletions
diff --git a/Documentation/driver-api/cxl/linux/access-coordinates.rst b/Documentation/driver-api/cxl/linux/access-coordinates.rst
new file mode 100644
index 000000000000..341a7c682043
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/access-coordinates.rst
@@ -0,0 +1,178 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
+==================================
+CXL Access Coordinates Computation
+==================================
+
+Latency and Bandwidth Calculation
+=================================
+A memory region performance coordinates (latency and bandwidth) are typically
+provided via ACPI tables :doc:`SRAT <../platform/acpi/srat>` and
+:doc:`HMAT <../platform/acpi/hmat>`. However, the platform firmware (BIOS) is
+not able to annotate those for CXL devices that are hot-plugged since they do
+not exist during platform firmware initialization. The CXL driver can compute
+the performance coordinates by retrieving data from several components.
+
+The :doc:`SRAT <../platform/acpi/srat>` provides a Generic Port Affinity
+subtable that ties a proximity domain to a device handle, which in this case
+would be the CXL hostbridge. Using this association, the performance
+coordinates for the Generic Port can be retrieved from the
+:doc:`HMAT <../platform/acpi/hmat>` subtable. This piece represents the
+performance coordinates between a CPU and a Generic Port (CXL hostbridge).
+
+The :doc:`CDAT <../platform/cdat>` provides the performance coordinates for
+the CXL device itself. That is the bandwidth and latency to access that device's
+memory region. The DSMAS subtable provides a DSMADHandle that is tied to a
+Device Physical Address (DPA) range. The DSLBIS subtable provides the
+performance coordinates that's tied to a DSMADhandle and this ties the two
+table entries together to provide the performance coordinates for each DPA
+region. For example, if a device exports a DRAM region and a PMEM region,
+then there would be different performance characteristsics for each of those
+regions.
+
+If there's a CXL switch in the topology, then the performance coordinates for the
+switch is provided by SSLBIS subtable. This provides the bandwidth and latency
+for traversing the switch between the switch upstream port and the switch
+downstream port that points to the endpoint device.
+
+Simple topology example::
+
+ GP0/HB0/ACPI0016-0
+ RP0
+ |
+ | L0
+ |
+ SW 0 / USP0
+ SW 0 / DSP0
+ |
+ | L1
+ |
+ EP0
+
+In this example, there is a CXL switch between an endpoint and a root port.
+Latency in this example is calculated as such:
+L(EP0) - Latency from EP0 CDAT DSMAS+DSLBIS
+L(L1) - Link latency between EP0 and SW0DSP0
+L(SW0) - Latency for the switch from SW0 CDAT SSLBIS.
+L(L0) - Link latency between SW0 and RP0
+L(RP0) - Latency from root port to CPU via SRAT and HMAT (Generic Port).
+Total read and write latencies are the sum of all these parts.
+
+Bandwidth in this example is calculated as such:
+B(EP0) - Bandwidth from EP0 CDAT DSMAS+DSLBIS
+B(L1) - Link bandwidth between EP0 and SW0DSP0
+B(SW0) - Bandwidth for the switch from SW0 CDAT SSLBIS.
+B(L0) - Link bandwidth between SW0 and RP0
+B(RP0) - Bandwidth from root port to CPU via SRAT and HMAT (Generic Port).
+The total read and write bandwidth is the min() of all these parts.
+
+To calculate the link bandwidth:
+LinkOperatingFrequency (GT/s) is the current negotiated link speed.
+DataRatePerLink (MB/s) = LinkOperatingFrequency / 8
+Bandwidth (MB/s) = PCIeCurrentLinkWidth * DataRatePerLink
+Where PCIeCurrentLinkWidth is the number of lanes in the link.
+
+To calculate the link latency:
+LinkLatency (picoseconds) = FlitSize / LinkBandwidth (MB/s)
+
+See `CXL Memory Device SW Guide r1.0 <https://www.intel.com/content/www/us/en/content-details/643805/cxl-memory-device-software-guide.html>`_,
+section 2.11.3 and 2.11.4 for details.
+
+In the end, the access coordinates for a constructed memory region is calculated from one
+or more memory partitions from each of the CXL device(s).
+
+Shared Upstream Link Calculation
+================================
+For certain CXL region construction with endpoints behind CXL switches (SW) or
+Root Ports (RP), there is the possibility of the total bandwidth for all
+the endpoints behind a switch being more than the switch upstream link.
+A similar situation can occur within the host, upstream of the root ports.
+The CXL driver performs an additional pass after all the targets have
+arrived for a region in order to recalculate the bandwidths with possible
+upstream link being a limiting factor in mind.
+
+The algorithm assumes the configuration is a symmetric topology as that
+maximizes performance. When asymmetric topology is detected, the calculation
+is aborted. An asymmetric topology is detected during topology walk where the
+number of RPs detected as a grandparent is not equal to the number of devices
+iterated in the same iteration loop. The assumption is made that subtle
+asymmetry in properties does not happen and all paths to EPs are equal.
+
+There can be multiple switches under an RP. There can be multiple RPs under
+a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
+Window Structure (CFMWS) in the :doc:`CEDT <../platform/acpi/cedt>`.
+
+An example hierarchy::
+
+ CFMWS 0
+ |
+ _________|_________
+ | |
+ ACPI0017-0 ACPI0017-1
+ GP0/HB0/ACPI0016-0 GP1/HB1/ACPI0016-1
+ | | | |
+ RP0 RP1 RP2 RP3
+ | | | |
+ SW 0 SW 1 SW 2 SW 3
+ | | | | | | | |
+ EP0 EP1 EP2 EP3 EP4 EP5 EP6 EP7
+
+Computation for the example hierarchy:
+
+Min (GP0 to CPU BW,
+ Min(SW 0 Upstream Link to RP0 BW,
+ Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
+ Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) +
+ Min(SW 1 Upstream Link to RP1 BW,
+ Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
+ Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) +
+Min (GP1 to CPU BW,
+ Min(SW 2 Upstream Link to RP2 BW,
+ Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
+ Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) +
+ Min(SW 3 Upstream Link to RP3 BW,
+ Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
+ Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link))))
+
+The calculation starts at cxl_region_shared_upstream_perf_update(). A xarray
+is created to collect all the endpoint bandwidths via the
+cxl_endpoint_gather_bandwidth() function. The min() of bandwidth from the
+endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
+has a CXL switch as a parent, then min() of calculated bandwidth and the
+bandwidth from the SSLBIS for the switch downstream port that is associated
+with the endpoint is calculated. The final bandwidth is stored in a
+'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the
+endpoint is direct attached to a root port (RP), the device pointer would be an
+RP device. If the endpoint is behind a switch, the device pointer would be the
+upstream device of the parent switch.
+
+At the next stage, the code walks through one or more switches if they exist
+in the topology. For endpoints directly attached to RPs, this step is skipped.
+If there is another switch upstream, the code takes the min() of the current
+gathered bandwidth and the upstream link bandwidth. If there's a switch
+upstream, then the SSLBIS of the upstream switch.
+
+Once the topology walk reaches the RP, whether it's direct attached endpoints
+or walking through the switch(es), cxl_rp_gather_bandwidth() is called. At
+this point all the bandwidths are aggregated per each host bridge, which is
+also the index for the resulting xarray.
+
+The next step is to take the min() of the per host bridge bandwidth and the
+bandwidth from the Generic Port (GP). The bandwidths for the GP are retrieved
+via ACPI tables (:doc:`SRAT <../platform/acpi/srat>` and
+:doc:`HMAT <../platform/acpi/hmat>`). The minimum bandwidth are aggregated
+under the same ACPI0017 device to form a new xarray.
+
+Finally, the cxl_region_update_bandwidth() is called and the aggregated
+bandwidth from all the members of the last xarray is updated for the
+access coordinates residing in the cxl region (cxlr) context.
+
+QTG ID
+======
+Each :doc:`CEDT <../platform/acpi/cedt>` has a QTG ID field. This field provides
+the ID that associates with a QoS Throttling Group (QTG) for the CFMWS window.
+Once the access coordinates are calculated, an ACPI Device Specific Method can
+be issued to the ACPI0016 device to retrieve the QTG ID depends on the access
+coordinates provided. The QTG ID for the device can be used as guidance to match
+to the CFMWS to setup the best Linux root decoder for the device performance.
diff --git a/Documentation/driver-api/cxl/linux/cxl-driver.rst b/Documentation/driver-api/cxl/linux/cxl-driver.rst
new file mode 100644
index 000000000000..9759e90c3cf1
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/cxl-driver.rst
@@ -0,0 +1,630 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+CXL Driver Operation
+====================
+
+The devices described in this section are present in ::
+
+ /sys/bus/cxl/devices/
+ /dev/cxl/
+
+The :code:`cxl-cli` library, maintained as part of the NDTCL project, may
+be used to script interactions with these devices.
+
+Drivers
+=======
+The CXL driver is split into a number of drivers.
+
+* cxl_core - fundamental init interface and core object creation
+* cxl_port - initializes root and provides port enumeration interface.
+* cxl_acpi - initializes root decoders and interacts with ACPI data.
+* cxl_p/mem - initializes memory devices
+* cxl_pci - uses cxl_port to enumates the actual fabric hierarchy.
+
+Driver Devices
+==============
+Here is an example from a single-socket system with 4 host bridges. Two host
+bridges have a single memory device attached, and the devices are interleaved
+into a single memory region. The memory region has been converted to dax. ::
+
+ # ls /sys/bus/cxl/devices/
+ dax_region0 decoder3.0 decoder6.0 mem0 port3
+ decoder0.0 decoder4.0 decoder6.1 mem1 port4
+ decoder1.0 decoder5.0 endpoint5 port1 region0
+ decoder2.0 decoder5.1 endpoint6 port2 root0
+
+
+.. kernel-render:: DOT
+ :alt: Digraph of CXL fabric describing host-bridge interleaving
+ :caption: Diagraph of CXL fabric with a host-bridge interleave memory region
+
+ digraph foo {
+ "root0" -> "port1";
+ "root0" -> "port3";
+ "root0" -> "decoder0.0";
+ "port1" -> "endpoint5";
+ "port3" -> "endpoint6";
+ "port1" -> "decoder1.0";
+ "port3" -> "decoder3.0";
+ "endpoint5" -> "decoder5.0";
+ "endpoint6" -> "decoder6.0";
+ "decoder0.0" -> "region0";
+ "decoder0.0" -> "decoder1.0";
+ "decoder0.0" -> "decoder3.0";
+ "decoder1.0" -> "decoder5.0";
+ "decoder3.0" -> "decoder6.0";
+ "decoder5.0" -> "region0";
+ "decoder6.0" -> "region0";
+ "region0" -> "dax_region0";
+ "dax_region0" -> "dax0.0";
+ }
+
+For this section we'll explore the devices present in this configuration, but
+we'll explore more configurations in-depth in example configurations below.
+
+Base Devices
+------------
+Most devices in a CXL fabric are a `port` of some kind (because each
+device mostly routes request from one device to the next, rather than
+provide a direct service).
+
+Root
+~~~~
+The `CXL Root` is logical object created by the `cxl_acpi` driver during
+:code:`cxl_acpi_probe` - if the :code:`ACPI0017` `Compute Express Link
+Root Object` Device Class is found.
+
+The Root contains links to:
+
+* `Host Bridge Ports` defined by CHBS in the :doc:`CEDT<../platform/acpi/cedt>`
+
+* `Downstream Ports` typically connected to `Host Bridge Ports`.
+
+* `Root Decoders` defined by CFMWS the :doc:`CEDT<../platform/acpi/cedt>`
+
+::
+
+ # ls /sys/bus/cxl/devices/root0
+ decoder0.0 dport0 dport5 port2 subsystem
+ decoders_committed dport1 modalias port3 uevent
+ devtype dport4 port1 port4 uport
+
+ # cat /sys/bus/cxl/devices/root0/devtype
+ cxl_port
+
+ # cat port1/devtype
+ cxl_port
+
+ # cat decoder0.0/devtype
+ cxl_decoder_root
+
+The root is first `logical port` in the CXL fabric, as presented by the Linux
+CXL driver. The `CXL root` is a special type of `switch port`, in that it
+only has downstream port connections.
+
+Port
+~~~~
+A `port` object is better described as a `switch port`. It may represent a
+host bridge to the root or an actual switch port on a switch. A `switch port`
+contains one or more decoders used to route memory requests downstream ports,
+which may be connected to another `switch port` or an `endpoint port`.
+
+::
+
+ # ls /sys/bus/cxl/devices/port1
+ decoder1.0 dport0 driver parent_dport uport
+ decoders_committed dport113 endpoint5 subsystem
+ devtype dport2 modalias uevent
+
+ # cat devtype
+ cxl_port
+
+ # cat decoder1.0/devtype
+ cxl_decoder_switch
+
+ # cat endpoint5/devtype
+ cxl_port
+
+CXL `Host Bridges` in the fabric are probed during :code:`cxl_acpi_probe` at
+the time the `CXL Root` is probed. The allows for the immediate logical
+connection to between the root and host bridge.
+
+* The root has a downstream port connection to a host bridge
+
+* The host bridge has an upstream port connection to the root.
+
+* The host bridge has one or more downstream port connections to switch
+ or endpoint ports.
+
+A `Host Bridge` is a special type of CXL `switch port`. It is explicitly
+defined in the ACPI specification via `ACPI0016` ID. `Host Bridge` ports
+will be probed at `acpi_probe` time, while similar ports on an actual switch
+will be probed later. Otherwise, switch and host bridge ports look very
+similar - the both contain switch decoders which route accesses between
+upstream and downstream ports.
+
+Endpoint
+~~~~~~~~
+An `endpoint` is a terminal port in the fabric. This is a `logical device`,
+and may be one of many `logical devices` presented by a memory device. It
+is still considered a type of `port` in the fabric.
+
+An `endpoint` contains `endpoint decoders` and the device's Coherent Device
+Attribute Table (which describes the device's capabilities). ::
+
+ # ls /sys/bus/cxl/devices/endpoint5
+ CDAT decoders_committed modalias uevent
+ decoder5.0 devtype parent_dport uport
+ decoder5.1 driver subsystem
+
+ # cat /sys/bus/cxl/devices/endpoint5/devtype
+ cxl_port
+
+ # cat /sys/bus/cxl/devices/endpoint5/decoder5.0/devtype
+ cxl_decoder_endpoint
+
+
+Memory Device (memdev)
+~~~~~~~~~~~~~~~~~~~~~~
+A `memdev` is probed and added by the `cxl_pci` driver in :code:`cxl_pci_probe`
+and is managed by the `cxl_mem` driver. It primarily provides the `IOCTL`
+interface to a memory device, via :code:`/dev/cxl/memN`, and exposes various
+device configuration data. ::
+
+ # ls /sys/bus/cxl/devices/mem0
+ dev firmware_version payload_max security uevent
+ driver label_storage_size pmem serial
+ firmware numa_node ram subsystem
+
+A Memory Device is a discrete base object that is not a port. While the
+physical device it belongs to may also host an `endpoint`, the relationship
+between an `endpoint` and a `memdev` is not captured in sysfs.
+
+Port Relationships
+~~~~~~~~~~~~~~~~~~
+In our example described above, there are four host bridges attached to the
+root, and two of the host bridges have one endpoint attached.
+
+.. kernel-render:: DOT
+ :alt: Digraph of CXL fabric describing host-bridge interleaving
+ :caption: Diagraph of CXL fabric with a host-bridge interleave memory region
+
+ digraph foo {
+ "root0" -> "port1";
+ "root0" -> "port2";
+ "root0" -> "port3";
+ "root0" -> "port4";
+ "port1" -> "endpoint5";
+ "port3" -> "endpoint6";
+ }
+
+Decoders
+--------
+A `Decoder` is short for a CXL Host-Managed Device Memory (HDM) Decoder. It is
+a device that routes accesses through the CXL fabric to an endpoint, and at
+the endpoint translates a `Host Physical` to `Device Physical` Addressing.
+
+The CXL 3.1 specification heavily implies that only endpoint decoders should
+engage in translation of `Host Physical Address` to `Device Physical Address`.
+::
+
+ 8.2.4.20 CXL HDM Decoder Capability Structure
+
+ IMPLEMENTATION NOTE
+ CXL Host Bridge and Upstream Switch Port Decode Flow
+
+ IMPLEMENTATION NOTE
+ Device Decode Logic
+
+These notes imply that there are two logical groups of decoders.
+
+* Routing Decoder - a decoder which routes accesses but does not translate
+ addresses from HPA to DPA.
+
+* Translating Decoder - a decoder which translates accesses from HPA to DPA
+ for an endpoint to service.
+
+The CXL drivers distinguish 3 decoder types: root, switch, and endpoint. Only
+endpoint decoders are Translating Decoders, all others are Routing Decoders.
+
+.. note:: PLATFORM VENDORS BE AWARE
+
+ Linux makes a strong assumption that endpoint decoders are the only decoder
+ in the fabric that actively translates HPA to DPA. Linux assumes routing
+ decoders pass the HPA unchanged to the next decoder in the fabric.
+
+ It is therefore assumed that any given decoder in the fabric will have an
+ address range that is a subset of its upstream port decoder. Any deviation
+ from this scheme undefined per the specification. Linux prioritizes
+ spec-defined / architectural behavior.
+
+Decoders may have one or more `Downstream Targets` if configured to interleave
+memory accesses. This will be presented in sysfs via the :code:`target_list`
+parameter.
+
+Root Decoder
+~~~~~~~~~~~~
+A `Root Decoder` is logical construct of the physical address and interleave
+configurations present in the CFMWS field of the :doc:`CEDT
+<../platform/acpi/cedt>`.
+Linux presents this information as a decoder present in the `CXL Root`. We
+consider this a `Root Decoder`, though technically it exists on the boundary
+of the CXL specification and platform-specific CXL root implementations.
+
+Linux considers these logical decoders a type of `Routing Decoder`, and is the
+first decoder in the CXL fabric to receive a memory access from the platform's
+memory controllers.
+
+`Root Decoders` are created during :code:`cxl_acpi_probe`. One root decoder
+is created per CFMWS entry in the :doc:`CEDT <../platform/acpi/cedt>`.
+
+The :code:`target_list` parameter is filled by the CFMWS target fields. Targets
+of a root decoder are `Host Bridges`, which means interleave done at the root
+decoder level is an `Inter-Host-Bridge Interleave`.
+
+Only root decoders are capable of `Inter-Host-Bridge Interleave`.
+
+Such interleaves must be configured by the platform and described in the ACPI
+CEDT CFMWS, as the target CXL host bridge UIDs in the CFMWS must match the CXL
+host bridge UIDs in the CHBS field of the :doc:`CEDT
+<../platform/acpi/cedt>` and the UID field of CXL Host Bridges defined in
+the :doc:`DSDT <../platform/acpi/dsdt>`.
+
+Interleave settings in a root decoder describe how to interleave accesses among
+the *immediate downstream targets*, not the entire interleave set.
+
+The memory range described in the root decoder is used to
+
+1) Create a memory region (:code:`region0` in this example), and
+
+2) Associate the region with an IO Memory Resource (:code:`kernel/resource.c`)
+
+::
+
+ # ls /sys/bus/cxl/devices/decoder0.0/
+ cap_pmem devtype region0
+ cap_ram interleave_granularity size
+ cap_type2 interleave_ways start
+ cap_type3 locked subsystem
+ create_ram_region modalias target_list
+ delete_region qos_class uevent
+
+ # cat /sys/bus/cxl/devices/decoder0.0/region0/resource
+ 0xc050000000
+
+The IO Memory Resource is created during early boot when the CFMWS region is
+identified in the EFI Memory Map or E820 table (on x86).
+
+Root decoders are defined as a separate devtype, but are also a type
+of `Switch Decoder` due to having downstream targets. ::
+
+ # cat /sys/bus/cxl/devices/decoder0.0/devtype
+ cxl_decoder_root
+
+Switch Decoder
+~~~~~~~~~~~~~~
+Any non-root, translating decoder is considered a `Switch Decoder`, and will
+present with the type :code:`cxl_decoder_switch`. Both `Host Bridge` and `CXL
+Switch` (device) decoders are of type :code:`cxl_decoder_switch`. ::
+
+ # ls /sys/bus/cxl/devices/decoder1.0/
+ devtype locked size target_list
+ interleave_granularity modalias start target_type
+ interleave_ways region subsystem uevent
+
+ # cat /sys/bus/cxl/devices/decoder1.0/devtype
+ cxl_decoder_switch
+
+ # cat /sys/bus/cxl/devices/decoder1.0/region
+ region0
+
+A `Switch Decoder` has associations between a region defined by a root
+decoder and downstream target ports. Interleaving done within a switch decoder
+is a multi-downstream-port interleave (or `Intra-Host-Bridge Interleave` for
+host bridges).
+
+Interleave settings in a switch decoder describe how to interleave accesses
+among the *immediate downstream targets*, not the entire interleave set.
+
+Switch decoders are created during :code:`cxl_switch_port_probe` in the
+:code:`cxl_port` driver, and is created based on a PCI device's DVSEC
+registers.
+
+Switch decoder programming is validated during probe if the platform programs
+them during boot (See `Auto Decoders` below), or on commit if programmed at
+runtime (See `Runtime Programming` below).
+
+
+Endpoint Decoder
+~~~~~~~~~~~~~~~~
+Any decoder attached to a *terminal* point in the CXL fabric (`An Endpoint`) is
+considered an `Endpoint Decoder`. Endpoint decoders are of type
+:code:`cxl_decoder_endpoint`. ::
+
+ # ls /sys/bus/cxl/devices/decoder5.0
+ devtype locked start
+ dpa_resource modalias subsystem
+ dpa_size mode target_type
+ interleave_granularity region uevent
+ interleave_ways size
+
+ # cat /sys/bus/cxl/devices/decoder5.0/devtype
+ cxl_decoder_endpoint
+
+ # cat /sys/bus/cxl/devices/decoder5.0/region
+ region0
+
+An `Endpoint Decoder` has an association with a region defined by a root
+decoder and describes the device-local resource associated with this region.
+
+Unlike root and switch decoders, endpoint decoders translate `Host Physical` to
+`Device Physical` address ranges. The interleave settings on an endpoint
+therefore describe the entire *interleave set*.
+
+`Device Physical Address` regions must be committed in-order. For example, the
+DPA region starting at 0x80000000 cannot be committed before the DPA region
+starting at 0x0.
+
+As of Linux v6.15, Linux does not support *imbalanced* interleave setups, all
+endpoints in an interleave set are expected to have the same interleave
+settings (granularity and ways must be the same).
+
+Endpoint decoders are created during :code:`cxl_endpoint_port_probe` in the
+:code:`cxl_port` driver, and is created based on a PCI device's DVSEC registers.
+
+Decoder Relationships
+~~~~~~~~~~~~~~~~~~~~~
+In our example described above, there is one root decoder which routes memory
+accesses over two host bridges. Each host bridge has a decoder which routes
+access to their singular endpoint targets. Each endpoint has a decoder which
+translates HPA to DPA and services the memory request.
+
+The driver validates relationships between ports by decoder programming, so
+we can think of decoders being related in a similarly hierarchical fashion to
+ports.
+
+.. kernel-render:: DOT
+ :alt: Digraph of hierarchical relationship between root, switch, and endpoint decoders.
+ :caption: Diagraph of CXL root, switch, and endpoint decoders.
+
+ digraph foo {
+ "root0" -> "decoder0.0";
+ "decoder0.0" -> "decoder1.0";
+ "decoder0.0" -> "decoder3.0";
+ "decoder1.0" -> "decoder5.0";
+ "decoder3.0" -> "decoder6.0";
+ }
+
+Regions
+-------
+
+Memory Region
+~~~~~~~~~~~~~
+A `Memory Region` is a logical construct that connects a set of CXL ports in
+the fabric to an IO Memory Resource. It is ultimately used to expose the memory
+on these devices to the DAX subsystem via a `DAX Region`.
+
+An example RAM region: ::
+
+ # ls /sys/bus/cxl/devices/region0/
+ access0 devtype modalias subsystem uuid
+ access1 driver mode target0
+ commit interleave_granularity resource target1
+ dax_region0 interleave_ways size uevent
+
+A memory region can be constructed during endpoint probe, if decoders were
+programmed by BIOS/EFI (see `Auto Decoders`), or by creating a region manually
+via a `Root Decoder`'s :code:`create_ram_region` or :code:`create_pmem_region`
+interfaces.
+
+The interleave settings in a `Memory Region` describe the configuration of the
+`Interleave Set` - and are what can be expected to be seen in the endpoint
+interleave settings.
+
+.. kernel-render:: DOT
+ :alt: Digraph of CXL memory region relationships between root and endpoint decoders.
+ :caption: Regions are created based on root decoder configurations. Endpoint decoders
+ must be programmed with the same interleave settings as the region.
+
+ digraph foo {
+ "root0" -> "decoder0.0";
+ "decoder0.0" -> "region0";
+ "region0" -> "decoder5.0";
+ "region0" -> "decoder6.0";
+ }
+
+DAX Region
+~~~~~~~~~~
+A `DAX Region` is used to convert a CXL `Memory Region` to a DAX device. A
+DAX device may then be accessed directly via a file descriptor interface, or
+converted to System RAM via the DAX kmem driver. See the DAX driver section
+for more details. ::
+
+ # ls /sys/bus/cxl/devices/dax_region0/
+ dax0.0 devtype modalias uevent
+ dax_region driver subsystem
+
+Mailbox Interfaces
+------------------
+A mailbox command interface for each device is exposed in ::
+
+ /dev/cxl/mem0
+ /dev/cxl/mem1
+
+These mailboxes may receive any specification-defined command. Raw commands
+(custom commands) can only be sent to these interfaces if the build config
+:code:`CXL_MEM_RAW_COMMANDS` is set. This is considered a debug and/or
+development interface, not an officially supported mechanism for creation
+of vendor-specific commands (see the `fwctl` subsystem for that).
+
+Decoder Programming
+===================
+
+Runtime Programming
+-------------------
+During probe, the only decoders *required* to be programmed are `Root Decoders`.
+In reality, `Root Decoders` are a logical construct to describe the memory
+region and interleave configuration at the host bridge level - as described
+in the ACPI CEDT CFMWS.
+
+All other `Switch` and `Endpoint` decoders may be programmed by the user
+at runtime - if the platform supports such configurations.
+
+This interaction is what creates a `Software Defined Memory` environment.
+
+See the :code:`cxl-cli` documentation for more information about how to
+configure CXL decoders at runtime.
+
+Auto Decoders
+-------------
+Auto Decoders are decoders programmed by BIOS/EFI at boot time, and are
+almost always locked (cannot be changed). This is done by a platform
+which may have a static configuration - or certain quirks which may prevent
+dynamic runtime changes to the decoders (such as requiring additional
+controller programming within the CPU complex outside the scope of CXL).
+
+Auto Decoders are probed automatically as long as the devices and memory
+regions they are associated with probe without issue. When probing Auto
+Decoders, the driver's primary responsibility is to ensure the fabric is
+sane - as-if validating runtime programmed regions and decoders.
+
+If Linux cannot validate auto-decoder configuration, the memory will not
+be surfaced as a DAX device - and therefore not be exposed to the page
+allocator - effectively stranding it.
+
+Interleave
+----------
+
+The Linux CXL driver supports `Cross-Link First` interleave. This dictates
+how interleave is programmed at each decoder step, as the driver validates
+the relationships between a decoder and it's parent.
+
+For example, in a `Cross-Link First` interleave setup with 16 endpoints
+attached to 4 host bridges, linux expects the following ways/granularity
+across the root, host bridge, and endpoints respectively.
+
+.. flat-table:: 4x4 cross-link first interleave settings
+
+ * - decoder
+ - ways
+ - granularity
+
+ * - root
+ - 4
+ - 256
+
+ * - host bridge
+ - 4
+ - 1024
+
+ * - endpoint
+ - 16
+ - 256
+
+At the root, every a given access will be routed to the
+:code:`((HPA / 256) % 4)th` target host bridge. Within a host bridge, every
+:code:`((HPA / 1024) % 4)th` target endpoint. Each endpoint translates based
+on the entire 16 device interleave set.
+
+Unbalanced interleave sets are not supported - decoders at a similar point
+in the hierarchy (e.g. all host bridge decoders) must have the same ways and
+granularity configuration.
+
+At Root
+~~~~~~~
+Root decoder interleave is defined by CFMWS field of the :doc:`CEDT
+<../platform/acpi/cedt>`. The CEDT may actually define multiple CFMWS
+configurations to describe the same physical capacity, with the intent to allow
+users to decide at runtime whether to online memory as interleaved or
+non-interleaved. ::
+
+ Subtable Type : 01 [CXL Fixed Memory Window Structure]
+ Window base address : 0000000100000000
+ Window size : 0000000100000000
+ Interleave Members (2^n) : 00
+ Interleave Arithmetic : 00
+ First Target : 00000007
+
+ Subtable Type : 01 [CXL Fixed Memory Window Structure]
+ Window base address : 0000000200000000
+ Window size : 0000000100000000
+ Interleave Members (2^n) : 00
+ Interleave Arithmetic : 00
+ First Target : 00000006
+
+ Subtable Type : 01 [CXL Fixed Memory Window Structure]
+ Window base address : 0000000300000000
+ Window size : 0000000200000000
+ Interleave Members (2^n) : 01
+ Interleave Arithmetic : 00
+ First Target : 00000007
+ Next Target : 00000006
+
+In this example, the CFMWS defines two discrete non-interleaved 4GB regions
+for each host bridge, and one interleaved 8GB region that targets both. This
+would result in 3 root decoders presenting in the root. ::
+
+ # ls /sys/bus/cxl/devices/root0/decoder*
+ decoder0.0 decoder0.1 decoder0.2
+
+ # cat /sys/bus/cxl/devices/decoder0.0/target_list start size
+ 7
+ 0x100000000
+ 0x100000000
+
+ # cat /sys/bus/cxl/devices/decoder0.1/target_list start size
+ 6
+ 0x200000000
+ 0x100000000
+
+ # cat /sys/bus/cxl/devices/decoder0.2/target_list start size
+ 7,6
+ 0x300000000
+ 0x200000000
+
+These decoders are not runtime programmable. They are used to generate a
+`Memory Region` to bring this memory online with runtime programmed settings
+at the `Switch` and `Endpoint` decoders.
+
+At Host Bridge or Switch
+~~~~~~~~~~~~~~~~~~~~~~~~
+`Host Bridge` and `Switch` decoders are programmable via the following fields:
+
+- :code:`start` - the HPA region associated with the memory region
+- :code:`size` - the size of the region
+- :code:`target_list` - the list of downstream ports
+- :code:`interleave_ways` - the number downstream ports to interleave across
+- :code:`interleave_granularity` - the granularity to interleave at.
+
+Linux expects the :code:`interleave_granularity` of switch decoders to be
+derived from their upstream port connections. In `Cross-Link First` interleave
+configurations, the :code:`interleave_granularity` of a decoder is equal to
+:code:`parent_interleave_granularity * parent_interleave_ways`.
+
+At Endpoint
+~~~~~~~~~~~
+`Endpoint Decoders` are programmed similar to Host Bridge and Switch decoders,
+with the exception that the ways and granularity are defined by the interleave
+set (e.g. the interleave settings defined by the associated `Memory Region`).
+
+- :code:`start` - the HPA region associated with the memory region
+- :code:`size` - the size of the region
+- :code:`interleave_ways` - the number endpoints in the interleave set
+- :code:`interleave_granularity` - the granularity to interleave at.
+
+These settings are used by endpoint decoders to *Translate* memory requests
+from HPA to DPA. This is why they must be aware of the entire interleave set.
+
+Linux does not support unbalanced interleave configurations. As a result, all
+endpoints in an interleave set must have the same ways and granularity.
+
+Example Configurations
+======================
+.. toctree::
+ :maxdepth: 1
+
+ example-configurations/single-device.rst
+ example-configurations/hb-interleave.rst
+ example-configurations/intra-hb-interleave.rst
+ example-configurations/multi-interleave.rst
diff --git a/Documentation/driver-api/cxl/linux/dax-driver.rst b/Documentation/driver-api/cxl/linux/dax-driver.rst
new file mode 100644
index 000000000000..10d953a2167b
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/dax-driver.rst
@@ -0,0 +1,43 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+DAX Driver Operation
+====================
+The `Direct Access Device` driver was originally designed to provide a
+memory-like access mechanism to memory-like block-devices. It was
+extended to support CXL Memory Devices, which provide user-configured
+memory devices.
+
+The CXL subsystem depends on the DAX subsystem to either:
+
+- Generate a file-like interface to userland via :code:`/dev/daxN.Y`, or
+- Engage the memory-hotplug interface to add CXL memory to page allocator.
+
+The DAX subsystem exposes this ability through the `cxl_dax_region` driver.
+A `dax_region` provides the translation between a CXL `memory_region` and
+a `DAX Device`.
+
+DAX Device
+==========
+A `DAX Device` is a file-like interface exposed in :code:`/dev/daxN.Y`. A
+memory region exposed via dax device can be accessed via userland software
+via the :code:`mmap()` system-call. The result is direct mappings to the
+CXL capacity in the task's page tables.
+
+Users wishing to manually handle allocation of CXL memory should use this
+interface.
+
+kmem conversion
+===============
+The :code:`dax_kmem` driver converts a `DAX Device` into a series of `hotplug
+memory blocks` managed by :code:`kernel/memory-hotplug.c`. This capacity
+will be exposed to the kernel page allocator in the user-selected memory
+zone.
+
+The :code:`memmap_on_memory` setting (both global and DAX device local)
+dictates where the kernell will allocate the :code:`struct folio` descriptors
+for this memory will come from. If :code:`memmap_on_memory` is set, memory
+hotplug will set aside a portion of the memory block capacity to allocate
+folios. If unset, the memory is allocated via a normal :code:`GFP_KERNEL`
+allocation - and as a result will most likely land on the local NUM node of the
+CPU executing the hotplug operation.
diff --git a/Documentation/driver-api/cxl/linux/early-boot.rst b/Documentation/driver-api/cxl/linux/early-boot.rst
new file mode 100644
index 000000000000..a7fc6fc85fbe
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/early-boot.rst
@@ -0,0 +1,137 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Linux Init (Early Boot)
+=======================
+
+Linux configuration is split into two major steps: Early-Boot and everything else.
+
+During early boot, Linux sets up immutable resources (such as numa nodes), while
+later operations include things like driver probe and memory hotplug. Linux may
+read EFI and ACPI information throughout this process to configure logical
+representations of the devices.
+
+During Linux Early Boot stage (functions in the kernel that have the __init
+decorator), the system takes the resources created by EFI/BIOS
+(:doc:`ACPI tables <../platform/acpi>`) and turns them into resources that the
+kernel can consume.
+
+
+BIOS, Build and Boot Options
+============================
+
+There are 4 pre-boot options that need to be considered during kernel build
+which dictate how memory will be managed by Linux during early boot.
+
+* EFI_MEMORY_SP
+
+ * BIOS/EFI Option that dictates whether memory is SystemRAM or
+ Specific Purpose. Specific Purpose memory will be deferred to
+ drivers to manage - and not immediately exposed as system RAM.
+
+* CONFIG_EFI_SOFT_RESERVE
+
+ * Linux Build config option that dictates whether the kernel supports
+ Specific Purpose memory.
+
+* CONFIG_MHP_DEFAULT_ONLINE_TYPE
+
+ * Linux Build config that dictates whether and how Specific Purpose memory
+ converted to a dax device should be managed (left as DAX or onlined as
+ SystemRAM in ZONE_NORMAL or ZONE_MOVABLE).
+
+* nosoftreserve
+
+ * Linux kernel boot option that dictates whether Soft Reserve should be
+ supported. Similar to CONFIG_EFI_SOFT_RESERVE.
+
+Memory Map Creation
+===================
+
+While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory
+is supported and detected, it will set this region aside as
+:code:`SOFT_RESERVED`.
+
+If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or
+:code:`nosoftreserve=y` - Linux will default a CXL device memory region to
+SystemRAM. This will expose the memory to the kernel page allocator in
+:code:`ZONE_NORMAL`, making it available for use for most allocations (including
+:code:`struct page` and page tables).
+
+If `Specific Purpose` is set and supported, :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*`
+dictates whether the memory is onlined by default (:code:`_OFFLINE` or
+:code:`_ONLINE_*`), and if online which zone to online this memory to by default
+(:code:`_NORMAL` or :code:`_MOVABLE`).
+
+If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most
+kernel allocations (such as :code:`struct page` or page tables). This may
+significant impact performance depending on the memory capacity of the system.
+
+
+NUMA Node Reservation
+=====================
+
+Linux refers to the proximity domains (:code:`PXM`) defined in the :doc:`SRAT
+<../platform/acpi/srat>` to create NUMA nodes in :code:`acpi_numa_init`.
+Typically, there is a 1:1 relation between :code:`PXM` and NUMA node IDs.
+
+The SRAT is the only ACPI defined way of defining Proximity Domains. Linux
+chooses to, at most, map those 1:1 with NUMA nodes.
+:doc:`CEDT <../platform/acpi/cedt>` adds a description of SPA ranges which
+Linux may map to one or more NUMA nodes.
+
+If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`PXM`
+is created (as of v6.15). In the future, Linux may reject CFMWS not described
+by SRAT due to the ambiguity of proximity domain association.
+
+It is important to note that NUMA node creation cannot be done at runtime. All
+possible NUMA nodes are identified at :code:`__init` time, more specifically
+during :code:`mm_init`. The CEDT and SRAT must contain sufficient :code:`PXM`
+data for Linux to identify NUMA nodes their associated memory regions.
+
+The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`.
+
+See :doc:`Example Platform Configurations <../platform/example-configs>`
+for more info.
+
+Memory Tiers Creation
+=====================
+Memory tiers are a collection of NUMA nodes grouped by performance characteristics.
+During :code:`__init`, Linux initializes the system with a default memory tier that
+contains all nodes marked :code:`N_MEMORY`.
+
+:code:`memory_tier_init` is called at boot for all nodes with memory online by
+default. :code:`memory_tier_late_init` is called during late-init for nodes setup
+during driver configuration.
+
+Nodes are only marked :code:`N_MEMORY` if they have *online* memory.
+
+Tier membership can be inspected in ::
+
+ /sys/devices/virtual/memory_tiering/memory_tierN/nodelist
+ 0-1
+
+If nodes are grouped which have clear difference in performance, check the
+:doc:`HMAT <../platform/acpi/hmat>` and CDAT information for the CXL nodes. All
+nodes default to the DRAM tier, unless HMAT/CDAT information is reported to the
+memory_tier component via `access_coordinates`.
+
+For more, see :doc:`CXL access coordinates documentation
+<../linux/access-coordinates>`.
+
+Contiguous Memory Allocation
+============================
+The contiguous memory allocator (CMA) enables reservation of contiguous memory
+regions on NUMA nodes during early boot. However, CMA cannot reserve memory
+on NUMA nodes that are not online during early boot. ::
+
+ void __init hugetlb_cma_reserve(int order) {
+ if (!node_online(nid))
+ /* do not allow reservations */
+ }
+
+This means if users intend to defer management of CXL memory to the driver, CMA
+cannot be used to guarantee huge page allocations. If enabling CXL memory as
+SystemRAM in `ZONE_NORMAL` during early boot, CMA reservations per-node can be
+made with the :code:`cma_pernuma` or :code:`numa_cma` kernel command line
+parameters.
diff --git a/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst
new file mode 100644
index 000000000000..f071490763a2
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst
@@ -0,0 +1,314 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Inter-Host-Bridge Interleave
+============================
+This cxl-cli configuration dump shows the following host configuration:
+
+* A single socket system with one CXL root
+* CXL Root has Four (4) CXL Host Bridges
+* Two CXL Host Bridges have a single CXL Memory Expander Attached
+* The CXL root is configured to interleave across the two host bridges.
+
+This output is generated by :code:`cxl list -v` and describes the relationships
+between objects exposed in :code:`/sys/bus/cxl/devices/`.
+
+::
+
+ [
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "nr_dports":4,
+ "dports":[
+ {
+ "dport":"pci0000:00",
+ "alias":"ACPI0016:01",
+ "id":0
+ },
+ {
+ "dport":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "id":4
+ },
+ {
+ "dport":"pci0000:2a",
+ "alias":"ACPI0016:03",
+ "id":1
+ },
+ {
+ "dport":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "id":5
+ }
+ ],
+
+This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL
+Host Bridges. The `Root` can be considered the singular upstream port attached
+to the platform's memory controller - which routes memory requests to it.
+
+The `ports:root0` section lays out how each of these downstream ports are
+configured. If a port is not configured (id's 0 and 1), they are omitted.
+
+::
+
+ "ports:root0":[
+ {
+ "port":"port1",
+ "host":"pci0000:d2",
+ "depth":1,
+ "nr_dports":3,
+ "dports":[
+ {
+ "dport":"0000:d2:01.1",
+ "alias":"device:02",
+ "id":0
+ },
+ {
+ "dport":"0000:d2:01.3",
+ "alias":"device:05",
+ "id":2
+ },
+ {
+ "dport":"0000:d2:07.1",
+ "alias":"device:0d",
+ "id":113
+ }
+ ],
+
+This chunk shows the available downstream ports associated with the CXL Host
+Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream
+ports: :code:`dport1`, :code:`dport2`, and :code:`dport113`..
+
+::
+
+ "endpoints:port1":[
+ {
+ "endpoint":"endpoint5",
+ "host":"mem0",
+ "parent_dport":"0000:d2:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem0",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:d3:00.0"
+ },
+ "decoders:endpoint5":[
+ {
+ "decoder":"decoder5.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+
+This chunk shows the endpoints attached to the host bridge :code:`port1`.
+
+:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
+which has the same interleave configuration as :code:`region0` (shown later).
+
+Next we have the decodesr belonging to the host bridge:
+
+::
+
+ "decoders:port1":[
+ {
+ "decoder":"decoder1.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":1,
+ "region":"region0",
+ "nr_targets":1,
+ "targets":[
+ {
+ "target":"0000:d2:01.1",
+ "alias":"device:02",
+ "position":0,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only
+target is :code:`dport1` - which is attached to :code:`endpoint5`.
+
+The following chunk shows a similar configuration for Host Bridge :code:`port3`,
+the second host bridge with a memory device attached.
+
+::
+
+ {
+ "port":"port3",
+ "host":"pci0000:a8",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:a8:01.1",
+ "alias":"device:c3",
+ "id":0
+ }
+ ],
+ "endpoints:port3":[
+ {
+ "endpoint":"endpoint6",
+ "host":"mem1",
+ "parent_dport":"0000:a8:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem1",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:a9:00.0"
+ },
+ "decoders:endpoint6":[
+ {
+ "decoder":"decoder6.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+ "decoders:port3":[
+ {
+ "decoder":"decoder3.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":1,
+ "region":"region0",
+ "nr_targets":1,
+ "targets":[
+ {
+ "target":"0000:a8:01.1",
+ "alias":"device:c3",
+ "position":0,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+
+The next chunk shows the two CXL host bridges without attached endpoints.
+
+::
+
+ {
+ "port":"port2",
+ "host":"pci0000:00",
+ "depth":1,
+ "nr_dports":2,
+ "dports":[
+ {
+ "dport":"0000:00:01.3",
+ "alias":"device:55",
+ "id":2
+ },
+ {
+ "dport":"0000:00:07.1",
+ "alias":"device:5d",
+ "id":113
+ }
+ ]
+ },
+ {
+ "port":"port4",
+ "host":"pci0000:2a",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:2a:01.1",
+ "alias":"device:d0",
+ "id":0
+ }
+ ]
+ }
+ ],
+
+Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
+applies the interleave across the downstream ports :code:`port1` and
+:code:`port3` - with a granularity of 256 bytes.
+
+This information is generated by the CXL driver reading the ACPI CEDT CMFWS.
+
+::
+
+ "decoders:root0":[
+ {
+ "decoder":"decoder0.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "max_available_extent":0,
+ "volatile_capable":true,
+ "nr_targets":2,
+ "targets":[
+ {
+ "target":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "position":1,
+ "id":4
+ },
+ {
+ "target":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "position":0,
+ "id":5
+ }
+ ],
+
+Finally we have the `Memory Region` associated with the `Root Decoder`
+:code:`decoder0.0`. This region describes the overall interleave configuration
+of the interleave set.
+
+::
+
+ "regions:decoder0.0":[
+ {
+ "region":"region0",
+ "resource":825975898112,
+ "size":274877906944,
+ "type":"ram",
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "decode_state":"commit",
+ "mappings":[
+ {
+ "position":1,
+ "memdev":"mem1",
+ "decoder":"decoder6.0"
+ },
+ {
+ "position":0,
+ "memdev":"mem0",
+ "decoder":"decoder5.0"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
diff --git a/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst
new file mode 100644
index 000000000000..077dfaf8458d
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst
@@ -0,0 +1,291 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Intra-Host-Bridge Interleave
+============================
+This cxl-cli configuration dump shows the following host configuration:
+
+* A single socket system with one CXL root
+* CXL Root has Four (4) CXL Host Bridges
+* One (1) CXL Host Bridges has two CXL Memory Expanders Attached
+* The Host bridge decoder is programmed to interleave across the expanders.
+
+This output is generated by :code:`cxl list -v` and describes the relationships
+between objects exposed in :code:`/sys/bus/cxl/devices/`.
+
+::
+
+ [
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "nr_dports":4,
+ "dports":[
+ {
+ "dport":"pci0000:00",
+ "alias":"ACPI0016:01",
+ "id":0
+ },
+ {
+ "dport":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "id":4
+ },
+ {
+ "dport":"pci0000:2a",
+ "alias":"ACPI0016:03",
+ "id":1
+ },
+ {
+ "dport":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "id":5
+ }
+ ],
+
+This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL
+Host Bridges. The `Root` can be considered the singular upstream port attached
+to the platform's memory controller - which routes memory requests to it.
+
+The `ports:root0` section lays out how each of these downstream ports are
+configured. If a port is not configured (id's 0 and 1), they are omitted.
+
+::
+
+ "ports:root0":[
+ {
+ "port":"port1",
+ "host":"pci0000:d2",
+ "depth":1,
+ "nr_dports":3,
+ "dports":[
+ {
+ "dport":"0000:d2:01.1",
+ "alias":"device:02",
+ "id":0
+ },
+ {
+ "dport":"0000:d2:01.3",
+ "alias":"device:05",
+ "id":2
+ },
+ {
+ "dport":"0000:d2:07.1",
+ "alias":"device:0d",
+ "id":113
+ }
+ ],
+
+This chunk shows the available downstream ports associated with the CXL Host
+Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream
+ports: :code:`dport1`, :code:`dport2`, and :code:`dport113`..
+
+::
+
+ "endpoints:port1":[
+ {
+ "endpoint":"endpoint5",
+ "host":"mem0",
+ "parent_dport":"0000:d2:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem0",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:d3:00.0"
+ },
+ "decoders:endpoint5":[
+ {
+ "decoder":"decoder5.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ },
+ {
+ "endpoint":"endpoint6",
+ "host":"mem1",
+ "parent_dport":"0000:d2:01.3,
+ "depth":2,
+ "memdev":{
+ "memdev":"mem1",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:a9:00.0"
+ },
+ "decoders:endpoint6":[
+ {
+ "decoder":"decoder6.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+
+This chunk shows the endpoints attached to the host bridge :code:`port1`.
+
+:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
+which has the same interleave configuration memory region they belong to
+(show later).
+
+Next we have the decoders belonging to the host bridge:
+
+::
+
+ "decoders:port1":[
+ {
+ "decoder":"decoder1.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "region":"region0",
+ "nr_targets":2,
+ "targets":[
+ {
+ "target":"0000:d2:01.1",
+ "alias":"device:02",
+ "position":0,
+ "id":0
+ },
+ {
+ "target":"0000:d2:01.3",
+ "alias":"device:05",
+ "position":1,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`) with two
+targets: :code:`dport1` and :code:`dport3` - which are attached to
+:code:`endpoint5` and :code:`endpoint6` respectively.
+
+The host bridge decoder interleaves these devices at a 256 byte granularity.
+
+The next chunk shows the three CXL host bridges without attached endpoints.
+
+::
+
+ {
+ "port":"port2",
+ "host":"pci0000:00",
+ "depth":1,
+ "nr_dports":2,
+ "dports":[
+ {
+ "dport":"0000:00:01.3",
+ "alias":"device:55",
+ "id":2
+ },
+ {
+ "dport":"0000:00:07.1",
+ "alias":"device:5d",
+ "id":113
+ }
+ ]
+ },
+ {
+ "port":"port3",
+ "host":"pci0000:a8",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:a8:01.1",
+ "alias":"device:c3",
+ "id":0
+ }
+ ],
+ },
+ {
+ "port":"port4",
+ "host":"pci0000:2a",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:2a:01.1",
+ "alias":"device:d0",
+ "id":0
+ }
+ ]
+ }
+ ],
+
+Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
+applies the interleave across the downstream ports :code:`port1` and
+:code:`port3` - with a granularity of 256 bytes.
+
+This information is generated by the CXL driver reading the ACPI CEDT CMFWS.
+
+::
+
+ "decoders:root0":[
+ {
+ "decoder":"decoder0.0",
+ "resource":825975898112,
+ "size":274877906944,
+ "interleave_ways":1,
+ "max_available_extent":0,
+ "volatile_capable":true,
+ "nr_targets":2,
+ "targets":[
+ {
+ "target":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "position":1,
+ "id":4
+ },
+ ],
+
+Finally we have the `Memory Region` associated with the `Root Decoder`
+:code:`decoder0.0`. This region describes the overall interleave configuration
+of the interleave set.
+
+::
+
+ "regions:decoder0.0":[
+ {
+ "region":"region0",
+ "resource":825975898112,
+ "size":274877906944,
+ "type":"ram",
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "decode_state":"commit",
+ "mappings":[
+ {
+ "position":1,
+ "memdev":"mem1",
+ "decoder":"decoder6.0"
+ },
+ {
+ "position":0,
+ "memdev":"mem0",
+ "decoder":"decoder5.0"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
diff --git a/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst b/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst
new file mode 100644
index 000000000000..008f9053c630
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst
@@ -0,0 +1,401 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Multi-Level Interleave
+======================
+This cxl-cli configuration dump shows the following host configuration:
+
+* A single socket system with one CXL root
+* CXL Root has Four (4) CXL Host Bridges
+* Two CXL Host Bridges have a two CXL Memory Expanders Attached each.
+* The CXL root is configured to interleave across the two host bridges.
+* Each host bridge with expanders interleaves across two endpoints.
+
+This output is generated by :code:`cxl list -v` and describes the relationships
+between objects exposed in :code:`/sys/bus/cxl/devices/`.
+
+::
+
+ [
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "nr_dports":4,
+ "dports":[
+ {
+ "dport":"pci0000:00",
+ "alias":"ACPI0016:01",
+ "id":0
+ },
+ {
+ "dport":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "id":4
+ },
+ {
+ "dport":"pci0000:2a",
+ "alias":"ACPI0016:03",
+ "id":1
+ },
+ {
+ "dport":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "id":5
+ }
+ ],
+
+This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL
+Host Bridges. The `Root` can be considered the singular upstream port attached
+to the platform's memory controller - which routes memory requests to it.
+
+The `ports:root0` section lays out how each of these downstream ports are
+configured. If a port is not configured (id's 0 and 1), they are omitted.
+
+::
+
+ "ports:root0":[
+ {
+ "port":"port1",
+ "host":"pci0000:d2",
+ "depth":1,
+ "nr_dports":3,
+ "dports":[
+ {
+ "dport":"0000:d2:01.1",
+ "alias":"device:02",
+ "id":0
+ },
+ {
+ "dport":"0000:d2:01.3",
+ "alias":"device:05",
+ "id":2
+ },
+ {
+ "dport":"0000:d2:07.1",
+ "alias":"device:0d",
+ "id":113
+ }
+ ],
+
+This chunk shows the available downstream ports associated with the CXL Host
+Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream
+ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.
+
+::
+
+ "endpoints:port1":[
+ {
+ "endpoint":"endpoint5",
+ "host":"mem0",
+ "parent_dport":"0000:d2:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem0",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:d3:00.0"
+ },
+ "decoders:endpoint5":[
+ {
+ "decoder":"decoder5.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":4,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ },
+ {
+ "endpoint":"endpoint6",
+ "host":"mem1",
+ "parent_dport":"0000:d2:01.3",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem1",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:d3:00.0"
+ },
+ "decoders:endpoint6":[
+ {
+ "decoder":"decoder6.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":4,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+
+This chunk shows the endpoints attached to the host bridge :code:`port1`.
+
+:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
+which has the same interleave configuration as :code:`region0` (shown later).
+
+:code:`endpoint6` contains a single configured decoder :code:`decoder5.0`
+which has the same interleave configuration as :code:`region0` (shown later).
+
+Next we have the decoders belonging to the host bridge:
+
+::
+
+ "decoders:port1":[
+ {
+ "decoder":"decoder1.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":2,
+ "interleave_granularity":512,
+ "region":"region0",
+ "nr_targets":2,
+ "targets":[
+ {
+ "target":"0000:d2:01.1",
+ "alias":"device:02",
+ "position":0,
+ "id":0
+ },
+ {
+ "target":"0000:d2:01.3",
+ "alias":"device:05",
+ "position":2,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose
+targets are :code:`dport0` and :code:`dport2` - which are attached to
+:code:`endpoint5` and :code:`endpoint6` respectively.
+
+The following chunk shows a similar configuration for Host Bridge :code:`port3`,
+the second host bridge with a memory device attached.
+
+::
+
+ {
+ "port":"port3",
+ "host":"pci0000:a8",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:a8:01.1",
+ "alias":"device:c3",
+ "id":0
+ },
+ {
+ "dport":"0000:a8:01.3",
+ "alias":"device:c5",
+ "id":0
+ }
+ ],
+ "endpoints:port3":[
+ {
+ "endpoint":"endpoint7",
+ "host":"mem2",
+ "parent_dport":"0000:a8:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem2",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:a9:00.0"
+ },
+ "decoders:endpoint7":[
+ {
+ "decoder":"decoder7.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":4,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ },
+ {
+ "endpoint":"endpoint8",
+ "host":"mem3",
+ "parent_dport":"0000:a8:01.3",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem3",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:a9:00.0"
+ },
+ "decoders:endpoint8":[
+ {
+ "decoder":"decoder8.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":4,
+ "interleave_granularity":256,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+ "decoders:port3":[
+ {
+ "decoder":"decoder3.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":2,
+ "interleave_granularity":512,
+ "region":"region0",
+ "nr_targets":1,
+ "targets":[
+ {
+ "target":"0000:a8:01.1",
+ "alias":"device:c3",
+ "position":1,
+ "id":0
+ },
+ {
+ "target":"0000:a8:01.3",
+ "alias":"device:c5",
+ "position":3,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+
+The next chunk shows the two CXL host bridges without attached endpoints.
+
+::
+
+ {
+ "port":"port2",
+ "host":"pci0000:00",
+ "depth":1,
+ "nr_dports":2,
+ "dports":[
+ {
+ "dport":"0000:00:01.3",
+ "alias":"device:55",
+ "id":2
+ },
+ {
+ "dport":"0000:00:07.1",
+ "alias":"device:5d",
+ "id":113
+ }
+ ]
+ },
+ {
+ "port":"port4",
+ "host":"pci0000:2a",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:2a:01.1",
+ "alias":"device:d0",
+ "id":0
+ }
+ ]
+ }
+ ],
+
+Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
+applies the interleave across the downstream ports :code:`port1` and
+:code:`port3` - with a granularity of 256 bytes.
+
+This information is generated by the CXL driver reading the ACPI CEDT CMFWS.
+
+::
+
+ "decoders:root0":[
+ {
+ "decoder":"decoder0.0",
+ "resource":825975898112,
+ "size":549755813888,
+ "interleave_ways":2,
+ "interleave_granularity":256,
+ "max_available_extent":0,
+ "volatile_capable":true,
+ "nr_targets":2,
+ "targets":[
+ {
+ "target":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "position":1,
+ "id":4
+ },
+ {
+ "target":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "position":0,
+ "id":5
+ }
+ ],
+
+Finally we have the `Memory Region` associated with the `Root Decoder`
+:code:`decoder0.0`. This region describes the overall interleave configuration
+of the interleave set. So we see there are a total of :code:`4` interleave
+targets across 4 endpoint decoders.
+
+::
+
+ "regions:decoder0.0":[
+ {
+ "region":"region0",
+ "resource":825975898112,
+ "size":549755813888,
+ "type":"ram",
+ "interleave_ways":4,
+ "interleave_granularity":256,
+ "decode_state":"commit",
+ "mappings":[
+ {
+ "position":3,
+ "memdev":"mem3",
+ "decoder":"decoder8.0"
+ },
+ {
+ "position":2,
+ "memdev":"mem1",
+ "decoder":"decoder6.0"
+ }
+ {
+ "position":1,
+ "memdev":"mem2",
+ "decoder":"decoder7.0"
+ },
+ {
+ "position":0,
+ "memdev":"mem0",
+ "decoder":"decoder5.0"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
diff --git a/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst b/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst
new file mode 100644
index 000000000000..5fd38eb0aaf4
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/example-configurations/single-device.rst
@@ -0,0 +1,246 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Single Device
+=============
+This cxl-cli configuration dump shows the following host configuration:
+
+* A single socket system with one CXL root
+* CXL Root has Four (4) CXL Host Bridges
+* One CXL Host Bridges has a single CXL Memory Expander Attached
+* No interleave is present.
+
+This output is generated by :code:`cxl list -v` and describes the relationships
+between objects exposed in :code:`/sys/bus/cxl/devices/`.
+
+::
+
+ [
+ {
+ "bus":"root0",
+ "provider":"ACPI.CXL",
+ "nr_dports":4,
+ "dports":[
+ {
+ "dport":"pci0000:00",
+ "alias":"ACPI0016:01",
+ "id":0
+ },
+ {
+ "dport":"pci0000:a8",
+ "alias":"ACPI0016:02",
+ "id":4
+ },
+ {
+ "dport":"pci0000:2a",
+ "alias":"ACPI0016:03",
+ "id":1
+ },
+ {
+ "dport":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "id":5
+ }
+ ],
+
+This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL
+Host Bridges. The `Root` can be considered the singular upstream port attached
+to the platform's memory controller - which routes memory requests to it.
+
+The `ports:root0` section lays out how each of these downstream ports are
+configured. If a port is not configured (id's 0, 1, and 4), they are omitted.
+
+::
+
+ "ports:root0":[
+ {
+ "port":"port1",
+ "host":"pci0000:d2",
+ "depth":1,
+ "nr_dports":3,
+ "dports":[
+ {
+ "dport":"0000:d2:01.1",
+ "alias":"device:02",
+ "id":0
+ },
+ {
+ "dport":"0000:d2:01.3",
+ "alias":"device:05",
+ "id":2
+ },
+ {
+ "dport":"0000:d2:07.1",
+ "alias":"device:0d",
+ "id":113
+ }
+ ],
+
+This chunk shows the available downstream ports associated with the CXL Host
+Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream
+ports: :code:`dport1`, :code:`dport2`, and :code:`dport113`..
+
+::
+
+ "endpoints:port1":[
+ {
+ "endpoint":"endpoint5",
+ "host":"mem0",
+ "parent_dport":"0000:d2:01.1",
+ "depth":2,
+ "memdev":{
+ "memdev":"mem0",
+ "ram_size":137438953472,
+ "serial":0,
+ "numa_node":0,
+ "host":"0000:d3:00.0"
+ },
+ "decoders:endpoint5":[
+ {
+ "decoder":"decoder5.0",
+ "resource":825975898112,
+ "size":137438953472,
+ "interleave_ways":1,
+ "region":"region0",
+ "dpa_resource":0,
+ "dpa_size":137438953472,
+ "mode":"ram"
+ }
+ ]
+ }
+ ],
+
+This chunk shows the endpoints attached to the host bridge :code:`port1`.
+
+:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
+which has the same interleave configuration as :code:`region0` (shown later).
+
+Next we have the decoders belonging to the host bridge:
+
+::
+
+ "decoders:port1":[
+ {
+ "decoder":"decoder1.0",
+ "resource":825975898112,
+ "size":137438953472,
+ "interleave_ways":1,
+ "region":"region0",
+ "nr_targets":1,
+ "targets":[
+ {
+ "target":"0000:d2:01.1",
+ "alias":"device:02",
+ "position":0,
+ "id":0
+ }
+ ]
+ }
+ ]
+ },
+
+Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only
+target is :code:`dport1` - which is attached to :code:`endpoint5`.
+
+The next chunk shows the three CXL host bridges without attached endpoints.
+
+::
+
+ {
+ "port":"port2",
+ "host":"pci0000:00",
+ "depth":1,
+ "nr_dports":2,
+ "dports":[
+ {
+ "dport":"0000:00:01.3",
+ "alias":"device:55",
+ "id":2
+ },
+ {
+ "dport":"0000:00:07.1",
+ "alias":"device:5d",
+ "id":113
+ }
+ ]
+ },
+ {
+ "port":"port3",
+ "host":"pci0000:a8",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:a8:01.1",
+ "alias":"device:c3",
+ "id":0
+ }
+ ]
+ },
+ {
+ "port":"port4",
+ "host":"pci0000:2a",
+ "depth":1,
+ "nr_dports":1,
+ "dports":[
+ {
+ "dport":"0000:2a:01.1",
+ "alias":"device:d0",
+ "id":0
+ }
+ ]
+ }
+ ],
+
+Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
+is a pass-through decoder because :code:`interleave_ways` is set to :code:`1`.
+
+This information is generated by the CXL driver reading the ACPI CEDT CMFWS.
+
+::
+
+ "decoders:root0":[
+ {
+ "decoder":"decoder0.0",
+ "resource":825975898112,
+ "size":137438953472,
+ "interleave_ways":1,
+ "max_available_extent":0,
+ "volatile_capable":true,
+ "nr_targets":1,
+ "targets":[
+ {
+ "target":"pci0000:d2",
+ "alias":"ACPI0016:00",
+ "position":0,
+ "id":5
+ }
+ ],
+
+Finally we have the `Memory Region` associated with the `Root Decoder`
+:code:`decoder0.0`. This region describes the discrete region associated
+with the lone device.
+
+::
+
+ "regions:decoder0.0":[
+ {
+ "region":"region0",
+ "resource":825975898112,
+ "size":137438953472,
+ "type":"ram",
+ "interleave_ways":1,
+ "decode_state":"commit",
+ "mappings":[
+ {
+ "position":0,
+ "memdev":"mem0",
+ "decoder":"decoder5.0"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+ ]
diff --git a/Documentation/driver-api/cxl/linux/memory-hotplug.rst b/Documentation/driver-api/cxl/linux/memory-hotplug.rst
new file mode 100644
index 000000000000..af368c2bc9cf
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/memory-hotplug.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Memory Hotplug
+==============
+The final phase of surfacing CXL memory to the kernel page allocator is for
+the `DAX` driver to surface a `Driver Managed` memory region via the
+memory-hotplug component.
+
+There are four major configurations to consider:
+
+1) Default Online Behavior (on/off and zone)
+2) Hotplug Memory Block size
+3) Memory Map Resource location
+4) Driver-Managed Memory Designation
+
+Default Online Behavior
+=======================
+The default-online behavior of hotplug memory is dictated by the following,
+in order of precedence:
+
+- :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE` Build Configuration
+- :code:`memhp_default_state` Boot parameter
+- :code:`/sys/devices/system/memory/auto_online_blocks` value
+
+These dictate whether hotplugged memory blocks arrive in one of three states:
+
+1) Offline
+2) Online in :code:`ZONE_NORMAL`
+3) Online in :code:`ZONE_MOVABLE`
+
+:code:`ZONE_NORMAL` implies this capacity may be used for almost any allocation,
+while :code:`ZONE_MOVABLE` implies this capacity should only be used for
+migratable allocations.
+
+:code:`ZONE_MOVABLE` attempts to retain the hotplug-ability of a memory block
+so that it the entire region may be hot-unplugged at a later time. Any capacity
+onlined into :code:`ZONE_NORMAL` should be considered permanently attached to
+the page allocator.
+
+Hotplug Memory Block Size
+=========================
+By default, on most architectures, the Hotplug Memory Block Size is either
+128MB or 256MB. On x86, the block size increases up to 2GB as total memory
+capacity exceeds 64GB. As of v6.15, Linux does not take into account the
+size and alignment of the ACPI CEDT CFMWS regions (see Early Boot docs) when
+deciding the Hotplug Memory Block Size.
+
+Memory Map
+==========
+The location of :code:`struct folio` allocations to represent the hotplugged
+memory capacity are dictated by the following system settings:
+
+- :code:`/sys_module/memory_hotplug/parameters/memmap_on_memory`
+- :code:`/sys/bus/dax/devices/daxN.Y/memmap_on_memory`
+
+If both of these parameters are set to true, :code:`struct folio` for this
+capacity will be carved out of the memory block being onlined. This has
+performance implications if the memory is particularly high-latency and
+its :code:`struct folio` becomes hotly contended.
+
+If either parameter is set to false, :code:`struct folio` for this capacity
+will be allocated from the local node of the processor running the hotplug
+procedure. This capacity will be allocated from :code:`ZONE_NORMAL` on
+that node, as it is a :code:`GFP_KERNEL` allocation.
+
+Systems with extremely large amounts of :code:`ZONE_MOVABLE` memory (e.g.
+CXL memory pools) must ensure that there is sufficient local
+:code:`ZONE_NORMAL` capacity to host the memory map for the hotplugged capacity.
+
+Driver Managed Memory
+=====================
+The DAX driver surfaces this memory to memory-hotplug as "Driver Managed". This
+is not a configurable setting, but it's important to note that driver managed
+memory is explicitly excluded from use during kexec. This is required to ensure
+any reset or out-of-band operations that the CXL device may be subject to during
+a functional system-reboot (such as a reset-on-probe) will not cause portions of
+the kexec kernel to be overwritten.
diff --git a/Documentation/driver-api/cxl/linux/overview.rst b/Documentation/driver-api/cxl/linux/overview.rst
new file mode 100644
index 000000000000..648beb2c8c83
--- /dev/null
+++ b/Documentation/driver-api/cxl/linux/overview.rst
@@ -0,0 +1,103 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========
+Overview
+========
+
+This section presents the configuration process of a CXL Type-3 memory device,
+and how it is ultimately exposed to users as either a :code:`DAX` device or
+normal memory pages via the kernel's page allocator.
+
+Portions marked with a bullet are points at which certain kernel objects
+are generated.
+
+1) Early Boot
+
+ a) BIOS, Build, and Boot Parameters
+
+ i) EFI_MEMORY_SP
+ ii) CONFIG_EFI_SOFT_RESERVE
+ iii) CONFIG_MHP_DEFAULT_ONLINE_TYPE
+ iv) nosoftreserve
+
+ b) Memory Map Creation
+
+ i) EFI Memory Map / E820 Consulted for Soft-Reserved
+
+ * CXL Memory is set aside to be handled by the CXL driver
+
+ * Soft-Reserved IO Resource created for CFMWS entry
+
+ c) NUMA Node Creation
+
+ * Nodes created from ACPI CEDT CFMWS and SRAT Proximity domains (PXM)
+
+ d) Memory Tier Creation
+
+ * A default memory_tier is created with all nodes.
+
+ e) Contiguous Memory Allocation
+
+ * Any requested CMA is allocated from Online nodes
+
+ f) Init Finishes, Drivers start probing
+
+2) ACPI and PCI Drivers
+
+ a) Detects PCI device is CXL, marking it for probe by CXL driver
+
+3) CXL Driver Operation
+
+ a) Base device creation
+
+ * root, port, and memdev devices created
+ * CEDT CFMWS IO Resource creation
+
+ b) Decoder creation
+
+ * root, switch, and endpoint decoders created
+
+ c) Logical device creation
+
+ * memory_region and endpoint devices created
+
+ d) Devices are associated with each other
+
+ * If auto-decoder (BIOS-programmed decoders), driver validates
+ configurations, builds associations, and locks configs at probe time.
+
+ * If user-configured, validation and associations are built at
+ decoder-commit time.
+
+ e) Regions surfaced as DAX region
+
+ * dax_region created
+
+ * DAX device created via DAX driver
+
+4) DAX Driver Operation
+
+ a) DAX driver surfaces DAX region as one of two dax device modes
+
+ * kmem - dax device is converted to hotplug memory blocks
+
+ * DAX kmem IO Resource creation
+
+ * hmem - dax device is left as daxdev to be accessed as a file.
+
+ * If hmem, journey ends here.
+
+ b) DAX kmem surfaces memory region to Memory Hotplug to add to page
+ allocator as "driver managed memory"
+
+5) Memory Hotplug
+
+ a) mhp component surfaces a dax device memory region as multiple memory
+ blocks to the page allocator
+
+ * blocks appear in :code:`/sys/bus/memory/devices` and linked to a NUMA node
+
+ b) blocks are onlined into the requested zone (NORMAL or MOVABLE)
+
+ * Memory is marked "Driver Managed" to avoid kexec from using it as region
+ for kernel updates