summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/LSM/ipe.rst69
-rw-r--r--Documentation/admin-guide/blockdev/index.rst1
-rw-r--r--Documentation/admin-guide/blockdev/zoned_loop.rst169
-rw-r--r--Documentation/admin-guide/bug-hunting.rst2
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst81
-rw-r--r--Documentation/admin-guide/gpio/gpio-aggregator.rst107
-rw-r--r--Documentation/admin-guide/hw-vuln/index.rst2
-rw-r--r--Documentation/admin-guide/hw-vuln/indirect-target-selection.rst168
-rw-r--r--Documentation/admin-guide/hw-vuln/old_microcode.rst21
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt57
-rw-r--r--Documentation/admin-guide/laptops/alienware-wmi.rst127
-rw-r--r--Documentation/admin-guide/laptops/index.rst1
-rw-r--r--Documentation/admin-guide/media/c3-isp.dot26
-rw-r--r--Documentation/admin-guide/media/c3-isp.rst101
-rw-r--r--Documentation/admin-guide/media/mgb4.rst9
-rw-r--r--Documentation/admin-guide/media/pci-cardlist.rst1
-rw-r--r--Documentation/admin-guide/media/v4l-drivers.rst1
-rw-r--r--Documentation/admin-guide/namespaces/resource-control.rst24
-rw-r--r--Documentation/admin-guide/pm/cpufreq.rst8
-rw-r--r--Documentation/admin-guide/pm/intel_idle.rst21
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst104
-rw-r--r--Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst10
-rw-r--r--Documentation/admin-guide/quickly-build-trimmed-linux.rst4
-rw-r--r--Documentation/admin-guide/reporting-issues.rst6
-rw-r--r--Documentation/admin-guide/sysctl/vm.rst32
-rw-r--r--Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst4
-rw-r--r--Documentation/admin-guide/xfs.rst11
27 files changed, 1086 insertions, 81 deletions
diff --git a/Documentation/admin-guide/LSM/ipe.rst b/Documentation/admin-guide/LSM/ipe.rst
index f93a467db628..dc7088451f9d 100644
--- a/Documentation/admin-guide/LSM/ipe.rst
+++ b/Documentation/admin-guide/LSM/ipe.rst
@@ -423,7 +423,7 @@ Field descriptions:
Event Example::
- type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1
+ type=1422 audit(1653425529.927:53): policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F26765076DD8EED7B8F4DB auid=4294967295 ses=4294967295 lsm=ipe res=1 errno=0
type=1300 audit(1653425529.927:53): arch=c000003e syscall=1 success=yes exit=2567 a0=3 a1=5596fcae1fb0 a2=a07 a3=2 items=0 ppid=184 pid=229 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="python3" exe="/usr/bin/python3.10" key=(null)
type=1327 audit(1653425529.927:53): PROCTITLE proctitle=707974686F6E3300746573742F6D61696E2E7079002D66002E2E
@@ -433,24 +433,55 @@ This record will always be emitted in conjunction with a ``AUDITSYSCALL`` record
Field descriptions:
-+----------------+------------+-----------+---------------------------------------------------+
-| Field | Value Type | Optional? | Description of Value |
-+================+============+===========+===================================================+
-| policy_name | string | No | The policy_name |
-+----------------+------------+-----------+---------------------------------------------------+
-| policy_version | string | No | The policy_version |
-+----------------+------------+-----------+---------------------------------------------------+
-| policy_digest | string | No | The policy hash |
-+----------------+------------+-----------+---------------------------------------------------+
-| auid | integer | No | The login user ID |
-+----------------+------------+-----------+---------------------------------------------------+
-| ses | integer | No | The login session ID |
-+----------------+------------+-----------+---------------------------------------------------+
-| lsm | string | No | The lsm name associated with the event |
-+----------------+------------+-----------+---------------------------------------------------+
-| res | integer | No | The result of the audited operation(success/fail) |
-+----------------+------------+-----------+---------------------------------------------------+
-
++----------------+------------+-----------+-------------------------------------------------------------+
+| Field | Value Type | Optional? | Description of Value |
++================+============+===========+=============================================================+
+| policy_name | string | Yes | The policy_name |
++----------------+------------+-----------+-------------------------------------------------------------+
+| policy_version | string | Yes | The policy_version |
++----------------+------------+-----------+-------------------------------------------------------------+
+| policy_digest | string | Yes | The policy hash |
++----------------+------------+-----------+-------------------------------------------------------------+
+| auid | integer | No | The login user ID |
++----------------+------------+-----------+-------------------------------------------------------------+
+| ses | integer | No | The login session ID |
++----------------+------------+-----------+-------------------------------------------------------------+
+| lsm | string | No | The lsm name associated with the event |
++----------------+------------+-----------+-------------------------------------------------------------+
+| res | integer | No | The result of the audited operation(success/fail) |
++----------------+------------+-----------+-------------------------------------------------------------+
+| errno | integer | No | Error code from policy loading operations (see table below) |
++----------------+------------+-----------+-------------------------------------------------------------+
+
+Policy error codes (errno):
+
+The following table lists the error codes that may appear in the errno field while loading or updating the policy:
+
++----------------+--------------------------------------------------------+
+| Error Code | Description |
++================+========================================================+
+| 0 | Success |
++----------------+--------------------------------------------------------+
+| -EPERM | Insufficient permission |
++----------------+--------------------------------------------------------+
+| -EEXIST | Same name policy already deployed |
++----------------+--------------------------------------------------------+
+| -EBADMSG | Policy is invalid |
++----------------+--------------------------------------------------------+
+| -ENOMEM | Out of memory (OOM) |
++----------------+--------------------------------------------------------+
+| -ERANGE | Policy version number overflow |
++----------------+--------------------------------------------------------+
+| -EINVAL | Policy version parsing error |
++----------------+--------------------------------------------------------+
+| -ENOKEY | Key used to sign the IPE policy not found in keyring |
++----------------+--------------------------------------------------------+
+| -EKEYREJECTED | Policy signature verification failed |
++----------------+--------------------------------------------------------+
+| -ESTALE | Attempting to update an IPE policy with older version |
++----------------+--------------------------------------------------------+
+| -ENOENT | Policy was deleted while updating |
++----------------+--------------------------------------------------------+
1404 AUDIT_MAC_STATUS
^^^^^^^^^^^^^^^^^^^^^
diff --git a/Documentation/admin-guide/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
index 957ccf617797..3262397ebe8f 100644
--- a/Documentation/admin-guide/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -11,6 +11,7 @@ Block Devices
nbd
paride
ramdisk
+ zoned_loop
zram
drbd/index
diff --git a/Documentation/admin-guide/blockdev/zoned_loop.rst b/Documentation/admin-guide/blockdev/zoned_loop.rst
new file mode 100644
index 000000000000..9c7aa3b482f3
--- /dev/null
+++ b/Documentation/admin-guide/blockdev/zoned_loop.rst
@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Zoned Loop Block Device
+=======================
+
+.. Contents:
+
+ 1) Overview
+ 2) Creating a Zoned Device
+ 3) Deleting a Zoned Device
+ 4) Example
+
+
+1) Overview
+-----------
+
+The zoned loop block device driver (zloop) allows a user to create a zoned block
+device using one regular file per zone as backing storage. This driver does not
+directly control any hardware and uses read, write and truncate operations to
+regular files of a file system to emulate a zoned block device.
+
+Using zloop, zoned block devices with a configurable capacity, zone size and
+number of conventional zones can be created. The storage for each zone of the
+device is implemented using a regular file with a maximum size equal to the zone
+size. The size of a file backing a conventional zone is always equal to the zone
+size. The size of a file backing a sequential zone indicates the amount of data
+sequentially written to the file, that is, the size of the file directly
+indicates the position of the write pointer of the zone.
+
+When resetting a sequential zone, its backing file size is truncated to zero.
+Conversely, for a zone finish operation, the backing file is truncated to the
+zone size. With this, the maximum capacity of a zloop zoned block device created
+can be larger configured to be larger than the storage space available on the
+backing file system. Of course, for such configuration, writing more data than
+the storage space available on the backing file system will result in write
+errors.
+
+The zoned loop block device driver implements a complete zone transition state
+machine. That is, zones can be empty, implicitly opened, explicitly opened,
+closed or full. The current implementation does not support any limits on the
+maximum number of open and active zones.
+
+No user tools are necessary to create and delete zloop devices.
+
+2) Creating a Zoned Device
+--------------------------
+
+Once the zloop module is loaded (or if zloop is compiled in the kernel), the
+character device file /dev/zloop-control can be used to add a zloop device.
+This is done by writing an "add" command directly to the /dev/zloop-control
+device::
+
+ $ modprobe zloop
+ $ ls -l /dev/zloop*
+ crw-------. 1 root root 10, 123 Jan 6 19:18 /dev/zloop-control
+
+ $ mkdir -p <base directory/<device ID>
+ $ echo "add [options]" > /dev/zloop-control
+
+The options available for the add command can be listed by reading the
+/dev/zloop-control device::
+
+ $ cat /dev/zloop-control
+ add id=%d,capacity_mb=%u,zone_size_mb=%u,zone_capacity_mb=%u,conv_zones=%u,base_dir=%s,nr_queues=%u,queue_depth=%u,buffered_io
+ remove id=%d
+
+In more details, the options that can be used with the "add" command are as
+follows.
+
+================ ===========================================================
+id Device number (the X in /dev/zloopX).
+ Default: automatically assigned.
+capacity_mb Device total capacity in MiB. This is always rounded up to
+ the nearest higher multiple of the zone size.
+ Default: 16384 MiB (16 GiB).
+zone_size_mb Device zone size in MiB. Default: 256 MiB.
+zone_capacity_mb Device zone capacity (must always be equal to or lower than
+ the zone size. Default: zone size.
+conv_zones Total number of conventioanl zones starting from sector 0.
+ Default: 8.
+base_dir Path to the base directoy where to create the directory
+ containing the zone files of the device.
+ Default=/var/local/zloop.
+ The device directory containing the zone files is always
+ named with the device ID. E.g. the default zone file
+ directory for /dev/zloop0 is /var/local/zloop/0.
+nr_queues Number of I/O queues of the zoned block device. This value is
+ always capped by the number of online CPUs
+ Default: 1
+queue_depth Maximum I/O queue depth per I/O queue.
+ Default: 64
+buffered_io Do buffered IOs instead of direct IOs (default: false)
+================ ===========================================================
+
+3) Deleting a Zoned Device
+--------------------------
+
+Deleting an unused zoned loop block device is done by issuing the "remove"
+command to /dev/zloop-control, specifying the ID of the device to remove::
+
+ $ echo "remove id=X" > /dev/zloop-control
+
+The remove command does not have any option.
+
+A zoned device that was removed can be re-added again without any change to the
+state of the device zones: the device zones are restored to their last state
+before the device was removed. Adding again a zoned device after it was removed
+must always be done using the same configuration as when the device was first
+added. If a zone configuration change is detected, an error will be returned and
+the zoned device will not be created.
+
+To fully delete a zoned device, after executing the remove operation, the device
+base directory containing the backing files of the device zones must be deleted.
+
+4) Example
+----------
+
+The following sequence of commands creates a 2GB zoned device with zones of 64
+MB and a zone capacity of 63 MB::
+
+ $ modprobe zloop
+ $ mkdir -p /var/local/zloop/0
+ $ echo "add capacity_mb=2048,zone_size_mb=64,zone_capacity=63MB" > /dev/zloop-control
+
+For the device created (/dev/zloop0), the zone backing files are all created
+under the default base directory (/var/local/zloop)::
+
+ $ ls -l /var/local/zloop/0
+ total 0
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000000
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000001
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000002
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000003
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000004
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000005
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000006
+ -rw-------. 1 root root 67108864 Jan 6 22:23 cnv-000007
+ -rw-------. 1 root root 0 Jan 6 22:23 seq-000008
+ -rw-------. 1 root root 0 Jan 6 22:23 seq-000009
+ ...
+
+The zoned device created (/dev/zloop0) can then be used normally::
+
+ $ lsblk -z
+ NAME ZONED ZONE-SZ ZONE-NR ZONE-AMAX ZONE-OMAX ZONE-APP ZONE-WGRAN
+ zloop0 host-managed 64M 32 0 0 1M 4K
+ $ blkzone report /dev/zloop0
+ start: 0x000000000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x000020000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x000040000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x000060000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x000080000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x0000a0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x0000c0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x0000e0000, len 0x020000, cap 0x020000, wptr 0x000000 reset:0 non-seq:0, zcond: 0(nw) [type: 1(CONVENTIONAL)]
+ start: 0x000100000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
+ start: 0x000120000, len 0x020000, cap 0x01f800, wptr 0x000000 reset:0 non-seq:0, zcond: 1(em) [type: 2(SEQ_WRITE_REQUIRED)]
+ ...
+
+Deleting this device is done using the command::
+
+ $ echo "remove id=0" > /dev/zloop-control
+
+The removed device can be re-added again using the same "add" command as when
+the device was first created. To fully delete a zoned device, its backing files
+should also be deleted after executing the remove command::
+
+ $ rm -r /var/local/zloop/0
diff --git a/Documentation/admin-guide/bug-hunting.rst b/Documentation/admin-guide/bug-hunting.rst
index ce6f4e8ca487..30858757c9f2 100644
--- a/Documentation/admin-guide/bug-hunting.rst
+++ b/Documentation/admin-guide/bug-hunting.rst
@@ -196,7 +196,7 @@ will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example::
- $ objdump -r -S -l --disassemble net/dccp/ipv4.o
+ $ objdump -r -S -l --disassemble net/ipv4/tcp.o
.. note::
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 1a16ce68a4d7..1edc26622594 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1076,7 +1076,7 @@ cpufreq governor about the minimum desired frequency which should always be
provided by a CPU, as well as the maximum desired frequency, which should not
be exceeded by a CPU.
-WARNING: cgroup2 cpu controller doesn't yet fully support the control of
+WARNING: cgroup2 cpu controller doesn't yet support the (bandwidth) control of
realtime processes. For a kernel built with the CONFIG_RT_GROUP_SCHED option
enabled for group scheduling of realtime processes, the cpu controller can only
be enabled when all RT processes are in the root cgroup. Be aware that system
@@ -1095,19 +1095,34 @@ realtime processes irrespective of CONFIG_RT_GROUP_SCHED.
CPU Interface Files
~~~~~~~~~~~~~~~~~~~
-All time durations are in microseconds.
+The interaction of a process with the cpu controller depends on its scheduling
+policy and the underlying scheduler. From the point of view of the cpu controller,
+processes can be categorized as follows:
+
+* Processes under the fair-class scheduler
+* Processes under a BPF scheduler with the ``cgroup_set_weight`` callback
+* Everything else: ``SCHED_{FIFO,RR,DEADLINE}`` and processes under a BPF scheduler
+ without the ``cgroup_set_weight`` callback
+
+For details on when a process is under the fair-class scheduler or a BPF scheduler,
+check out :ref:`Documentation/scheduler/sched-ext.rst <sched-ext>`.
+
+For each of the following interface files, the above categories
+will be referred to. All time durations are in microseconds.
cpu.stat
A read-only flat-keyed file.
This file exists whether the controller is enabled or not.
- It always reports the following three stats:
+ It always reports the following three stats, which account for all the
+ processes in the cgroup:
- usage_usec
- user_usec
- system_usec
- and the following five when the controller is enabled:
+ and the following five when the controller is enabled, which account for
+ only the processes under the fair-class scheduler:
- nr_periods
- nr_throttled
@@ -1125,6 +1140,10 @@ All time durations are in microseconds.
If the cgroup has been configured to be SCHED_IDLE (cpu.idle = 1),
then the weight will show as a 0.
+ This file affects only processes under the fair-class scheduler and a BPF
+ scheduler with the ``cgroup_set_weight`` callback depending on what the
+ callback actually does.
+
cpu.weight.nice
A read-write single value file which exists on non-root
cgroups. The default is "0".
@@ -1137,6 +1156,10 @@ All time durations are in microseconds.
granularity is coarser for the nice values, the read value is
the closest approximation of the current weight.
+ This file affects only processes under the fair-class scheduler and a BPF
+ scheduler with the ``cgroup_set_weight`` callback depending on what the
+ callback actually does.
+
cpu.max
A read-write two value file which exists on non-root cgroups.
The default is "max 100000".
@@ -1149,43 +1172,55 @@ All time durations are in microseconds.
$PERIOD duration. "max" for $MAX indicates no limit. If only
one number is written, $MAX is updated.
+ This file affects only processes under the fair-class scheduler.
+
cpu.max.burst
A read-write single value file which exists on non-root
cgroups. The default is "0".
The burst in the range [0, $MAX].
+ This file affects only processes under the fair-class scheduler.
+
cpu.pressure
A read-write nested-keyed file.
Shows pressure stall information for CPU. See
:ref:`Documentation/accounting/psi.rst <psi>` for details.
+ This file accounts for all the processes in the cgroup.
+
cpu.uclamp.min
- A read-write single value file which exists on non-root cgroups.
- The default is "0", i.e. no utilization boosting.
+ A read-write single value file which exists on non-root cgroups.
+ The default is "0", i.e. no utilization boosting.
+
+ The requested minimum utilization (protection) as a percentage
+ rational number, e.g. 12.34 for 12.34%.
- The requested minimum utilization (protection) as a percentage
- rational number, e.g. 12.34 for 12.34%.
+ This interface allows reading and setting minimum utilization clamp
+ values similar to the sched_setattr(2). This minimum utilization
+ value is used to clamp the task specific minimum utilization clamp,
+ including those of realtime processes.
- This interface allows reading and setting minimum utilization clamp
- values similar to the sched_setattr(2). This minimum utilization
- value is used to clamp the task specific minimum utilization clamp.
+ The requested minimum utilization (protection) is always capped by
+ the current value for the maximum utilization (limit), i.e.
+ `cpu.uclamp.max`.
- The requested minimum utilization (protection) is always capped by
- the current value for the maximum utilization (limit), i.e.
- `cpu.uclamp.max`.
+ This file affects all the processes in the cgroup.
cpu.uclamp.max
- A read-write single value file which exists on non-root cgroups.
- The default is "max". i.e. no utilization capping
+ A read-write single value file which exists on non-root cgroups.
+ The default is "max". i.e. no utilization capping
+
+ The requested maximum utilization (limit) as a percentage rational
+ number, e.g. 98.76 for 98.76%.
- The requested maximum utilization (limit) as a percentage rational
- number, e.g. 98.76 for 98.76%.
+ This interface allows reading and setting maximum utilization clamp
+ values similar to the sched_setattr(2). This maximum utilization
+ value is used to clamp the task specific maximum utilization clamp,
+ including those of realtime processes.
- This interface allows reading and setting maximum utilization clamp
- values similar to the sched_setattr(2). This maximum utilization
- value is used to clamp the task specific maximum utilization clamp.
+ This file affects all the processes in the cgroup.
cpu.idle
A read-write single value file which exists on non-root cgroups.
@@ -1197,7 +1232,7 @@ All time durations are in microseconds.
own relative priorities, but the cgroup itself will be treated as
very low priority relative to its peers.
-
+ This file affects only processes under the fair-class scheduler.
Memory
------
@@ -3019,7 +3054,7 @@ Filesystem Support for Writeback
--------------------------------
A filesystem can support cgroup writeback by updating
-address_space_operations->writepage[s]() to annotate bio's using the
+address_space_operations->writepages() to annotate bio's using the
following two functions.
wbc_init_bio(@wbc, @bio)
diff --git a/Documentation/admin-guide/gpio/gpio-aggregator.rst b/Documentation/admin-guide/gpio/gpio-aggregator.rst
index 5cd1e7221756..8374a9df9105 100644
--- a/Documentation/admin-guide/gpio/gpio-aggregator.rst
+++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst
@@ -69,6 +69,113 @@ write-only attribute files in sysfs.
$ echo gpio-aggregator.0 > delete_device
+Aggregating GPIOs using Configfs
+--------------------------------
+
+**Group:** ``/config/gpio-aggregator``
+
+ This is the root directory of the gpio-aggregator configfs tree.
+
+**Group:** ``/config/gpio-aggregator/<example-name>``
+
+ This directory represents a GPIO aggregator device. You can assign any
+ name to ``<example-name>`` (e.g. ``agg0``), except names starting with
+ ``_sysfs`` prefix, which are reserved for auto-generated configfs
+ entries corresponding to devices created via Sysfs.
+
+**Attribute:** ``/config/gpio-aggregator/<example-name>/live``
+
+ The ``live`` attribute allows to trigger the actual creation of the device
+ once it's fully configured. Accepted values are:
+
+ * ``1``, ``yes``, ``true`` : enable the virtual device
+ * ``0``, ``no``, ``false`` : disable the virtual device
+
+**Attribute:** ``/config/gpio-aggregator/<example-name>/dev_name``
+
+ The read-only ``dev_name`` attribute exposes the name of the device as it
+ will appear in the system on the platform bus (e.g. ``gpio-aggregator.0``).
+ This is useful for identifying a character device for the newly created
+ aggregator. If it's ``gpio-aggregator.0``,
+ ``/sys/devices/platform/gpio-aggregator.0/gpiochipX`` path tells you that the
+ GPIO device id is ``X``.
+
+You must create subdirectories for each virtual line you want to
+instantiate, named exactly as ``line0``, ``line1``, ..., ``lineY``, when
+you want to instantiate ``Y+1`` (Y >= 0) lines. Configure all lines before
+activating the device by setting ``live`` to 1.
+
+**Group:** ``/config/gpio-aggregator/<example-name>/<lineY>/``
+
+ This directory represents a GPIO line to include in the aggregator.
+
+**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/key``
+
+**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/offset``
+
+ The default values after creating the ``<lineY>`` directory are:
+
+ * ``key`` : <empty>
+ * ``offset`` : -1
+
+ ``key`` must always be explicitly configured, while ``offset`` depends.
+ Two configuration patterns exist for each ``<lineY>``:
+
+ (a). For lookup by GPIO line name:
+
+ * Set ``key`` to the line name.
+ * Ensure ``offset`` remains -1 (the default).
+
+ (b). For lookup by GPIO chip name and the line offset within the chip:
+
+ * Set ``key`` to the chip name.
+ * Set ``offset`` to the line offset (0 <= ``offset`` < 65535).
+
+**Attribute:** ``/config/gpio-aggregator/<example-name>/<lineY>/name``
+
+ The ``name`` attribute sets a custom name for lineY. If left unset, the
+ line will remain unnamed.
+
+Once the configuration is done, the ``'live'`` attribute must be set to 1
+in order to instantiate the aggregator device. It can be set back to 0 to
+destroy the virtual device. The module will synchronously wait for the new
+aggregator device to be successfully probed and if this doesn't happen, writing
+to ``'live'`` will result in an error. This is a different behaviour from the
+case when you create it using sysfs ``new_device`` interface.
+
+.. note::
+
+ For aggregators created via Sysfs, the configfs entries are
+ auto-generated and appear as ``/config/gpio-aggregator/_sysfs.<N>/``. You
+ cannot add or remove line directories with mkdir(2)/rmdir(2). To modify
+ lines, you must use the "delete_device" interface to tear down the
+ existing device and reconfigure it from scratch. However, you can still
+ toggle the aggregator with the ``live`` attribute and adjust the
+ ``key``, ``offset``, and ``name`` attributes for each line when ``live``
+ is set to 0 by hand (i.e. it's not waiting for deferred probe).
+
+Sample configuration commands
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code-block:: sh
+
+ # Create a directory for an aggregator device
+ $ mkdir /sys/kernel/config/gpio-aggregator/agg0
+
+ # Configure each line
+ $ mkdir /sys/kernel/config/gpio-aggregator/agg0/line0
+ $ echo gpiochip0 > /sys/kernel/config/gpio-aggregator/agg0/line0/key
+ $ echo 6 > /sys/kernel/config/gpio-aggregator/agg0/line0/offset
+ $ echo test0 > /sys/kernel/config/gpio-aggregator/agg0/line0/name
+ $ mkdir /sys/kernel/config/gpio-aggregator/agg0/line1
+ $ echo gpiochip0 > /sys/kernel/config/gpio-aggregator/agg0/line1/key
+ $ echo 7 > /sys/kernel/config/gpio-aggregator/agg0/line1/offset
+ $ echo test1 > /sys/kernel/config/gpio-aggregator/agg0/line1/name
+
+ # Activate the aggregator device
+ $ echo 1 > /sys/kernel/config/gpio-aggregator/agg0/live
+
+
Generic GPIO Driver
-------------------
diff --git a/Documentation/admin-guide/hw-vuln/index.rst b/Documentation/admin-guide/hw-vuln/index.rst
index 451874b8135d..09890a8f3ee9 100644
--- a/Documentation/admin-guide/hw-vuln/index.rst
+++ b/Documentation/admin-guide/hw-vuln/index.rst
@@ -23,3 +23,5 @@ are configurable at compile, boot or run time.
gather_data_sampling
reg-file-data-sampling
rsb
+ old_microcode
+ indirect-target-selection
diff --git a/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst b/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
new file mode 100644
index 000000000000..d9ca64108d23
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
@@ -0,0 +1,168 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Indirect Target Selection (ITS)
+===============================
+
+ITS is a vulnerability in some Intel CPUs that support Enhanced IBRS and were
+released before Alder Lake. ITS may allow an attacker to control the prediction
+of indirect branches and RETs located in the lower half of a cacheline.
+
+ITS is assigned CVE-2024-28956 with a CVSS score of 4.7 (Medium).
+
+Scope of Impact
+---------------
+- **eIBRS Guest/Host Isolation**: Indirect branches in KVM/kernel may still be
+ predicted with unintended target corresponding to a branch in the guest.
+
+- **Intra-Mode BTI**: In-kernel training such as through cBPF or other native
+ gadgets.
+
+- **Indirect Branch Prediction Barrier (IBPB)**: After an IBPB, indirect
+ branches may still be predicted with targets corresponding to direct branches
+ executed prior to the IBPB. This is fixed by the IPU 2025.1 microcode, which
+ should be available via distro updates. Alternatively microcode can be
+ obtained from Intel's github repository [#f1]_.
+
+Affected CPUs
+-------------
+Below is the list of ITS affected CPUs [#f2]_ [#f3]_:
+
+ ======================== ============ ==================== ===============
+ Common name Family_Model eIBRS Intra-mode BTI
+ Guest/Host Isolation
+ ======================== ============ ==================== ===============
+ SKYLAKE_X (step >= 6) 06_55H Affected Affected
+ ICELAKE_X 06_6AH Not affected Affected
+ ICELAKE_D 06_6CH Not affected Affected
+ ICELAKE_L 06_7EH Not affected Affected
+ TIGERLAKE_L 06_8CH Not affected Affected
+ TIGERLAKE 06_8DH Not affected Affected
+ KABYLAKE_L (step >= 12) 06_8EH Affected Affected
+ KABYLAKE (step >= 13) 06_9EH Affected Affected
+ COMETLAKE 06_A5H Affected Affected
+ COMETLAKE_L 06_A6H Affected Affected
+ ROCKETLAKE 06_A7H Not affected Affected
+ ======================== ============ ==================== ===============
+
+- All affected CPUs enumerate Enhanced IBRS feature.
+- IBPB isolation is affected on all ITS affected CPUs, and need a microcode
+ update for mitigation.
+- None of the affected CPUs enumerate BHI_CTRL which was introduced in Golden
+ Cove (Alder Lake and Sapphire Rapids). This can help guests to determine the
+ host's affected status.
+- Intel Atom CPUs are not affected by ITS.
+
+Mitigation
+----------
+As only the indirect branches and RETs that have their last byte of instruction
+in the lower half of the cacheline are vulnerable to ITS, the basic idea behind
+the mitigation is to not allow indirect branches in the lower half.
+
+This is achieved by relying on existing retpoline support in the kernel, and in
+compilers. ITS-vulnerable retpoline sites are runtime patched to point to newly
+added ITS-safe thunks. These safe thunks consists of indirect branch in the
+second half of the cacheline. Not all retpoline sites are patched to thunks, if
+a retpoline site is evaluated to be ITS-safe, it is replaced with an inline
+indirect branch.
+
+Dynamic thunks
+~~~~~~~~~~~~~~
+From a dynamically allocated pool of safe-thunks, each vulnerable site is
+replaced with a new thunk, such that they get a unique address. This could
+improve the branch prediction accuracy. Also, it is a defense-in-depth measure
+against aliasing.
+
+Note, for simplicity, indirect branches in eBPF programs are always replaced
+with a jump to a static thunk in __x86_indirect_its_thunk_array. If required,
+in future this can be changed to use dynamic thunks.
+
+All vulnerable RETs are replaced with a static thunk, they do not use dynamic
+thunks. This is because RETs get their prediction from RSB mostly that does not
+depend on source address. RETs that underflow RSB may benefit from dynamic
+thunks. But, RETs significantly outnumber indirect branches, and any benefit
+from a unique source address could be outweighed by the increased icache
+footprint and iTLB pressure.
+
+Retpoline
+~~~~~~~~~
+Retpoline sequence also mitigates ITS-unsafe indirect branches. For this
+reason, when retpoline is enabled, ITS mitigation only relocates the RETs to
+safe thunks. Unless user requested the RSB-stuffing mitigation.
+
+RSB Stuffing
+~~~~~~~~~~~~
+RSB-stuffing via Call Depth Tracking is a mitigation for Retbleed RSB-underflow
+attacks. And it also mitigates RETs that are vulnerable to ITS.
+
+Mitigation in guests
+^^^^^^^^^^^^^^^^^^^^
+All guests deploy ITS mitigation by default, irrespective of eIBRS enumeration
+and Family/Model of the guest. This is because eIBRS feature could be hidden
+from a guest. One exception to this is when a guest enumerates BHI_DIS_S, which
+indicates that the guest is running on an unaffected host.
+
+To prevent guests from unnecessarily deploying the mitigation on unaffected
+platforms, Intel has defined ITS_NO bit(62) in MSR IA32_ARCH_CAPABILITIES. When
+a guest sees this bit set, it should not enumerate the ITS bug. Note, this bit
+is not set by any hardware, but is **intended for VMMs to synthesize** it for
+guests as per the host's affected status.
+
+Mitigation options
+^^^^^^^^^^^^^^^^^^
+The ITS mitigation can be controlled using the "indirect_target_selection"
+kernel parameter. The available options are:
+
+ ======== ===================================================================
+ on (default) Deploy the "Aligned branch/return thunks" mitigation.
+ If spectre_v2 mitigation enables retpoline, aligned-thunks are only
+ deployed for the affected RET instructions. Retpoline mitigates
+ indirect branches.
+
+ off Disable ITS mitigation.
+
+ vmexit Equivalent to "=on" if the CPU is affected by guest/host isolation
+ part of ITS. Otherwise, mitigation is not deployed. This option is
+ useful when host userspace is not in the threat model, and only
+ attacks from guest to host are considered.
+
+ stuff Deploy RSB-fill mitigation when retpoline is also deployed.
+ Otherwise, deploy the default mitigation. When retpoline mitigation
+ is enabled, RSB-stuffing via Call-Depth-Tracking also mitigates
+ ITS.
+
+ force Force the ITS bug and deploy the default mitigation.
+ ======== ===================================================================
+
+Sysfs reporting
+---------------
+
+The sysfs file showing ITS mitigation status is:
+
+ /sys/devices/system/cpu/vulnerabilities/indirect_target_selection
+
+Note, microcode mitigation status is not reported in this file.
+
+The possible values in this file are:
+
+.. list-table::
+
+ * - Not affected
+ - The processor is not vulnerable.
+ * - Vulnerable
+ - System is vulnerable and no mitigation has been applied.
+ * - Vulnerable, KVM: Not affected
+ - System is vulnerable to intra-mode BTI, but not affected by eIBRS
+ guest/host isolation.
+ * - Mitigation: Aligned branch/return thunks
+ - The mitigation is enabled, affected indirect branches and RETs are
+ relocated to safe thunks.
+ * - Mitigation: Retpolines, Stuffing RSB
+ - The mitigation is enabled using retpoline and RSB stuffing.
+
+References
+----------
+.. [#f1] Microcode repository - https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files
+
+.. [#f2] Affected Processors list - https://www.intel.com/content/www/us/en/developer/topic-technology/software-security-guidance/processors-affected-consolidated-product-cpu-model.html
+
+.. [#f3] Affected Processors list (machine readable) - https://github.com/intel/Intel-affected-processor-list
diff --git a/Documentation/admin-guide/hw-vuln/old_microcode.rst b/Documentation/admin-guide/hw-vuln/old_microcode.rst
new file mode 100644
index 000000000000..6ded8f86b8d0
--- /dev/null
+++ b/Documentation/admin-guide/hw-vuln/old_microcode.rst
@@ -0,0 +1,21 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+Old Microcode
+=============
+
+The kernel keeps a table of released microcode. Systems that had
+microcode older than this at boot will say "Vulnerable". This means
+that the system was vulnerable to some known CPU issue. It could be
+security or functional, the kernel does not know or care.
+
+You should update the CPU microcode to mitigate any exposure. This is
+usually accomplished by updating the files in
+/lib/firmware/intel-ucode/ via normal distribution updates. Intel also
+distributes these files in a github repo:
+
+ https://github.com/intel/Intel-Linux-Processor-Microcode-Data-Files.git
+
+Just like all the other hardware vulnerabilities, exposure is
+determined at boot. Runtime microcode updates do not change the status
+of this vulnerability.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d9fd26b95b34..ea81784be981 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1828,6 +1828,13 @@
lz4: Select LZ4 compression algorithm to
compress/decompress hibernation image.
+ hibernate.pm_test_delay=
+ [HIBERNATION]
+ Sets the number of seconds to remain in a hibernation test
+ mode before resuming the system (see
+ /sys/power/pm_test). Only available when CONFIG_PM_DEBUG
+ is set. Default value is 5.
+
highmem=nn[KMG] [KNL,BOOT,EARLY] forces the highmem zone to have an exact
size of <nn>. This works even on boxes that have no
highmem otherwise. This also works to reduce highmem
@@ -2202,6 +2209,23 @@
different crypto accelerators. This option can be used
to achieve best performance for particular HW.
+ indirect_target_selection= [X86,Intel] Mitigation control for Indirect
+ Target Selection(ITS) bug in Intel CPUs. Updated
+ microcode is also required for a fix in IBPB.
+
+ on: Enable mitigation (default).
+ off: Disable mitigation.
+ force: Force the ITS bug and deploy default
+ mitigation.
+ vmexit: Only deploy mitigation if CPU is affected by
+ guest/host isolation part of ITS.
+ stuff: Deploy RSB-fill mitigation when retpoline is
+ also deployed. Otherwise, deploy the default
+ mitigation.
+
+ For details see:
+ Documentation/admin-guide/hw-vuln/indirect-target-selection.rst
+
init= [KNL]
Format: <full_path>
Run specified binary instead of /sbin/init as init
@@ -3693,6 +3717,7 @@
expose users to several CPU vulnerabilities.
Equivalent to: if nokaslr then kpti=0 [ARM64]
gather_data_sampling=off [X86]
+ indirect_target_selection=off [X86]
kvm.nx_huge_pages=off [X86]
l1tf=off [X86]
mds=off [X86]
@@ -5654,6 +5679,31 @@
are zero, rcutorture acts as if is interpreted
they are all non-zero.
+ rcutorture.gpwrap_lag= [KNL]
+ Enable grace-period wrap lag testing. Setting
+ to false prevents the gpwrap lag test from
+ running. Default is true.
+
+ rcutorture.gpwrap_lag_gps= [KNL]
+ Set the value for grace-period wrap lag during
+ active lag testing periods. This controls how many
+ grace periods differences we tolerate between
+ rdp and rnp's gp_seq before setting overflow flag.
+ The default is always set to 8.
+
+ rcutorture.gpwrap_lag_cycle_mins= [KNL]
+ Set the total cycle duration for gpwrap lag
+ testing in minutes. This is the total time for
+ one complete cycle of active and inactive
+ testing periods. Default is 30 minutes.
+
+ rcutorture.gpwrap_lag_active_mins= [KNL]
+ Set the duration for which gpwrap lag is active
+ within each cycle, in minutes. During this time,
+ the grace-period wrap lag will be set to the
+ value specified by gpwrap_lag_gps. Default is
+ 5 minutes.
+
rcutorture.irqreader= [KNL]
Run RCU readers from irq handlers, or, more
accurately, from a timer handler. Not all RCU
@@ -6250,7 +6300,7 @@
port and the regular usb controller gets disabled.
root= [KNL] Root filesystem
- Usually this a a block device specifier of some kind,
+ Usually this is a block device specifier of some kind,
see the early_lookup_bdev comment in
block/early-lookup.c for details.
Alternatively this can be "ram" for the legacy initial
@@ -6277,6 +6327,11 @@
Memory area to be used by remote processor image,
managed by CMA.
+ rt_group_sched= [KNL] Enable or disable SCHED_RR/FIFO group scheduling
+ when CONFIG_RT_GROUP_SCHED=y. Defaults to
+ !CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED.
+ Format: <bool>
+
rw [KNL] Mount root device read-write on boot
S [KNL] Run init in single mode
diff --git a/Documentation/admin-guide/laptops/alienware-wmi.rst b/Documentation/admin-guide/laptops/alienware-wmi.rst
new file mode 100644
index 000000000000..27a32a8057da
--- /dev/null
+++ b/Documentation/admin-guide/laptops/alienware-wmi.rst
@@ -0,0 +1,127 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+
+====================
+Alienware WMI Driver
+====================
+
+Kurt Borja <kuurtb@gmail.com>
+
+This is a driver for the "WMAX" WMI device, which is found in most Dell gaming
+laptops and controls various special features.
+
+Before the launch of M-Series laptops (~2018), the "WMAX" device controlled
+basic RGB lighting, deep sleep mode, HDMI mode and amplifier status.
+
+Later, this device was completely repurpused. Now it mostly deals with thermal
+profiles, sensor monitoring and overclocking. This interface is named "AWCC" and
+is known to be used by the AWCC OEM application to control these features.
+
+The alienware-wmi driver controls both interfaces.
+
+AWCC Interface
+==============
+
+WMI device documentation: Documentation/wmi/devices/alienware-wmi.rst
+
+Supported devices
+-----------------
+
+- Alienware M-Series laptops
+- Alienware X-Series laptops
+- Alienware Aurora Desktops
+- Dell G-Series laptops
+
+If you believe your device supports the AWCC interface and you don't have any of
+the features described in this document, try the following alienware-wmi module
+parameters:
+
+- ``force_platform_profile=1``: Forces probing for platform profile support
+- ``force_hwmon=1``: Forces probing for HWMON support
+
+If the module loads successfully with these parameters, consider submitting a
+patch adding your model to the ``awcc_dmi_table`` located in
+``drivers/platform/x86/dell/alienware-wmi-wmax.c`` or contacting the maintainer
+for further guidance.
+
+Status
+------
+
+The following features are currently supported:
+
+- :ref:`Platform Profile <platform-profile>`:
+
+ - Thermal profile control
+
+ - G-Mode toggling
+
+- :ref:`HWMON <hwmon>`:
+
+ - Sensor monitoring
+
+ - Manual fan control
+
+.. _platform-profile:
+
+Platform Profile
+----------------
+
+The AWCC interface exposes various firmware defined thermal profiles. These are
+exposed to user-space through the Platform Profile class interface. Refer to
+:ref:`sysfs-class-platform-profile <abi_file_testing_sysfs_class_platform_profile>`
+for more information.
+
+The name of the platform-profile class device exported by this driver is
+"alienware-wmi" and it's path can be found with:
+
+::
+
+ grep -l "alienware-wmi" /sys/class/platform-profile/platform-profile-*/name | sed 's|/[^/]*$||'
+
+If the device supports G-Mode, it is also toggled when selecting the
+``performance`` profile.
+
+.. note::
+ You may set the ``force_gmode`` module parameter to always try to toggle this
+ feature, without checking if your model supports it.
+
+.. _hwmon:
+
+HWMON
+-----
+
+The AWCC interface also supports sensor monitoring and manual fan control. Both
+of these features are exposed to user-space through the HWMON interface.
+
+The name of the hwmon class device exported by this driver is "alienware_wmi"
+and it's path can be found with:
+
+::
+
+ grep -l "alienware_wmi" /sys/class/hwmon/hwmon*/name | sed 's|/[^/]*$||'
+
+Sensor monitoring is done through the standard HWMON interface. Refer to
+:ref:`sysfs-class-hwmon <abi_file_testing_sysfs_class_hwmon>` for more
+information.
+
+Manual fan control on the other hand, is not exposed directly by the AWCC
+interface. Instead it let's us control a fan `boost` value. This `boost` value
+has the following aproximate behavior over the fan pwm:
+
+::
+
+ pwm = pwm_base + (fan_boost / 255) * (pwm_max - pwm_base)
+
+Due to the above behavior, the fan `boost` control is exposed to user-space
+through the following, custom hwmon sysfs attribute:
+
+=============================== ======= =======================================
+Name Perm Description
+=============================== ======= =======================================
+fan[1-4]_boost RW Fan boost value.
+
+ Integer value between 0 and 255
+=============================== ======= =======================================
+
+.. note::
+ In some devices, manual fan control only works reliably if the ``custom``
+ platform profile is selected.
diff --git a/Documentation/admin-guide/laptops/index.rst b/Documentation/admin-guide/laptops/index.rst
index e71c8984c23e..db842b629303 100644
--- a/Documentation/admin-guide/laptops/index.rst
+++ b/Documentation/admin-guide/laptops/index.rst
@@ -7,6 +7,7 @@ Laptop Drivers
.. toctree::
:maxdepth: 1
+ alienware-wmi
asus-laptop
disk-shock-protection
laptop-mode
diff --git a/Documentation/admin-guide/media/c3-isp.dot b/Documentation/admin-guide/media/c3-isp.dot
new file mode 100644
index 000000000000..42dc931ee84a
--- /dev/null
+++ b/Documentation/admin-guide/media/c3-isp.dot
@@ -0,0 +1,26 @@
+digraph board {
+ rankdir=TB
+ n00000001 [label="{{<port0> 0 | <port1> 1} | c3-isp-core\n/dev/v4l-subdev0 | {<port2> 2 | <port3> 3 | <port4> 4 | <port5> 5}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000001:port3 -> n00000008:port0
+ n00000001:port4 -> n0000000b:port0
+ n00000001:port5 -> n0000000e:port0
+ n00000001:port2 -> n00000027
+ n00000008 [label="{{<port0> 0} | c3-isp-resizer0\n/dev/v4l-subdev1 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000008:port1 -> n00000016 [style=bold]
+ n0000000b [label="{{<port0> 0} | c3-isp-resizer1\n/dev/v4l-subdev2 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000b:port1 -> n0000001a [style=bold]
+ n0000000e [label="{{<port0> 0} | c3-isp-resizer2\n/dev/v4l-subdev3 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000000e:port1 -> n00000023 [style=bold]
+ n00000011 [label="{{<port0> 0} | c3-mipi-adapter\n/dev/v4l-subdev4 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n00000011:port1 -> n00000001:port0 [style=bold]
+ n00000016 [label="c3-isp-cap0\n/dev/video0", shape=box, style=filled, fillcolor=yellow]
+ n0000001a [label="c3-isp-cap1\n/dev/video1", shape=box, style=filled, fillcolor=yellow]
+ n0000001e [label="{{<port0> 0} | c3-mipi-csi2\n/dev/v4l-subdev5 | {<port1> 1}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000001e:port1 -> n00000011:port0 [style=bold]
+ n00000023 [label="c3-isp-cap2\n/dev/video2", shape=box, style=filled, fillcolor=yellow]
+ n00000027 [label="c3-isp-stats\n/dev/video3", shape=box, style=filled, fillcolor=yellow]
+ n0000002b [label="c3-isp-params\n/dev/video4", shape=box, style=filled, fillcolor=yellow]
+ n0000002b -> n00000001:port1
+ n0000003f [label="{{} | imx290 2-001a\n/dev/v4l-subdev6 | {<port0> 0}}", shape=Mrecord, style=filled, fillcolor=green]
+ n0000003f:port0 -> n0000001e:port0 [style=bold]
+}
diff --git a/Documentation/admin-guide/media/c3-isp.rst b/Documentation/admin-guide/media/c3-isp.rst
new file mode 100644
index 000000000000..ac508b8c6831
--- /dev/null
+++ b/Documentation/admin-guide/media/c3-isp.rst
@@ -0,0 +1,101 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR MIT)
+
+.. include:: <isonum.txt>
+
+=================================================
+Amlogic C3 Image Signal Processing (C3ISP) driver
+=================================================
+
+Introduction
+============
+
+This file documents the Amlogic C3ISP driver located under
+drivers/media/platform/amlogic/c3/isp.
+
+The current version of the driver supports the C3ISP found on
+Amlogic C308L processor.
+
+The driver implements V4L2, Media controller and V4L2 subdev interfaces.
+Camera sensor using V4L2 subdev interface in the kernel is supported.
+
+The driver has been tested on AW419-C308L-Socket platform.
+
+Amlogic C3 ISP
+==============
+
+The Camera hardware found on C308L processors and supported by
+the driver consists of:
+
+- 1 MIPI-CSI-2 module: handles the physical layer of the MIPI CSI-2 receiver and
+ receives data from the connected camera sensor.
+- 1 MIPI-ADAPTER module: organizes MIPI data to meet ISP input requirements and
+ send MIPI data to ISP.
+- 1 ISP (Image Signal Processing) module: contains a pipeline of image processing
+ hardware blocks. The ISP pipeline contains three resizers at the end each of
+ them connected to a DMA interface which writes the output data to memory.
+
+A high-level functional view of the C3 ISP is presented below.::
+
+ +----------+ +-------+
+ | Resizer |--->| WRMIF |
+ +---------+ +------------+ +--------------+ +-------+ |----------+ +-------+
+ | Sensor |--->| MIPI CSI-2 |--->| MIPI ADAPTER |--->| ISP |---|----------+ +-------+
+ +---------+ +------------+ +--------------+ +-------+ | Resizer |--->| WRMIF |
+ +----------+ +-------+
+ |----------+ +-------+
+ | Resizer |--->| WRMIF |
+ +----------+ +-------+
+
+Driver architecture and design
+==============================
+
+With the goal to model the hardware links between the modules and to expose a
+clean, logical and usable interface, the driver registers the following V4L2
+sub-devices:
+
+- 1 `c3-mipi-csi2` sub-device - the MIPI CSI-2 receiver
+- 1 `c3-mipi-adapter` sub-device - the MIPI adapter
+- 1 `c3-isp-core` sub-device - the ISP core
+- 3 `c3-isp-resizer` sub-devices - the ISP resizers
+
+The `c3-isp-core` sub-device is linked to 2 video device nodes for statistics
+capture and parameters programming:
+
+- the `c3-isp-stats` capture video device node for statistics capture
+- the `c3-isp-params` output video device for parameters programming
+
+Each `c3-isp-resizer` sub-device is linked to a capture video device node where
+frames are captured from:
+
+- `c3-isp-resizer0` is linked to the `c3-isp-cap0` capture video device
+- `c3-isp-resizer1` is linked to the `c3-isp-cap1` capture video device
+- `c3-isp-resizer2` is linked to the `c3-isp-cap2` capture video device
+
+The media controller pipeline graph is as follows (with connected a
+IMX290 camera sensor):
+
+.. _isp_topology_graph:
+
+.. kernel-figure:: c3-isp.dot
+ :alt: c3-isp.dot
+ :align: center
+
+ Media pipeline topology
+
+Implementation
+==============
+
+Runtime configuration of the ISP hardware is performed on the `c3-isp-params`
+video device node using the :ref:`V4L2_META_FMT_C3ISP_PARAMS
+<v4l2-meta-fmt-c3isp-params>` as data format. The buffer structure is defined by
+:c:type:`c3_isp_params_cfg`.
+
+Statistics are captured from the `c3-isp-stats` video device node using the
+:ref:`V4L2_META_FMT_C3ISP_STATS <v4l2-meta-fmt-c3isp-stats>` data format.
+
+The final picture size and format is configured using the V4L2 video
+capture interface on the `c3-isp-cap[0, 2]` video device nodes.
+
+The Amlogic C3 ISP is supported by `libcamera <https://libcamera.org>`_ with a
+dedicated pipeline handler and algorithms that perform run-time image correction
+and enhancement.
diff --git a/Documentation/admin-guide/media/mgb4.rst b/Documentation/admin-guide/media/mgb4.rst
index f69d331e3cb1..5ac69b833a7a 100644
--- a/Documentation/admin-guide/media/mgb4.rst
+++ b/Documentation/admin-guide/media/mgb4.rst
@@ -1,8 +1,17 @@
.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>
+
The mgb4 driver
===============
+Copyright |copy| 2023 - 2025 Digiteq Automotive
+ author: Martin Tůma <martin.tuma@digiteqautomotive.com>
+
+This is a v4l2 device driver for the Digiteq Automotive FrameGrabber 4, a PCIe
+card capable of capturing and generating FPD-Link III and GMSL2/3 video streams
+as used in the automotive industry.
+
sysfs interface
---------------
diff --git a/Documentation/admin-guide/media/pci-cardlist.rst b/Documentation/admin-guide/media/pci-cardlist.rst
index 7d8e3c8987db..239879634ea5 100644
--- a/Documentation/admin-guide/media/pci-cardlist.rst
+++ b/Documentation/admin-guide/media/pci-cardlist.rst
@@ -86,7 +86,6 @@ saa7134 Philips SAA7134
saa7164 NXP SAA7164
smipcie SMI PCIe DVBSky cards
solo6x10 Bluecherry / Softlogic 6x10 capture cards (MPEG-4/H.264)
-sta2x11_vip STA2X11 VIP Video For Linux
tw5864 Techwell TW5864 video/audio grabber and encoder
tw686x Intersil/Techwell TW686x
tw68 Techwell tw68x Video For Linux
diff --git a/Documentation/admin-guide/media/v4l-drivers.rst b/Documentation/admin-guide/media/v4l-drivers.rst
index e8761561b2fe..3bac5165b134 100644
--- a/Documentation/admin-guide/media/v4l-drivers.rst
+++ b/Documentation/admin-guide/media/v4l-drivers.rst
@@ -10,6 +10,7 @@ Video4Linux (V4L) driver-specific documentation
:maxdepth: 2
bttv
+ c3-isp
cafe_ccic
cx88
fimc
diff --git a/Documentation/admin-guide/namespaces/resource-control.rst b/Documentation/admin-guide/namespaces/resource-control.rst
index 369556e00f0c..553a44803231 100644
--- a/Documentation/admin-guide/namespaces/resource-control.rst
+++ b/Documentation/admin-guide/namespaces/resource-control.rst
@@ -1,17 +1,17 @@
-===========================
-Namespaces research control
-===========================
+====================================
+User namespaces and resource control
+====================================
-There are a lot of kinds of objects in the kernel that don't have
-individual limits or that have limits that are ineffective when a set
-of processes is allowed to switch user ids. With user namespaces
-enabled in a kernel for people who don't trust their users or their
-users programs to play nice this problems becomes more acute.
+The kernel contains many kinds of objects that either don't have
+individual limits or that have limits which are ineffective when
+a set of processes is allowed to switch their UID. On a system
+where the admins don't trust their users or their users' programs,
+user namespaces expose the system to potential misuse of resources.
-Therefore it is recommended that memory control groups be enabled in
-kernels that enable user namespaces, and it is further recommended
-that userspace configure memory control groups to limit how much
-memory user's they don't trust to play nice can use.
+In order to mitigate this, we recommend that admins enable memory
+control groups on any system that enables user namespaces.
+Furthermore, we recommend that admins configure the memory control
+groups to limit the maximum memory usable by any untrusted user.
Memory control groups can be configured by installing the libcgroup
package present on most distros editing /etc/cgrules.conf,
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 3950583f2b15..2d74af7f0efe 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -231,7 +231,7 @@ are the following:
present).
The existence of the limit may be a result of some (often unintentional)
- BIOS settings, restrictions coming from a service processor or another
+ BIOS settings, restrictions coming from a service processor or other
BIOS/HW-based mechanisms.
This does not cover ACPI thermal limitations which can be discovered
@@ -258,8 +258,8 @@ are the following:
extension on ARM). If one cannot be determined, this attribute should
not be present.
- Note, that failed attempt to retrieve current frequency for a given
- CPU(s) will result in an appropriate error, i.e: EAGAIN for CPU that
+ Note that failed attempt to retrieve current frequency for a given
+ CPU(s) will result in an appropriate error, i.e.: EAGAIN for CPU that
remains idle (raised on ARM).
``cpuinfo_max_freq``
@@ -499,7 +499,7 @@ This governor exposes the following tunables:
represented by it to be 1.5 times as high as the transition latency
(the default)::
- # echo `$(($(cat cpuinfo_transition_latency) * 3 / 2)) > ondemand/sampling_rate
+ # echo `$(($(cat cpuinfo_transition_latency) * 3 / 2))` > ondemand/sampling_rate
``up_threshold``
If the estimated CPU load is above this value (in percent), the governor
diff --git a/Documentation/admin-guide/pm/intel_idle.rst b/Documentation/admin-guide/pm/intel_idle.rst
index 5940528146eb..ed6f055d4b14 100644
--- a/Documentation/admin-guide/pm/intel_idle.rst
+++ b/Documentation/admin-guide/pm/intel_idle.rst
@@ -38,6 +38,27 @@ instruction at all.
only way to pass early-configuration-time parameters to it is via the kernel
command line.
+Sysfs Interface
+===============
+
+The ``intel_idle`` driver exposes the following ``sysfs`` attributes in
+``/sys/devices/system/cpu/cpuidle/``:
+
+``intel_c1_demotion``
+ Enable or disable C1 demotion for all CPUs in the system. This file is
+ only exposed on platforms that support the C1 demotion feature and where
+ it was tested. Value 0 means that C1 demotion is disabled, value 1 means
+ that it is enabled. Write 0 or 1 to disable or enable C1 demotion for
+ all CPUs.
+
+ The C1 demotion feature involves the platform firmware demoting deep
+ C-state requests from the OS (e.g., C6 requests) to C1. The idea is that
+ firmware monitors CPU wake-up rate, and if it is higher than a
+ platform-specific threshold, the firmware demotes deep C-state requests
+ to C1. For example, Linux requests C6, but firmware noticed too many
+ wake-ups per second, and it keeps the CPU in C1. When the CPU stays in
+ C1 long enough, the platform promotes it back to C6. This may improve
+ some workloads' performance, but it may also increase power consumption.
.. _intel-idle-enumeration-of-states:
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index 78fc83ed2a7e..26e702c7016e 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -329,6 +329,106 @@ information listed above is the same for all of the processors supporting the
HWP feature, which is why ``intel_pstate`` works with all of them.]
+Support for Hybrid Processors
+=============================
+
+Some processors supported by ``intel_pstate`` contain two or more types of CPU
+cores differing by the maximum turbo P-state, performance vs power characteristics,
+cache sizes, and possibly other properties. They are commonly referred to as
+hybrid processors. To support them, ``intel_pstate`` requires HWP to be enabled
+and it assumes the HWP performance units to be the same for all CPUs in the
+system, so a given HWP performance level always represents approximately the
+same physical performance regardless of the core (CPU) type.
+
+Hybrid Processors with SMT
+--------------------------
+
+On systems where SMT (Simultaneous Multithreading), also referred to as
+HyperThreading (HT) in the context of Intel processors, is enabled on at least
+one core, ``intel_pstate`` assigns performance-based priorities to CPUs. Namely,
+the priority of a given CPU reflects its highest HWP performance level which
+causes the CPU scheduler to generally prefer more performant CPUs, so the less
+performant CPUs are used when the other ones are fully loaded. However, SMT
+siblings (that is, logical CPUs sharing one physical core) are treated in a
+special way such that if one of them is in use, the effective priority of the
+other ones is lowered below the priorities of the CPUs located in the other
+physical cores.
+
+This approach maximizes performance in the majority of cases, but unfortunately
+it also leads to excessive energy usage in some important scenarios, like video
+playback, which is not generally desirable. While there is no other viable
+choice with SMT enabled because the effective capacity and utilization of SMT
+siblings are hard to determine, hybrid processors without SMT can be handled in
+more energy-efficient ways.
+
+.. _CAS:
+
+Capacity-Aware Scheduling Support
+---------------------------------
+
+The capacity-aware scheduling (CAS) support in the CPU scheduler is enabled by
+``intel_pstate`` by default on hybrid processors without SMT. CAS generally
+causes the scheduler to put tasks on a CPU so long as there is a sufficient
+amount of spare capacity on it, and if the utilization of a given task is too
+high for it, the task will need to go somewhere else.
+
+Since CAS takes CPU capacities into account, it does not require CPU
+prioritization and it allows tasks to be distributed more symmetrically among
+the more performant and less performant CPUs. Once placed on a CPU with enough
+capacity to accommodate it, a task may just continue to run there regardless of
+whether or not the other CPUs are fully loaded, so on average CAS reduces the
+utilization of the more performant CPUs which causes the energy usage to be more
+balanced because the more performant CPUs are generally less energy-efficient
+than the less performant ones.
+
+In order to use CAS, the scheduler needs to know the capacity of each CPU in
+the system and it needs to be able to compute scale-invariant utilization of
+CPUs, so ``intel_pstate`` provides it with the requisite information.
+
+First of all, the capacity of each CPU is represented by the ratio of its highest
+HWP performance level, multiplied by 1024, to the highest HWP performance level
+of the most performant CPU in the system, which works because the HWP performance
+units are the same for all CPUs. Second, the frequency-invariance computations,
+carried out by the scheduler to always express CPU utilization in the same units
+regardless of the frequency it is currently running at, are adjusted to take the
+CPU capacity into account. All of this happens when ``intel_pstate`` has
+registered itself with the ``CPUFreq`` core and it has figured out that it is
+running on a hybrid processor without SMT.
+
+Energy-Aware Scheduling Support
+-------------------------------
+
+If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
+``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
+`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
+Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
+``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
+to operate in the `passive mode <Passive Mode_>`_.
+
+The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
+based on abstract cost values and it does not include any real power numbers)
+and it is relatively simple to avoid unnecessary computations in the scheduler.
+There is a performance domain in it for every CPU in the system and the cost
+values for these performance domains have been chosen so that running a task on
+a less performant (small) CPU appears to be always cheaper than running that
+task on a more performant (big) CPU. However, for two CPUs of the same type,
+the cost difference depends on their current utilization, and the CPU whose
+current utilization is higher generally appears to be a more expensive
+destination for a given task. This helps to balance the load among CPUs of the
+same type.
+
+Since EAS works on top of CAS, high-utilization tasks are always migrated to
+CPUs with enough capacity to accommodate them, but thanks to EAS, low-utilization
+tasks tend to be placed on the CPUs that look less expensive to the scheduler.
+Effectively, this causes the less performant and less loaded CPUs to be
+preferred as long as they have enough spare capacity to run the given task
+which generally leads to reduced energy usage.
+
+The Energy Model created by ``intel_pstate`` can be inspected by looking at
+the ``energy_model`` directory in ``debugfs`` (typlically mounted on
+``/sys/kernel/debug/``).
+
+
User Space Interface in ``sysfs``
=================================
@@ -697,8 +797,8 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
Limits`_ for details).
``no_cas``
- Do not enable capacity-aware scheduling (CAS) which is enabled by
- default on hybrid systems.
+ Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
+ default on hybrid systems without SMT.
Diagnostics and Tuning
======================
diff --git a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
index 5151ec312dc0..d367ba4d744a 100644
--- a/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
+++ b/Documentation/admin-guide/pm/intel_uncore_frequency_scaling.rst
@@ -91,12 +91,22 @@ Attributes in each directory:
``domain_id``
This attribute is used to get the power domain id of this instance.
+``die_id``
+ This attribute is used to get the Linux die id of this instance.
+ This attribute is only present for domains with core agents and
+ when the CPUID leaf 0x1f presents die ID.
+
``fabric_cluster_id``
This attribute is used to get the fabric cluster id of this instance.
``package_id``
This attribute is used to get the package id of this instance.
+``agent_types``
+ This attribute displays all the hardware agents present within the
+ domain. Each agent has the capability to control one or more hardware
+ subsystems, which include: core, cache, memory, and I/O.
+
The other attributes are same as presented at package_*_die_* level.
In most of current use cases, the "max_freq_khz" and "min_freq_khz"
diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
index 07cfd8863b46..4a5ffb0996a3 100644
--- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst
+++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
@@ -347,7 +347,7 @@ again.
[:ref:`details<uninstall>`]
-.. _submit_improvements:
+.. _submit_improvements_qbtl:
Did you run into trouble following any of the above steps that is not cleared up
by the reference section below? Or do you have ideas how to improve the text?
@@ -1070,7 +1070,7 @@ complicated, and harder to follow.
That being said: this of course is a balancing act. Hence, if you think an
additional use-case is worth describing, suggest it to the maintainers of this
-document, as :ref:`described above <submit_improvements>`.
+document, as :ref:`described above <submit_improvements_qbtl>`.
..
diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 2fd5a030235a..9a847506f6ec 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -41,7 +41,7 @@ If you are facing multiple issues with the Linux kernel at once, report each
separately. While writing your report, include all information relevant to the
issue, like the kernel and the distro used. In case of a regression, CC the
regressions mailing list (regressions@lists.linux.dev) to your report. Also try
-to pin-point the culprit with a bisection; if you succeed, include its
+to pinpoint the culprit with a bisection; if you succeed, include its
commit-id and CC everyone in the sign-off-by chain.
Once the report is out, answer any questions that come up and help where you
@@ -206,7 +206,7 @@ Reporting issues only occurring in older kernel version lines
This subsection is for you, if you tried the latest mainline kernel as outlined
above, but failed to reproduce your issue there; at the same time you want to
see the issue fixed in a still supported stable or longterm series or vendor
-kernels regularly rebased on those. If that the case, follow these steps:
+kernels regularly rebased on those. If that is the case, follow these steps:
* Prepare yourself for the possibility that going through the next few steps
might not get the issue solved in older releases: the fix might be too big
@@ -312,7 +312,7 @@ small modifications to a kernel based on a recent Linux version; that for
example often holds true for the mainline kernels shipped by Debian GNU/Linux
Sid or Fedora Rawhide. Some developers will also accept reports about issues
with kernels from distributions shipping the latest stable kernel, as long as
-its only slightly modified; that for example is often the case for Arch Linux,
+it's only slightly modified; that for example is often the case for Arch Linux,
regular Fedora releases, and openSUSE Tumbleweed. But keep in mind, you better
want to use a mainline Linux and avoid using a stable kernel for this
process, as outlined in the section 'Install a fresh kernel for testing' in more
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 8290177b4f75..d385985b305f 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -75,6 +75,7 @@ Currently, these files are in /proc/sys/vm:
- unprivileged_userfaultfd
- user_reserve_kbytes
- vfs_cache_pressure
+- vfs_cache_pressure_denom
- watermark_boost_factor
- watermark_scale_factor
- zone_reclaim_mode
@@ -1017,19 +1018,28 @@ vfs_cache_pressure
This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.
+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.
watermark_boost_factor
======================
diff --git a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
index 03c55151346c..d8946b084b1e 100644
--- a/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
+++ b/Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
@@ -267,7 +267,7 @@ culprit might be known already. For further details on what actually qualifies
as a regression check out Documentation/admin-guide/reporting-regressions.rst.
If you run into any problems while following this guide or have ideas how to
-improve it, :ref:`please let the kernel developers know <submit_improvements>`.
+improve it, :ref:`please let the kernel developers know <submit_improvements_vbbr>`.
.. _introprep_bissbs:
@@ -1055,7 +1055,7 @@ follow these instructions.
[:ref:`details <introoptional_bisref>`]
-.. _submit_improvements:
+.. _submit_improvements_vbbr:
Conclusion
----------
diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
index 5becb441c3cb..a18328a5fb93 100644
--- a/Documentation/admin-guide/xfs.rst
+++ b/Documentation/admin-guide/xfs.rst
@@ -151,6 +151,17 @@ When mounting an XFS filesystem, the following options are accepted.
optional, and the log section can be separate from the data
section or contained within it.
+ max_atomic_write=value
+ Set the maximum size of an atomic write. The size may be
+ specified in bytes, in kilobytes with a "k" suffix, in megabytes
+ with a "m" suffix, or in gigabytes with a "g" suffix. The size
+ cannot be larger than the maximum write size, larger than the
+ size of any allocation group, or larger than the size of a
+ remapping operation that the log can complete atomically.
+
+ The default value is to set the maximum I/O completion size
+ to allow each CPU to handle one at a time.
+
max_open_zones=value
Specify the max number of zones to keep open for writing on a
zoned rt device. Many open zones aids file data separation