Age | Commit message (Collapse) | Author |
|
Add documentation on how the CXL driver interacts with the DAX driver.
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-12-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add 4 example configurations:
- single device
- cross-host-bridge interleave
- intra-host-bridge-interleave
- multi-level interleave
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-11-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add docs for the CXL driver that explains the base devices,
decoder types, region types, mailbox interfaces, and decoder
programming.
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-10-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Document __init time configurations that affect CXL driver probe
process and memory region configuration.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-9-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add type-3 device configuration overview that explains the probe
process for a type-3 device from early-boot through memory-hotplug.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-8-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add example ACPI Table configurations for different sample platforms.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-7-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add basic ACPI table information needed to understand the CXL
driver probe process.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-6-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add some docs on CXL configurations done in bios/efi that affect
linux configuration - information vendors may care to consider.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-5-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Add a simple device primer sufficient to understand the theory
of operation documentation.
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-4-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Place the hierarchy diagram in access-coordinates.rst in a code block.
Fix a few grammar issues.
Suggested-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-3-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Restructure the cxl folder to make adding docs per-page cleaner.
Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-2-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Everywhere else in this doc, the dispatch queue acronym (DSQ) is
uppercase. There were a couple places where the acronym was written in
lowercase. I changed them to uppercase to make it homogeneous.
Signed-off-by: Jake Rice <jake@jakerice.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Fix the interconnects in the example to follow the schema changes.
Fixes: 60b8d3a2365a ("dt-bindings: display: msm: sm8350-mdss: Describe the CPU-CFG icc path")
Reported-by: Rob Herring <robh@kernel.org>
Closes: http://lore.kernel.org/r/CAL_JsqKr8Xd8uxFzE0YJTyD+V6N++VV8SX-GB5Xt0_BKkeoGUQ@mail.gmail.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/651775/
Link: https://lore.kernel.org/r/20250505-sm8350-fix-example-v1-1-36d5d9ccba66@oss.qualcomm.com
|
|
Fix indentation for a bullet list item in bpf_iterators.rst.
According to reStructuredText rules, bullet list item bodies must be
consistently indented relative to the bullet. The indentation of the
first line after the bullet determines the alignment for the rest of
the item body.
Reported by smatch:
/linux/Documentation/bpf/bpf_iterators.rst:55: WARNING: Bullet list ends without a blank line; unexpected unindent. [docutils]
Fixes: 7220eabff8cb ("bpf, docs: document open-coded BPF iterators")
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250513015901.475207-1-khaledelnaggarlinux@gmail.com
|
|
FAULT_KMALLOC 0x000000001
There is one redundant '0' in 32-bits hexademical number of fault type,
remove it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Commit 4b4ab93ddc5f ("dt-bindings: remoteproc: Consolidate SC8180X and
SM8150 PAS files") moved SC8180X bindings from separate file into this
one, but it forgot to add actual compatibles in top-level properties
section making the entire binding un-selectable (no-op) for SC8180X PAS.
Fixes: 4b4ab93ddc5f ("dt-bindings: remoteproc: Consolidate SC8180X and SM8150 PAS files")
Cc: stable@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20250428075243.44256-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
From the software POV, it matches the SM8350's implementation.
Describe it as such, with a fallback.
Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> # Lenovo X13s
Link: https://lore.kernel.org/r/20250503-topic-8280_slpi-v1-1-9400a35574f7@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Convert the Marvell Tauros2 Cache binding to DT schema.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
|
|
Convert the Marvell Feroceon/Kirkwood Cache binding to DT schema format.
Use "marvell,kirkwood-cache" for the filename instead as that's only
compatible used in a .dts upstream.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
|
|
Documentation/userspace-api/media/glossary.rst:170: WARNING: duplicate term description of SPI, other instance in gpu/amdgpu/amdgpu-glossary
That's because SPI of amdgpu (Shader Processor Input) shares the same
global glossary term as SPI of media subsystem (which is Serial
Peripheral Interface Bus). Disambiguate the former from the latter to
fix the warning.
Note that adding context qualifiers in the term is strictly necessary
in order to make Sphinx happy.
Fixes: dd3d035a7838 ("Documentation/gpu: Add new entries to amdgpu glossary")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20250509185845.60bf5e7b@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into gpio/for-next
Immutable branch between MFD, GPIO and NVMEM due for the v6.16 merge window
|
|
Add the register!() macro, which defines a given register's layout and
provide bit-field accessors with a way to convert them to a given type.
This macro will allow us to make clear definitions of the registers and
manipulate their fields safely.
The long-term goal is to eventually move it to the kernel crate so it
can be used by other drivers as well, but it was agreed to first land it
into nova-core and make it mature there.
To illustrate its usage, use it to define the layout for the Boot0
(renamed to NV_PMC_BOOT_0 to match OpenRM's naming scheme) and take
advantage of its accessors.
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Link: https://lore.kernel.org/r/20250507-nova-frts-v3-5-fcb02749754d@nvidia.com
[ Fix typo in commit message. - Danilo ]
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Correct the gpio-ranges in the QCS8300 TLMM pin controller example to
include the UFS_RESET pin, which is expected to be wired to the reset
pin of the primary UFS memory. This allows the UFS driver to toggle it.
Fixes: 5778535972e2 ("dt-bindings: pinctrl: describe qcs8300-tlmm")
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Lijuan Gao <quic_lijuang@quicinc.com>
Link: https://lore.kernel.org/20250506-correct_gpio_ranges-v3-2-49a7d292befa@quicinc.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Correct the gpio-ranges in the QCS615 TLMM pin controller example to
include the UFS_RESET pin, which is expected to be wired to the reset
pin of the primary UFS memory. This allows the UFS driver to toggle it.
Fixes: 55c487ea6084 ("dt-bindings: pinctrl: document the QCS615 Top Level Mode Multiplexer")
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Lijuan Gao <quic_lijuang@quicinc.com>
Link: https://lore.kernel.org/20250506-correct_gpio_ranges-v3-1-49a7d292befa@quicinc.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Describe the support for hybrid processors in intel_pstate, including
the CAS and EAS support, in the admin-guide documentation.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/1935040.CQOukoFCf9@rjwysocki.net
|
|
Fix the API name to free the allocated table in the example driver code
that modifies the EM. Also fix the passing of correct table when
updating the cost.
Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com>
Link: https://patch.msgid.link/20250511071141.13237-1-atulpant.linux@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add a new stm32mp25 compatible to stm32-lptimer dt-bindings, to support
STM32MP25 SoC. Some features has been updated or added to the low-power
timer:
- new capture compare channels
- up to two PWM channels
- PWM input capture
- peripheral interconnect in stm32mp25 has been updated (new triggers).
- registers/bits has been added or revisited (IER access).
So introduce a new compatible to handle this diversity.
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@foss.st.com>
Reviewed-by: "Rob Herring (Arm)" <robh@kernel.org>
Link: https://lore.kernel.org/r/20250429125133.1574167-2-fabrice.gasnier@foss.st.com
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
The Liontron H-A133L is an industrial development board using the
Allwinner A133 SoC.
Add its compatible name to the list of valid board names.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250505164729.18175-3-andre.przywara@arm.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
|
|
Liontron is a company based in Shenzen, China, making industrial
development boards and embedded computers, mostly using Rockchip and
Allwinner SoCs.
Add their name to the list of vendors.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20250505164729.18175-2-andre.przywara@arm.com
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
|
|
Drivers need to make sure not to pass netmem dma-addrs to the
dma-mapping API in order to support netmem TX.
Add helpers and netmem_dma_*() helpers that enables special handling of
netmem dma-addrs that drivers can use.
Document in netmem.rst what drivers need to do to support netmem TX.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-7-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add documentation outlining the usage and details of the devmem TCP TX
API.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250508004830.4100853-6-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add bind-tx netlink call to attach dmabuf for TX; queue is not
required, only ifindex and dmabuf fd for attachment.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20250508004830.4100853-4-almasrymina@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Conflicts:
Documentation/admin-guide/hw-vuln/index.rst
arch/x86/include/asm/cpufeatures.h
arch/x86/kernel/alternative.c
arch/x86/kernel/cpu/bugs.c
arch/x86/kernel/cpu/common.c
drivers/base/cpu.c
include/linux/cpu.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Prepare to resolve conflicts with an upstream series of fixes that conflict
with pending x86 changes:
6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
gs101 requires access to the pmu interrupt generation register region
which is exposed as a syscon. Update the exynos-pmu bindings documentation
to reflect this.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20250506-contrib-pg-cpu-hotplug-suspend2ram-fixes-v1-v4-2-9f64a2657316@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
|
|
Add bindings documentation for the Power Management Unit (PMU) interrupt
generator.
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20250506-contrib-pg-cpu-hotplug-suspend2ram-fixes-v1-v4-1-9f64a2657316@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
|
|
Fixes a typo in the root= parameter description, changing
"this a a" to "this is a".
Fixes: c0c1a7dcb6f5 ("init: move the nfs/cifs/ram special cases out of name_to_dev_t")
Signed-off-by: Petr Vaněk <arkamar@atlas.cz>
Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The word accuracy was misspelled as "accruracy".
Signed-off-by: Thushara.M.S <thusharms@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We introduced KHO into Linux: A framework that allows Linux to pass
metadata and memory across kexec from Linux to Linux. KHO reuses fdt as
file format and shares a lot of the same properties of firmware-to- Linux
boot formats: It needs a stable, documented ABI that allows for forward
and backward compatibility as well as versioning.
As first user of KHO, we introduced memblock which can now preserve memory
ranges reserved with reserve_mem command line options contents across
kexec, so you can use the post-kexec kernel to read traces from the
pre-kexec kernel.
This patch adds memblock schemas similar to "device" device tree ones to a
new kho bindings directory. This allows us to force contributors to
document the data that moves across KHO kexecs and catch breaking change
during review.
Link: https://lkml.kernel.org/r/20250509074635.3187114-18-changyuanl@google.com
Co-developed-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
With KHO in place, let's add documentation that describes what it is and
how to use it.
Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Co-developed-by: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ashish Kalra <ashish.kalra@amd.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Cc: Marc Rutland <mark.rutland@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Rob Herring <robh@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Thomas Lendacky <thomas.lendacky@amd.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The MGLRU already supports reclaiming only from anonymous memory via the
/sys/kernel/debug/lru_gen interface. Now, memory.reclaim also supports
the swappiness=max parameter to enable reclaiming solely from anonymous
memory. To unify the semantics of proactive reclaiming from anonymous
folios, the max parameter is introduced.
[hezhongkun.hzk@bytedance.com: use strcmp instead of strncmp, if swappiness is not set, use the default value]
Link: https://lkml.kernel.org/r/20250507071057.3184240-1-hezhongkun.hzk@bytedance.com
[akpm@linux-foundation.org: tweak coding style]
Link: https://lkml.kernel.org/r/65181f7745d657d664d833c26d8a94cae40538b9.1745225696.git.hezhongkun.hzk@bytedance.com
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "add max arg to swappiness in memory.reclaim and lru_gen", v4.
This patchset adds max arg to swappiness in memory.reclaim and lru_gen for
anon only proactive memory reclaim.
With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim")
we can submit an additional swappiness=<val> argument to memory.reclaim.
It is very useful because we can dynamically adjust the reclamation ratio
based on the anonymous folios and file folios of each cgroup. For
example,when swappiness is set to 0, we only reclaim from file folios.
But we can not relciam memory just from anon folios.
This patchset introduces a new macro, SWAPPINESS_ANON_ONLY, defined as
MAX_SWAPPINESS + 1, represent the max arg semantics. It specifically
indicates that reclamation should occur only from anonymous pages.
Patch 1 adds swappiness=max arg to memory.reclaim suggested-by: Yosry
Ahmed
Patch 2 add more comments for cache_trim_mode from Johannes Weiner in [1].
Patch 3 add max arg to lru_gen for proactive memory reclaim in MGLRU. The
MGLRU already supports reclaiming exclusively from anonymous pages. This
patch formalizes that behavior by introducing a max parameter to represent
the corresponding semantics.
Patch 4 using SWAPPINESS_ANON_ONLY in MGLRU Using SWAPPINESS_ANON_ONLY
instead of MAX_SWAPPINESS + 1 to indicate reclaiming only from anonymous
pages makes the code more readable and explicit
Here is the previous discussion:
https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/
https://lore.kernel.org/all/20250318135330.3358345-1-hezhongkun.hzk@bytedance.com/
This patch (of 4):
With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim")
we can submit an additional swappiness=<val> argument to memory.reclaim.
It is very useful because we can dynamically adjust the reclamation ratio
based on the anonymous folios and file folios of each cgroup. For
example,when swappiness is set to 0, we only reclaim from file folios.
However,we have also encountered a new issue: when swappiness is set to
the MAX_SWAPPINESS, it may still only reclaim file folios.
So, we hope to add a new arg 'swappiness=max' in memory.reclaim where
proactive memory reclaim only reclaims from anonymous folios when
swappiness is set to max. The swappiness semantics from a user
perspective remain unchanged.
For example, something like this:
echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim
will perform reclaim on the rootcg with a swappiness setting of 'max' (a
new mode) regardless of the file folios. Users have a more comprehensive
view of the application's memory distribution because there are many
metrics available. For example, if we find that a certain cgroup has a
large number of inactive anon folios, we can reclaim only those and skip
file folios, because with the zram/zswap, the IO tradeoff that
cache_trim_mode or other file first logic is making doesn't hold - file
refaults will cause IO, whereas anon decompression will not.
With this patch, the swappiness argument of memory.reclaim has a new
mode 'max', means reclaiming just from anonymous folios both in traditional
LRU and MGLRU.
Link: https://lkml.kernel.org/r/cover.1745225696.git.hezhongkun.hzk@bytedance.com
Link: https://lore.kernel.org/all/20250314141833.GA1316033@cmpxchg.org/ [1]
Link: https://lkml.kernel.org/r/519e12b9b1f8c31a01e228c8b4b91a2419684f77.1745225696.git.hezhongkun.hzk@bytedance.com
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
add more explanation in doc and commit message on O_NONBLOCK side-effects
(Johannes)
Link: https://lkml.kernel.org/r/20250506232833.3109790-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Setting the max and high limits can trigger synchronous reclaim and/or
oom-kill if the usage is higher than the given limit. This behavior is
fine for newly created cgroups but it can cause issues for the node
controller while setting limits for existing cgroups.
In our production multi-tenant and overcommitted environment, we are
seeing priority inversion when the node controller dynamically adjusts the
limits of running jobs of different priorities. Based on the system
situation, the node controller may reduce the limits of lower priority
jobs and increase the limits of higher priority jobs. However we are
seeing node controller getting stuck for long period of time while
reclaiming from lower priority jobs while setting their limits and also
spends a lot of its own CPU.
One of the workaround we are trying is to fork a new process which sets
the limit of the lower priority job along with setting an alarm to get
itself killed if it get stuck in the reclaim for lower priority job.
However we are finding it very unreliable and costly. Either we need a
good enough time buffer for the alarm to be delivered after setting limit
and potentialy spend a lot of CPU in the reclaim or be unreliable in
setting the limit for much shorter but cheaper (less reclaim) alarms.
Let's introduce new limit setting option which does not trigger reclaim
and/or oom-kill and let the processes in the target cgroup to trigger
reclaim and/or throttling and/or oom-kill in their next charge request.
This will make the node controller on multi-tenant overcommitted
environment much more reliable.
Explanation from Johannes on side-effects of O_NONBLOCK limit change:
It's usually the allocating tasks inside the group bearing the cost of
limit enforcement and reclaim. This allows a (privileged) updater from
outside the group to keep that cost in there - instead of having to
help, from a context that doesn't necessarily make sense.
I suppose the tradeoff with that - and the reason why this was doing
sync reclaim in the first place - is that, if the group is idle and
not trying to allocate more, it can take indefinitely for the new
limit to actually be met.
It should be okay in most scenarios in practice. As the capacity is
reallocated from group A to B, B will exert pressure on A once it
tries to claim it and thereby shrink it down. If A is idle, that
shouldn't be hard. If A is running, it's likely to fault/allocate
soon-ish and then join the effort.
It does leave a (malicious) corner case where A is just busy-hitting
its memory to interfere with the clawback. This is comparable to
reclaiming memory.low overage from the outside, though, which is an
acceptable risk. Users of O_NONBLOCK just need to be aware.
Link: https://lkml.kernel.org/r/20250419183545.1982187-1-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is possible for a reclaimer to cause demotions of an lruvec belonging
to a cgroup with cpuset.mems set to exclude some nodes. Attempt to apply
this limitation based on the lruvec's memcg and prevent demotion.
Notably, this may still allow demotion of shared libraries or any memory
first instantiated in another cgroup. This means cpusets still cannot
cannot guarantee complete isolation when demotion is enabled, and the docs
have been updated to reflect this.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently - with the noted exceptions.
Note on locking:
The cgroup_get_e_css reference protects the css->effective_mems, and calls
of this interface would be subject to the same race conditions associated
with a non-atomic access to cs->effective_mems.
So while this interface cannot make strong guarantees of correctness, it
can therefore avoid taking a global or rcu_read_lock for performance.
Link: https://lkml.kernel.org/r/20250424202806.52632-3-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Suggested-by: Shakeel Butt <shakeel.butt@linux.dev>
Suggested-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Waiman Long <longman@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use cl@gentwo.org throughout and remove the old email addresses.
Link: https://lkml.kernel.org/r/8b962f57-4d98-cbb0-cd82-b6ba456733e8@gentwo.org
Signed-off-by: Christoph Lameter <cl@gentwo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a description of 'nid' file, which is optionally used for specific
DAMOS quota goal metrics such as node_mem_{used,free}_bp on the DAMON
sysfs ABI document.
Link: https://lkml.kernel.org/r/20250420194030.75838-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add description of 'nid' file, which is optionally used for specific DAMOS
quota goal metrics such as node_mem_{used,free}_bp on DAMON usage
document.
Link: https://lkml.kernel.org/r/20250420194030.75838-6-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add description of DAMOS quota goal metrics for NUMA node utilization on
the DAMON deesign document.
Link: https://lkml.kernel.org/r/20250420194030.75838-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Yunjeong Mun <yunjeong.mun@sk.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|