summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-05-12selftests/bpf: Allow skipping docs compilationMykyta Yatsenko
Currently rst2man is required to build bpf selftests, as the tool is used by Makefile.docs. rst2man may be missing in some build environments and is not essential for selftests. It makes sense to allow user to skip building docs. This patch adds SKIP_DOCS variable into bpf selftests Makefile that when set to 1 allows skipping building docs, for example: make -C tools/testing/selftests TARGETS=bpf SKIP_DOCS=1 Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250510002450.365613-1-mykyta.yatsenko5@gmail.com
2025-05-12perf tests: Harden branch stack sampling testIan Rogers
On continuous testing the perf script output can be empty, or nearly empty, causing tr/grep to exit and due to "set -e" the test traps and fails. Add some empty file handling that sets the test to skip and make grep and other text rewriting failures non-fatal by adding "|| true". Committer testing: root@number:~# grep -m1 "model name" /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf test "Check branch stack sampling" 104: Check branch stack sampling : Ok root@number:~# root@number:~# perf test -vvvvvvv "Check branch stack sampling" 104: Check branch stack sampling: --- start --- test child forked, pid 396047 142d22-142da0 l brstack_bench perf does have symbol 'brstack_bench' Testing user branch stack sampling Testing branch stack filtering permutation (any_call,CALL|IND_CALL|COND_CALL|SYSCALL|IRQ) Testing branch stack filtering permutation (call,CALL|SYSCALL) Testing branch stack filtering permutation (cond,COND) Testing branch stack filtering permutation (any_ret,RET|COND_RET|SYSRET|ERET) Testing branch stack filtering permutation (call,cond,CALL|SYSCALL|COND) Testing branch stack filtering permutation (any_call,cond,CALL|IND_CALL|COND_CALL|IRQ|SYSCALL|COND) Testing branch stack filtering permutation (cond,any_call,any_ret,COND|CALL|IND_CALL|COND_CALL|SYSCALL|IRQ|RET|COND_RET|SYSRET|ERET) ---- end(0) ---- 104: Check branch stack sampling : Ok root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250318161639.34446-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12selftests/bpf: test_verifier verbose log overflowsGregory Bell
Tests: - 458/p ld_dw: xor semi-random 64-bit imms, test 5 - 501/p scale: scale test 1 - 502/p scale: scale test 2 fail in verbose mode due to bpf_vlog[] overflowing. These tests generate large verifier logs that exceed the current buffer size, causing them to fail to load. Increase the size of the bpf_vlog[] buffer to accommodate larger logs and prevent false failures during test runs with verbose output. Signed-off-by: Gregory Bell <grbell@redhat.com> Link: https://lore.kernel.org/r/e49267100f07f099a5877a3a5fc797b702bbaf0c.1747058195.git.grbell@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-12selftests/bpf: test_verifier verbose causes erroneous failuresGregory Bell
When running test_verifier with the -v flag and a test with `expected_ret==VERBOSE_ACCEPT`, the opts.log_level is unintentionally overwritten because the verbose flag takes precedence. This leads to a mismatch in the expected and actual contents of bpf_vlog, causing tests to fail incorrectly. Reorder the conditional logic that sets opts.log_level to preserve the expected log level and prevent it from being overridden by -v. Signed-off-by: Gregory Bell <grbell@redhat.com> Link: https://lore.kernel.org/r/182bf00474f817c99f968a9edb119882f62be0f8.1747058195.git.grbell@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-12perf parse-events: Add "cpu" term to set the CPU an event is recorded onIan Rogers
The -C option allows the CPUs for a list of events to be specified but its not possible to set the CPU for a single event. Add a term to allow this. The term isn't a general CPU list due to ',' already being a special character in event parsing instead multiple cpu= terms may be provided and they will be merged/unioned together. An example of mixing different types of events counted on different CPUs: ``` $ perf stat -A -C 0,4-5,8 -e "instructions/cpu=0/,l1d-misses/cpu=4,cpu=5/,inst_retired.any/cpu=8/,cycles" -a sleep 0.1 Performance counter stats for 'system wide': CPU0 6,979,225 instructions/cpu=0/ # 0.89 insn per cycle CPU4 75,138 cpu/l1d-misses/ CPU5 1,418,939 cpu/l1d-misses/ CPU8 797,553 cpu/inst_retired.any,cpu=8/ CPU0 7,845,302 cycles CPU4 6,546,859 cycles CPU5 185,915,438 cycles CPU8 2,065,668 cycles 0.112449242 seconds time elapsed ``` Committer testing: root@number:~# grep -m1 "model name" /proc/cpuinfo model name : AMD Ryzen 9 9950X3D 16-Core Processor root@number:~# perf stat -A -e "instructions/cpu=0/,instructions,l1d-misses/cpu=4,cpu=5/,cycles" -a sleep 0.1 Performance counter stats for 'system wide': CPU0 2,398,351 instructions/cpu=0/ # 0.44 insn per cycle CPU0 2,398,152 instructions # 0.44 insn per cycle CPU1 1,265,634 instructions # 0.49 insn per cycle CPU2 606,087 instructions # 0.50 insn per cycle CPU3 4,025,752 instructions # 0.52 insn per cycle CPU4 4,236,810 instructions # 0.53 insn per cycle CPU5 3,984,832 instructions # 0.66 insn per cycle CPU6 434,132 instructions # 0.44 insn per cycle CPU7 65,752 instructions # 0.41 insn per cycle CPU8 459,083 instructions # 0.48 insn per cycle CPU9 6,464,161 instructions # 1.31 insn per cycle <SNIP> root@number:~# perf stat -e "instructions/cpu=0/,instructions,l1d-misses/cpu=4,cpu=5/,cycles" -a sleep 0. Performance counter stats for 'system wide': 144,822 instructions/cpu=0/ # 0.03 insn per cycle 4,666,114 instructions # 0.93 insn per cycle 2,583 l1d-misses 4,993,633 cycles 0.000868512 seconds time elapsed root@number:~# Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250403194337.40202-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf parse-events: Set is_pmu_core for legacy hardware eventsIan Rogers
Also set the CPU map to all online CPU maps. This is done so the behavior of legacy hardware and hardware cache events better matches that of sysfs and JSON events during __perf_evlist__propagate_maps(). Fix missing cpumap put in "Synthesize attr update" test. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250403194337.40202-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf stat: Use counter cpumask to skip zero valuesIan Rogers
When a counter is 0 it may or may not be skipped. For uncore counters it is common they are only valid on 1 logical CPU and all other CPUs should be skipped. The PMU's cpumask was used for the skip calculation, but that cpumask may not reflect user overrides. Similarly a counter on a core PMU may explicitly not request a CPU be gathered. If the counter on this CPU's value is 0 then the counter should be skipped as it wasn't requested. Switch from using the PMU cpumask to that associated with the evsel to support these cases. Avoid potential crash with --per-thread mode where config->aggr_get_id is NULL. Add some examples for the tool event 0 counter skipping. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250403194337.40202-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12libperf cpumap: Add ability to create CPU from a single CPU numberIan Rogers
Add perf_cpu_map__new_int() so that a CPU map can be created from a single integer. Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20250403194337.40202-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf tests metrics: Permission related fixesIan Rogers
When permissions are limited running sleep without system wide isn't a good benchmark to run to achieve samples, switch to running noploop. Remove indent for non-success cases. Allow skip for the not counted case. Minor debug changes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250412004704.2297939-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf evsel: Add per-thread warning for EOPNOTSUPP open failuesIan Rogers
The mrvl_ddr_pmu will return EOPNOTSUPP if opened in per-thread mode. Give a warning for this similar to EINVAL. Doing this better supports metric testing with limited permissions when the mrvl_ddr_pmu is present, as the failure to open causes the test to skip and not fail. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Link: https://lore.kernel.org/r/20250412004704.2297939-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf scripts python: exported-sql-viewer.py: Fix pattern matching with Python 3Adrian Hunter
The script allows the user to enter patterns to find symbols. The pattern matching characters are converted for use in SQL. For PostgreSQL the conversion involves using the Python maketrans() method which is slightly different in Python 3 compared with Python 2. Fix to work in Python 3. Fixes: beda0e725e5f06ac ("perf script python: Add Python3 support to exported-sql-viewer.py") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Tony Jones <tonyj@suse.de> Link: https://lore.kernel.org/r/20250512093932.79854-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf intel-pt: Do not default to recording all switch eventsAdrian Hunter
On systems with many CPUs, recording extra context switch events can be excessive and unnecessary. Add perf config intel-pt.all-switch-events=false to control the behaviour. Example: # perf config intel-pt.all-switch-events=false # perf record -eintel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.082 MB perf.data ] # perf script -D | grep PERF_RECORD_SWITCH | awk '{print $5}' | uniq -c 5 PERF_RECORD_SWITCH # perf config intel-pt.all-switch-events=true # perf record -eintel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.102 MB perf.data ] # perf script -D | grep PERF_RECORD_SWITCH | awk '{print $5}' | uniq -c 180 PERF_RECORD_SWITCH_CPU_WIDE Committer testing: While doing a make -j28 allmodconfig: root@five:~# grep "model name" -m1 /proc/cpuinfo model name : Intel(R) Core(TM) i7-14700K root@five:~# root@five:~# perf config intel-pt.all-switch-events=false root@five:~# perf record -e intel_pt//u uname Linux [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.019 MB perf.data ] root@five:~# perf report --stats | grep SWITCH_CPU_WIDE root@five:~# root@five:~# perf config intel-pt.all-switch-events=true root@five:~# perf record -e intel_pt//u uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.047 MB perf.data ] root@five:~# perf report --stats | grep SWITCH_CPU_WIDE SWITCH_CPU_WIDE events: 542 (96.4%) root@five:~# Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250512093932.79854-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12perf intel-pt: Fix PEBS-via-PT data_srcAdrian Hunter
The Fixes commit did not add support for decoding PEBS-via-PT data_src. Fix by adding support. PEBS-via-PT is a feature of some E-core processors, starting with processors based on Tremont microarchitecture. Because the kernel only supports Intel PT features that are on all processors, there is no support for PEBS-via-PT on hybrids. Currently that leaves processors based on Tremont, Gracemont and Crestmont, however there are no events on Tremont that produce data_src information, and for Gracemont and Crestmont there are only: mem-loads event=0xd0,umask=0x5,ldlat=3 mem-stores event=0xd0,umask=0x6 Affected processors include Alder Lake N (Gracemont), Sierra Forest (Crestmont) and Grand Ridge (Crestmont). Example: # perf record -d -e intel_pt/branch=0/ -e mem-loads/aux-output/pp uname Before: # perf.before script --itrace=o -Fdata_src 0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK N/A 0 |OP No|LVL N/A|SNP N/A|TLB N/A|LCK No|BLK N/A After: # perf script --itrace=o -Fdata_src 10268100142 |OP LOAD|LVL L1 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 10450100442 |OP LOAD|LVL L2 hit|SNP None|TLB L2 miss|LCK No|BLK N/A Fixes: 975846eddf907297 ("perf intel-pt: Add memory information to synthesized PEBS sample") Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250512093932.79854-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-12ACPICA: Update copyright yearSaket Dumbre
ACPICA commit 45253be18b3f37d46cd0072aa3f8a0a21a70e0a4 Changes needed by acpisrc to update copyright year when building for release. Link: https://github.com/acpica/acpica/commit/45253be1 Signed-off-by: Saket Dumbre <saket.dumbre@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-12ACPICA: Apply ACPI_NONSTRING in more placesAhmed Salem
ACPICA commit 1035a3d453f7dd49a235a59ee84ebda9d2d2f41b Add ACPI_NONSTRING for destination char arrays without a terminating NUL character. This is a follow-up to commit 35ad99236f3a ("ACPICA: Apply ACPI_NONSTRING") where not all instances received the same treatment, in preparation for replacing strncpy() calls with memcpy() Link: https://github.com/acpica/acpica/commit/1035a3d4 Signed-off-by: Ahmed Salem <x0rw3ll@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3833065.MHq7AAxBmi@rjwysocki.net
2025-05-12selftests/fs/mount-notify: add a test variant running inside usernsAmir Goldstein
unshare userns in addition to mntns and verify that: 1. watching tmpfs mounted inside userns is allowed with any mark type 2. watching orig root with filesystem mark type is not allowed 3. watching mntns of orig userns is not allowed 4. watching mntns in userns where fanotify_init was called is allowed mount events are only tested with the last case of mntns mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-9-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/filesystems: create setup_userns() helperAmir Goldstein
Add helper to utils.c and use it in statmount userns tests. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-8-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/filesystems: create get_unique_mnt_id() helperAmir Goldstein
Add helper to utils.c and use it in mount-notify and statmount tests. Linking with utils.c drags in a dependecy with libcap, so add it to the Makefile of the tests. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-7-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/fs/mount-notify: build with tools include dirAmir Goldstein
Copy the fanotify uapi header files to the tools include dir and define __kernel_fsid_t to decouple dependency with headers_install and then remove the redundant re-definitions of fanotify macros. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-6-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/mount_settattr: remove duplicate syscall definitionsAmir Goldstein
Which are already defined in wrappers.h. For now, the syscall defintions of mount_settattr() itself remain in the test, which is the only test to use them. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-5-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/pidfd: move syscall definitions into wrappers.hAmir Goldstein
There was already duplicity in some of the defintions. Remove syscall number defintions for __ia64__ that are both stale and incorrect. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-4-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/fs/statmount: build with tools include dirAmir Goldstein
Copy the required headers files (mount.h, nsfs.h) to the tools include dir and define the statmount/listmount syscall numbers to decouple dependency with headers_install for the common cases. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-3-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/filesystems: move wrapper.h out of overlayfs subdirAmir Goldstein
This is not an overlayfs specific header. Reviewed-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/20250509133240.529330-2-amir73il@gmail.com Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/mount_settattr: ensure that ext4 filesystem can be createdChristian Brauner
Filesystem too small for a journal mount: /mnt/D/: mount failed: Operation not permitted. mount_setattr_test.c:1076:idmap_mount_tree_invalid:Expected system("mount -o loop -t ext4 /mnt/C/ext4.img /mnt/D/") (256) == 0 (0) Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/mount_settattr: add missing STATX_MNT_ID_UNIQUE defineChristian Brauner
CC mount_setattr_test In file included from mount_setattr_test.c:24: mount_setattr_test.c: In function ‘mount_setattr_mount_detached_mount_on_detached_mount_and_attach’: mount_setattr_test.c:1850:60: error: ‘STATX_MNT_ID_UNIQUE’ undeclared (first use in this function); did you mean ‘STATX_MNT_ID’? 1850 | ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx), 0); | ^~~~~~~~~~~~~~~~~~~ ../kselftest_harness.h:757:20: note: in definition of macro ‘__EXPECT’ 757 | __typeof__(_expected) __exp = (_expected); \ | ^~~~~~~~~ mount_setattr_test.c:1850:9: note: in expansion of macro ‘ASSERT_EQ’ 1850 | ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx), 0); | ^~~~~~~~~ mount_setattr_test.c:1850:60: note: each undeclared identifier is reported only once for each function it appears in 1850 | ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx), 0); | ^~~~~~~~~~~~~~~~~~~ ../kselftest_harness.h:757:20: note: in definition of macro ‘__EXPECT’ 757 | __typeof__(_expected) __exp = (_expected); \ | ^~~~~~~~~ mount_setattr_test.c:1850:9: note: in expansion of macro ‘ASSERT_EQ’ 1850 | ASSERT_EQ(statx(fd_tree_subdir, "", AT_EMPTY_PATH, STATX_MNT_ID_UNIQUE, &stx), 0); | ^~~~~~~~~ Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12selftests/mount_settattr: don't define sys_open_tree() twiceChristian Brauner
CC mount_setattr_test mount_setattr_test.c:176:19: error: redefinition of ‘sys_open_tree’ 176 | static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags) | ^~~~~~~~~~~~~ In file included from mount_setattr_test.c:23: ../filesystems/overlayfs/wrappers.h:59:19: note: previous definition of ‘sys_open_tree’ with type ‘int(int, const char *, unsigned int)’ 59 | static inline int sys_open_tree(int dfd, const char *filename, unsigned int flags) Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12crypto: lib/chacha - strongly type the ChaCha stateEric Biggers
The ChaCha state matrix is 16 32-bit words. Currently it is represented in the code as a raw u32 array, or even just a pointer to u32. This weak typing is error-prone. Instead, introduce struct chacha_state: struct chacha_state { u32 x[16]; }; Convert all ChaCha and HChaCha functions to use struct chacha_state. No functional changes. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-05-11selftests: fix some typos in tools/testing/selftestsChelsy Ratnawat
Fix multiple spelling errors: - "rougly" -> "roughly" - "fielesystems" -> "filesystems" - "Can'" -> "Can't" Link: https://lkml.kernel.org/r/20250503211959.507815-1-chelsyratnawat2001@gmail.com Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11lib/test_kmod: do not hardcode/depend on any filesystemHerton R. Krzesinski
Right now test_kmod has hardcoded dependencies on btrfs/xfs. That is not optimal since you end up needing to select/build them, but it is not really required since other fs could be selected for the testing. Also, we can't change the default/driver module used for testing on initialization. Thus make it more generic: introduce two module parameters (start_driver and start_test_fs), which allow to select which modules/fs to use for the testing on test_kmod initialization. Then it's up to the user to select which modules/fs to use for testing based on his config. However, keep test_module as required default. This way, config/modules becomes selectable as when the testing is done from selftests (userspace). While at it, also change trigger_config_run_type, since at module initialization we already set the defaults at __kmod_config_init and should not need to do it again in test_kmod_init(), thus we can avoid to again set test_driver/test_fs. Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com Signed-off-by: Herton R. Krzesinski <herton@redhat.com> Reviewed-by: Luis Chambelrain <mcgrof@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/damon: remove the remaining test scripts for DAMON debugfs interfaceEnze Li
DAMON has dropped debugfs support; therefore, remove these unused scripts. Link: https://lkml.kernel.org/r/20250411024332.1373861-1-enze.li@linux.dev Fixes: 5ec4333b1967 ("mm/damon: remove DAMON debugfs interface") Signed-off-by: Enze Li <lienze@kylinos.cn> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: restore default nr_hugepages value during cleanup in ↵Donet Tom
hugetlb_reparenting_test.sh During cleanup, the value of /proc/sys/vm/nr_hugepages is currently being set to 0. At the end of the test, if all tests pass, the original nr_hugepages value is restored. However, if any test fails, it remains set to 0. With this patch, we ensure that the original nr_hugepages value is restored during cleanup, regardless of whether the test passes or fails. Link: https://lkml.kernel.org/r/20250410100748.2310-1-donettom@linux.ibm.com Fixes: 29750f71a9b4 ("hugetlb_cgroup: add hugetlb_cgroup reservation tests") Signed-off-by: Donet Tom <donettom@linux.ibm.com> Cc: Li Wang <liwang@redhat.com> Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11maple_tree: add sufficient heightSidhartha Kumar
In order to support rebalancing and spanning stores using less than the worst case number of nodes, we need to track more than just the vacant height. Using only vacant height to reduce the worst case maple node allocation count can lead to a shortcoming of nodes in the following scenarios. For rebalancing writes, when a leaf node becomes insufficient, it may be combined with a sibling into a single node. This means that the parent node which has entries for this children will lose one entry. If this parent node was just meeting the minimum entries, losing one entry will now cause this parent node to be insufficient. This leads to a cascading operation of rebalancing at different levels and can lead to more node allocations than simply using vacant height can return. For spanning writes, a similar situation occurs. At the location at which a spanning write is detected, the number of ancestor nodes may similarly need to rebalanced into a smaller number of nodes and the same cascading situation could occur. To use less than the full height of the tree for the number of allocations, we also need to track the height at which a non-leaf node cannot become insufficient. This means even if a rebalance occurs to a child of this node, it currently has enough entries that it can lose one without any further action. This field is stored in the maple write state as sufficient height. In mas_prealloc_calc() when figuring out how many nodes to allocate, we check if the vacant node is lower in the tree than a sufficient node (has a larger value). If it is, we cannot use the vacant height and must use the difference in the height and sufficient height as the basis for the number of nodes needed. An off by one bug was also discovered in mast_overflow() where it is using >= rather than >. This caused extra iterations of the mas_spanning_rebalance() loop and lead to unneeded allocations. A test is also added to check the number of allocations is correct. Link: https://lkml.kernel.org/r/20250410191446.2474640-6-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11maple_tree: use vacant nodes to reduce worst case allocationsSidhartha Kumar
In order to determine the store type for a maple tree operation, a walk of the tree is done through mas_wr_walk(). This function descends the tree until a spanning write is detected or we reach a leaf node. While descending, keep track of the height at which we encounter a node with available space. This is done by checking if mas->end is less than the number of slots a given node type can fit. Now that the height of the vacant node is tracked, we can use the difference between the height of the tree and the height of the vacant node to know how many levels we will have to propagate creating new nodes. Update mas_prealloc_calc() to consider the vacant height and reduce the number of worst-case allocations. Rebalancing and spanning stores are not supported and fall back to using the full height of the tree for allocations. Update preallocation testing assertions to take into account vacant height. Link: https://lkml.kernel.org/r/20250410191446.2474640-4-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11maple_tree: use height and depth consistentlySidhartha Kumar
For the maple tree, the root node is defined to have a depth of 0 with a height of 1. Each level down from the node, these values are incremented by 1. Various code paths define a root with depth 1 which is inconsisent with the definition. Modify the code to be consistent with this definition. In mas_spanning_rebalance(), l_mas.depth was being used to track the height based on the number of iterations done in the main loop. This information was then used in mas_put_in_tree() to set the height. Rather than overload the l_mas.depth field to track height, simply keep track of height in the local variable new_height and directly pass this to mas_wmb_replace() which will be passed into mas_put_in_tree(). This allows up to remove writes to l_mas.depth. Link: https://lkml.kernel.org/r/20250410191446.2474640-3-sidhartha.kumar@oracle.com Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11tools/testing/selftests: assert that anon merge cases behave as expectedLorenzo Stoakes
Prior to the recently applied commit that permits this merge, mprotect()'ing a faulted VMA, adjacent to an unfaulted VMA, such that the two share characteristics would fail to merge due to what appear to be unintended consequences of commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"). Now we have fixed this bug, assert that we can indeed merge anonymous VMAs this way. Also assert that forked source/target VMAs are equally rejected. Previously, all empty target anon merges with one VMA faulted and the other unfaulted would be rejected incorrectly, now we ensure that unforked merge, but forked do not. Additionally, add the new test file to the MEMORY MAPPING section in MAINTAINERS, as these tests are explicitly memory mapping related. Link: https://lkml.kernel.org/r/2b69330274a3b71721f7042c5eabe91143934415.1744104124.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11tools/testing: add PROCMAP_QUERY helper functions in mm self testsLorenzo Stoakes
The PROCMAP_QUERY ioctl() is very useful - it allows for binary access to /proc/$pid/[s]maps data and thus convenient lookup of data contained there. This patch exposes this for convenient use by mm self tests so the state of VMAs can easily be queried. Link: https://lkml.kernel.org/r/ce83d877093d1fc594762cf4b82f0c27963030ee.1744104124.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Wei Yang <richard.weiyang@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm/vma: fix incorrectly disallowed anonymous VMA mergesLorenzo Stoakes
Patch series "fix incorrectly disallowed anonymous VMA merges", v2. It appears that we have been incorrectly rejecting merge cases for 15 years, apparently by mistake. Imagine a range of anonymous mapped momemory divided into two VMAs like this, with incompatible protection bits: RW RWX unfaulted faulted |-----------|-----------| | prev | vma | |-----------|-----------| mprotect(RW) Now imagine mprotect()'ing vma so it is RW. This appears as if it should merge, it does not. Neither does this case, again mprotect()'ing vma RW: RWX RW faulted unfaulted |-----------|-----------| | vma | next | |-----------|-----------| mprotect(RW) Nor: RW RWX RW unfaulted faulted unfaulted |-----------|-----------|-----------| | prev | vma | next | |-----------|-----------|-----------| mprotect(RW) What's going on here? In commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue"), from 2010, Rik von Riel took careful care to account for these cases - commenting that '[this is] easily overlooked: when mprotect shifts the boundary, make sure the expanding vma has anon_vma set if the shrinking vma had, to cover any anon pages imported.' However, commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs") introduced a little over a year later, appears to have accidentally disallowed this. By adjusting the is_mergeable_anon_vma() function to avoid lock contention across large trees of forked anon_vma's, this commit wrongly assumed the VMA being checked (the ostensible merge 'target') should be faulted, that is, have an anon_vma, and thus an anon_vma_chain list established, but only of length 1. This appears to have been unintentional, as disallowing empty target VMAs like this across the board makes no sense. We already have logic that accounts for this case, the same logic Rik introduced in 2010, now via dup_anon_vma() (and ultimately anon_vma_clone()), so there is no problem permitting this. This series fixes this mistake and also ensures that scalability concerns remain addressed by explicitly checking that whatever VMA is being merged has not been forked. A full set of self tests which reproduce the issue are provided, as well as updating userland VMA tests to assert this behaviour. The self tests additionally assert scalability concerns are addressed. This patch (of 3): anon_vma_chain's were introduced by Rik von Riel in commit 5beb49305251 ("mm: change anon_vma linking to fix multi-process server scalability issue"). This patch was introduced in March 2010. As part of this change, careful attention was made to the instance of mprotect() causing a VMA merge, with one faulted (i.e. having anon_vma set) and another not: /* * Easily overlooked: when mprotect shifts the boundary, * make sure the expanding vma has anon_vma set if the * shrinking vma had, to cover any anon pages imported. */ In the modern VMA code, this is handled in dup_anon_vma() (and ultimately anon_vma_clone()). This case is one of the three configurations of adjacent VMA anon_vma state that we might encounter on merge (where dst is the VMA which will be merged into and src the one being merged into dst): 1. dst->anon_vma, src->anon_vma - These must be equal, no-op. 2. dst->anon_vma, !src->anon_vma - We simply use dst->anon_vma, no-op. 3. !dst->anon_vma, src->anon_vma - The case in question here. In case 3, the instance addressed here - we duplicate the AVC connections from src and place into dst. However, in practice, we very often do NOT do this. This appears to be due to an inadvertent consequence of the change introduced by commit 965f55dea0e3 ("mmap: avoid merging cloned VMAs"), introduced in May 2011. This implies that this merge case was functional only for a little over a year, and has since been broken for ~15 years. Here, lock scalability concerns lead to us restricting anonymous merges only to those VMAs with 1 entry in their vma->anon_vma_chain, that is, a VMA that is not connected to any parent process's anon_vma. The mergeability test looks like this: static inline bool is_mergeable_anon_vma(struct anon_vma *anon_vma1, struct anon_vma *anon_vma2, struct vm_area_struct *vma) { if ((!anon_vma1 || !anon_vma2) && (!vma || !vma->anon_vma || list_is_singular(&vma->anon_vma_chain))) return true; return anon_vma1 == anon_vma2; } However, we have a problem here - typically the vma passed here is the destination VMA. For instance in vma_merge_existing_range() we invoke: can_vma_merge_left() -> [ check that there is an immediately adjacent prior VMA ] -> can_vma_merge_after() -> is_mergeable_vma() for general attribute check -> is_mergeable_anon_vma([ proposed anon_vma ], prev->anon_vma, prev) So if we were considering a target unfaulted 'prev': unfaulted faulted |-----------|-----------| | prev | vma | |-----------|-----------| This would call is_mergeable_anon_vma(NULL, vma->anon_vma, prev). The list_is_singular() check for vma->anon_vma_chain, an empty list on fault, would cause this merge to _fail_ even though all else indicates a merge. Equally a simple merge into a next VMA would hit the same problem: faulted unfaulted |-----------|-----------| | vma | next | |-----------|-----------| can_vma_merge_right() -> [ check that there is an immediately adjacent succeeding VMA ] -> can_vma_merge_before() -> is_mergeable_vma() for general attribute check -> is_mergeable_anon_vma([ proposed anon_vma ], next->anon_vma, next) For a 3-way merge, we'd also hit the same problem if it was configured like this for instance: unfaulted faulted unfaulted |-----------|-----------|-----------| | prev | vma | next | |-----------|-----------|-----------| As we'd call can_vma_merge_left() for prev, and can_vma_merge_right() for next, both of which would fail. vma_merge_new_range() (and relatedly, vma_expand()) are not impacted, as the new VMA would never already be faulted (it is a proposed new range). Because we already handle each of the aforementioned merge cases, and can absolutely therefore deal with an existing VMA merge with !dst->anon_vma, src->anon_vma, there is absolutely no reason to disallow this kind of merge. It seems that the intention of this patch is to ensure that, in the instance of merging unfaulted VMAs with faulted ones, we never wish to do so with those with multiple AVCs due to the fact that anon_vma lock's are held across both parent and child anon_vma's (actually, the 'root' parent anon_vma's lock is used). In fact, the original commit alludes to this - "find_mergeable_anon_vma() already considers this case". In find_mergeable_anon_vma() however, we check the anon_vma which will be merged from, if it is set, then we check list_is_singular(vma->anon_vma_chain). So to match this logic, update is_mergeable_anon_vma() to perform this scalability check on the VMA whose anon_vma we ultimately merge into. This matches existing behaviour with forked VMAs, only we no longer wrongly disallow ALL empty target merges. So we both allow merge cases and ensure the scalability check is correctly applied. We may wish to revisit these lock scalability concerns at a later date and ensure they are still valid. Additionally, correct userland VMA tests which were mistakenly not asserting these cases correctly previously to now correctly assert this, and to ensure vmg->anon_vma state is always consistent to account for newly introduced asserts. Link: https://lkml.kernel.org/r/cover.1744104124.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/18c756fc9eaf7ad082a710c91133b8346f8cd9a8.1744104124.git.lorenzo.stoakes@oracle.com Fixes: 965f55dea0e3 ("mmap: avoid merging cloned VMAs") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftest/mm: make hugetlb_reparenting_test tolerant to async reparentingLi Wang
In cgroup v2, memory and hugetlb usage reparenting is asynchronous. This can cause test flakiness when immediately asserting usage after deleting a child cgroup. To address this, add a helper function `assert_with_retry()` that checks usage values with a timeout-based retry. This improves test stability without relying on fixed sleep delays. Also bump up the tolerance size to 7MB. To avoid False Positives: ... # Assert memory charged correctly for child only use. # actual a = 11 MB # expected a = 0 MB # fail # cleanup # [FAIL] not ok 11 hugetlb_reparenting_test.sh -cgroup-v2 # exit=1 # 0 # SUMMARY: PASS=10 SKIP=0 FAIL=1 Link: https://lkml.kernel.org/r/20250407084201.74492-1-liwang@redhat.com Signed-off-by: Li Wang <liwang@redhat.com> Tested-by: Donet Tom <donettom@linux.ibm.com> Cc: Waiman Long <longman@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: add PAGEMAP_SCAN guard region testAndrei Vagin
Add a selftest to verify the PAGEMAP_SCAN ioctl correctly reports guard regions using the newly introduced PAGE_IS_GUARD flag. Link: https://lkml.kernel.org/r/20250324065328.107678-4-avagin@google.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11tools headers UAPI: sync linux/fs.h with the kernel sourcesAndrei Vagin
Required for a new PAGEMAP_SCAN test to verify guard region reporting. Link: https://lkml.kernel.org/r/20250324065328.107678-3-avagin@google.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFODmitry V. Levin
Check whether PTRACE_SET_SYSCALL_INFO semantics implemented in the kernel matches userspace expectations. Link: https://lkml.kernel.org/r/20250303112052.GG24170@strace.io Signed-off-by: Dmitry V. Levin <ldv@strace.io> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexey Gladkov (Intel) <legion@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: anton ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Betkov <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Charlie Jenkins <charlie@rivosinc.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christian Zankel <chris@zankel.net> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davide Berardi <berardi.dav@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Eugene Syromiatnikov <esyr@redhat.com> Cc: Eugene Syromyatnikov <evgsyr@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonas Bonn <jonas@southpole.se> Cc: Maciej W. Rozycki <macro@orcam.me.uk> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Renzo Davoi <renzo@cs.unibo.it> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russel King <linux@armlinux.org.uk> Cc: Shuah Khan <shuah@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11selftests/mm: convert page_size to unsigned longSiddarth G
Cppcheck warning: int result is assigned to long long variable. If the variable is long long to avoid loss of information, then you have loss of information. This patch changes the type of page_size from 'unsigned int' to 'unsigned long' instead of using ULL suffixes. Changing hpage_size to 'unsigned long' was considered, but since gethugepage() expects an int, this change was avoided. Link: https://lkml.kernel.org/r/20250403101345.29226-1-siddarthsgml@gmail.com Signed-off-by: Siddarth G <siddarthsgml@gmail.com> Reported-by: David Binderman <dcb314@hotmail.com> Closes: https://lore.kernel.org/all/AS8PR02MB10217315060BBFDB21F19643E9CA62@AS8PR02MB10217.eurprd02.prod.outlook.com/ Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11Merge tag 'its-for-linus-20250509' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 ITS mitigation from Dave Hansen: "Mitigate Indirect Target Selection (ITS) issue. I'd describe this one as a good old CPU bug where the behavior is _obviously_ wrong, but since it just results in bad predictions it wasn't wrong enough to notice. Well, the researchers noticed and also realized that thus bug undermined a bunch of existing indirect branch mitigations. Thus the unusually wide impact on this one. Details: ITS is a bug in some Intel CPUs that affects indirect branches including RETs in the first half of a cacheline. Due to ITS such branches may get wrongly predicted to a target of (direct or indirect) branch that is located in the second half of a cacheline. Researchers at VUSec found this behavior and reported to Intel. Affected processors: - Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet Lake, Ice Lake, Tiger Lake and Rocket Lake. Scope of impact: - Guest/host isolation: When eIBRS is used for guest/host isolation, the indirect branches in the VMM may still be predicted with targets corresponding to direct branches in the guest. - Intra-mode using cBPF: cBPF can be used to poison the branch history to exploit ITS. Realigning the indirect branches and RETs mitigates this attack vector. - User/kernel: With eIBRS enabled user/kernel isolation is *not* impacted by ITS. - Indirect Branch Prediction Barrier (IBPB): Due to this bug indirect branches may be predicted with targets corresponding to direct branches which were executed prior to IBPB. This will be fixed in the microcode. Mitigation: As indirect branches in the first half of cacheline are affected, the mitigation is to replace those indirect branches with a call to thunk that is aligned to the second half of the cacheline. RETs that take prediction from RSB are not affected, but they may be affected by RSB-underflow condition. So, RETs in the first half of cacheline are also patched to a return thunk that executes the RET aligned to second half of cacheline" * tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftest/x86/bugs: Add selftests for ITS x86/its: FineIBT-paranoid vs ITS x86/its: Use dynamic thunks for indirect branches x86/ibt: Keep IBT disabled during alternative patching mm/execmem: Unify early execmem_cache behaviour x86/its: Align RETs in BHB clear sequence to avoid thunking x86/its: Add support for RSB stuffing mitigation x86/its: Add "vmexit" option to skip mitigation on some CPUs x86/its: Enable Indirect Target Selection mitigation x86/its: Add support for ITS-safe return thunk x86/its: Add support for ITS-safe indirect thunk x86/its: Enumerate Indirect Target Selection (ITS) bug Documentation: x86/bugs/its: Add ITS documentation
2025-05-11Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM: - Avoid use of uninitialized memcache pointer in user_mem_abort() - Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) - Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM - Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value - Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page RISC-V: - Add missing reset of smstateen CSRs x86: - Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to sanitize the VMCB as its state is undefined after SHUTDOWN, emulating INIT is the least awful choice). - Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM KVM doesn't goof a sanity check in the future. - Free obsolete roots when (re)loading the MMU to fix a bug where pre-faulting memory can get stuck due to always encountering a stale root. - When dumping GHCB state, use KVM's snapshot instead of the raw GHCB page to print state, so that KVM doesn't print stale/wrong information. - When changing memory attributes (e.g. shared <=> private), add potential hugepage ranges to the mmu_invalidate_range_{start,end} set so that KVM doesn't create a shared/private hugepage when the the corresponding attributes will become mixed (the attributes are commited *after* KVM finishes the invalidation). - Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is loaded led to very measurable performance regressions for non-KVM workloads" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions KVM: arm64: Fix memory check in host_stage2_set_owner_locked() KVM: arm64: Kill HCRX_HOST_FLAGS KVM: arm64: Properly save/restore HCRX_EL2 KVM: arm64: selftest: Don't try to disable AArch64 support KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort() KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields KVM: RISC-V: reset smstateen CSRs KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload() KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run() KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception
2025-05-10Merge tag 'mm-hotfixes-stable-2025-05-10-14-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. 13 are cc:stable and the remainder address post-6.14 issues or aren't considered necessary for -stable kernels. About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates" * tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) mm: fix folio_pte_batch() on XEN PV nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs() mm/hugetlb: copy the CMA flag when demoting mm, swap: fix false warning for large allocation with !THP_SWAP selftests/mm: fix a build failure on powerpc selftests/mm: fix build break when compiling pkey_util.c mm: vmalloc: support more granular vrealloc() sizing tools/testing/selftests: fix guard region test tmpfs assumption ocfs2: stop quota recovery before disabling quotas ocfs2: implement handshaking with ocfs2 recovery thread ocfs2: switch osb->disable_recovery to enum mailmap: map Uwe's BayLibre addresses to a single one MAINTAINERS: add mm THP section mm/userfaultfd: fix uninitialized output field for -EAGAIN race selftests/mm: compaction_test: support platform with huge mount of memory MAINTAINERS: add core mm section ocfs2: fix panic in failed foilio allocation mm/huge_memory: fix dereferencing invalid pmd migration entry MAINTAINERS: add reverse mapping section x86: disable image size check for test builds ...
2025-05-10Merge tag 'kvmarm-fixes-6.15-3' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.15, round #3 - Avoid use of uninitialized memcache pointer in user_mem_abort() - Always set HCR_EL2.xMO bits when running in VHE, allowing interrupts to be taken while TGE=0 and fixing an ugly bug on AmpereOne that occurs when taking an interrupt while clearing the xMO bits (AC03_CPU_36) - Prevent VMMs from hiding support for AArch64 at any EL virtualized by KVM - Save/restore the host value for HCRX_EL2 instead of restoring an incorrect fixed value - Make host_stage2_set_owner_locked() check that the entire requested range is memory rather than just the first page
2025-05-09selftests/bpf: Add test to cover sockmap with ktlsJiayuan Chen
The selftest can reproduce an issue where we miss the uncharge operation when freeing msg, which will cause the following warning. We fixed the issue and added this reproducer to selftest to ensure it will not happen again. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct+0x173/0x1d5 Tainted: [W]=WARN Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: events sk_psock_destroy RIP: 0010:inet_sock_destruct+0x173/0x1d5 RSP: 0018:ffff8880085cfc18 EFLAGS: 00010202 RAX: 1ffff11003dbfc00 RBX: ffff88801edfe3e8 RCX: ffffffff822f5af4 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88801edfe16c RBP: ffff88801edfe184 R08: ffffed1003dbfc31 R09: 0000000000000000 R10: ffffffff822f5ab7 R11: ffff88801edfe187 R12: ffff88801edfdec0 R13: ffff888020376ac0 R14: ffff888020376ac0 R15: ffff888020376a60 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000556365155830 CR3: 000000001d6aa000 CR4: 0000000000350ef0 Call Trace: <TASK> __sk_destruct+0x46/0x222 sk_psock_destroy+0x22f/0x242 process_one_work+0x504/0x8a8 ? process_one_work+0x39d/0x8a8 ? __pfx_process_one_work+0x10/0x10 ? worker_thread+0x44/0x2ae ? __list_add_valid_or_report+0x83/0xea ? srso_return_thunk+0x5/0x5f ? __list_add+0x45/0x52 process_scheduled_works+0x73/0x82 worker_thread+0x1ce/0x2ae Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20250425060015.6968-3-jiayuan.chen@linux.dev
2025-05-09tools: ynl: handle broken pipe gracefully in CLIDonald Hunter
When sending YNL CLI output into a pipe, closing the pipe causes a BrokenPipeError. E.g. running the following and quitting less: ./tools/net/ynl/pyynl/cli.py --family rt-link --dump getlink | less Traceback (most recent call last): File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 160, in <module> main() ~~~~^^ File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 142, in main output(reply) ~~~~~~^^^^^^^ File "/home/donaldh/net-next/./tools/net/ynl/pyynl/cli.py", line 97, in output pprint.PrettyPrinter().pprint(msg) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^ [...] BrokenPipeError: [Errno 32] Broken pipe Consolidate the try block for ops and notifications, and gracefully handle the BrokenPipeError by adding an exception handler to the consolidated try block. Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250508112102.63539-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09tests/ncdevmem: Fix double-free of queue arrayCosmin Ratiu
netdev_bind_rx takes ownership of the queue array passed as parameter and frees it, so a queue array buffer cannot be reused across multiple netdev_bind_rx calls. This commit fixes that by always passing in a newly created queue array to all netdev_bind_rx calls in ncdevmem. Fixes: 85585b4bc8d8 ("selftests: add ncdevmem, netcat for devmem TCP") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20250508084434.1933069-1-cratiu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-09tools/net/ynl: ethtool: fix crash when Hardware Clock info is missingHangbin Liu
Fix a crash in the ethtool YNL implementation when Hardware Clock information is not present in the response. This ensures graceful handling of devices or drivers that do not provide this optional field. e.g. Traceback (most recent call last): File "/net/tools/net/ynl/pyynl/./ethtool.py", line 438, in <module> main() ~~~~^^ File "/net/tools/net/ynl/pyynl/./ethtool.py", line 341, in main print(f'PTP Hardware Clock: {tsinfo["phc-index"]}') ~~~~~~^^^^^^^^^^^^^ KeyError: 'phc-index' Fixes: f3d07b02b2b8 ("tools: ynl: ethtool testing tool") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250508035414.82974-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>