summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2025-05-27net: devmem: ksft: add ipv4 supportMina Almasry
ncdevmem supports both ipv4 and ipv6, but the ksft is currently ipv6-only. Propagate the ipv4 support to the ksft, so that folks that are limited to these networks can also test. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250523230524.1107879-5-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-27selftests: netfilter: nft_queue.sh: include file transfer duration in log ↵Florian Westphal
message Paolo Abeni says: Recently the nipa CI infra went through some tuning, and the mentioned self-test now often fails. The failing test is the sctp+nfqueue one, where the file transfer takes too long and hits the timeout (1 minute). Because SCTP nfqueue tests had timeout related issues before (esp. on debug kernels) print the file transfer duration in the PASS/FAIL message. This would aallow us to see if there is/was an unexpected slowdown (CI keeps logs around) or 'creeping slowdown' where things got slower over time until 'fail point' was reached. Output of altered lines looks like this: PASS: tcp and nfqueue in forward chan (duration: 2s) PASS: tcp via loopback (duration: 2s) PASS: sctp and nfqueue in forward chain (duration: 42s) PASS: sctp and nfqueue in output chain with GSO (duration: 21s) Reported-by: Paolo Abeni <pabeni@redhat.com Closes: https://lore.kernel.org/netdev/584524ef-9fd7-4326-9f1b-693ca62c5692@redhat.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20250523121700.20011-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-27selftests: net: move wait_local_port_listen to lib.shHangbin Liu
The function wait_local_port_listen() is the only function defined in net_helper.sh. Since some tests source both lib.sh and net_helper.sh, we can simplify the setup by moving wait_local_port_listen() to lib.sh. With this change, net_helper.sh becomes redundant and can be removed. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250526014600.9128-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-27tools: ynl: parse extack for sub-messagesDonald Hunter
Extend the Python YNL extack decoding to handle sub-messages in the same way that YNL C does. This involves retaining the input values so that they are available during extack decoding. ./tools/net/ynl/pyynl/cli.py --family rt-link --do newlink --create \ --json '{ "linkinfo": {"kind": "netkit", "data": {"policy": 10} } }' Netlink error: Invalid argument nl_len = 92 (76) nl_flags = 0x300 nl_type = 2 error: -22 extack: {'msg': 'Provided default xmit policy not supported', 'bad-attr': '.linkinfo.data(netkit).policy'} Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20250523103031.80236-1-donald.hunter@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-27Merge tag 'pm-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Once again, the changes are dominated by cpufreq updates, but this time the majority of them are cpufreq core changes, mostly related to the introduction of policy locking guards and __free() usage, and fixes related to boost handling. Still, there is also a significant update of the intel_pstate driver making it register an energy model when running on a hybrid platform which is used for enabling energy-aware scheduling (EAS) if the driver operates in the passive mode (and schedutil is used as the cpufreq governor for all CPUs which is the passive mode default). There are some amd-pstate driver updates too, for a good measure, including the "Requested CPU Min frequency" BIOS option support and new online/offline callbacks. In the cpuidle space, the most significant change is the addition of a C1 demotion on/off sysfs knob to intel_idle which should help some users to configure their systems more precisely. There is also the conversion of the PSCI cpuidle driver to a faux device one and there are two small updates of cpuidle governors. Device power management is also modified quite a bit, especially the handling of devices with asynchronous suspend and resume enabled during system transitions. They are now going to be handled more asynchronously during suspend transitions and somewhat less aggressively during resume transitions. Apart from the above, the operating performance points (OPP) library is now going to use mutex locking guards and scope-based cleanup helpers and there is the usual bunch of assorted fixes and code cleanups. Specifics: - Fix potential division-by-zero error in em_compute_costs() (Yaxiong Tian) - Fix typos in energy model documentation and example driver code (Moon Hee Lee, Atul Kumar Pant) - Rearrange the energy model management code and add a new function for adjusting a CPU energy model after adjusting the capacity of the given CPU to it (Rafael Wysocki) - Refactor cpufreq_online(), add and use cpufreq policy locking guards, use __free() in policy reference counting, and clean up core cpufreq code on top of that (Rafael Wysocki) - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh Kumar) - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay Ugwekar) - Add offline, online and suspend callbacks to the amd-pstate driver, rename and use the existing amd_pstate_epp callbacks in it (Dhananjay Ugwekar) - Add support for the "Requested CPU Min frequency" BIOS option to the amd-pstate driver (Dhananjay Ugwekar) - Reset amd-pstate driver mode after running selftests (Swapnil Sapkal) - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan Chancellor) - Add helper for governor checks to the schedutil cpufreq governor and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki) - Populate the cpu_capacity sysfs entries from the intel_pstate driver after registering asym capacity support (Ricardo Neri) - Add support for enabling Energy-aware scheduling (EAS) to the intel_pstate driver when operating in the passive mode on a hybrid platform (Rafael Wysocki) - Drop redundant cpus_read_lock() from store_local_boost() in the cpufreq core (Seyediman Seyedarab) - Replace sscanf() with kstrtouint() in the cpufreq code and use a symbol instead of a raw number in it (Bowen Yu) - Add support for autonomous CPU performance state selection to the CPPC cpufreq driver (Lifeng Zheng) - OPP: Add dev_pm_opp_set_level() (Praveen Talari) - Introduce scope-based cleanup headers and mutex locking guards in OPP core (Viresh Kumar) - Switch OPP to use kmemdup_array() (Zhang Enpei) - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the menu cpuidle governor (Zhongqiu Han) - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla) - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem Bityutskiy) - Fix typos in two comments in the teo cpuidle governor (Atul Kumar Pant) - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla) - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki) - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás) - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum) - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki) - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel) - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang) - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu) - Add configurable pm_test delay for hibernation (Zihuan Zhang) - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter) - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki) - Add a systemd service to run cpupower and change cpupower binding's Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)" * tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits) cpufreq: CPPC: Add support for autonomous selection cpufreq: Update sscanf() to kstrtouint() cpufreq: Replace magic number OPP: switch to use kmemdup_array() PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() cpufreq: drop redundant cpus_read_lock() from store_local_boost() cpupower: do not install files to /etc/default/ cpupower: do not call systemctl at install time cpupower: do not write DESTDIR to cpupower.service PM: sleep: Introduce pm_sleep_transition_in_progress() cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver() cpufreq: intel_pstate: Document hybrid processor support cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache cpufreq: intel_pstate: EAS support for hybrid platforms PM: EM: Introduce em_adjust_cpu_capacity() PM: EM: Move CPU capacity check to em_adjust_new_capacity() PM: EM: Documentation: Fix typos in example driver code cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas() PM: sleep: Introduce pm_suspend_in_progress() ...
2025-05-27Merge tag 'acpi-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "The most significant part of these changes is an ACPICA update covering two upstream ACPICA releases, 20241212 and 20250404, that have not been included into the kernel code base yet. Among other things, it adds definitions needed to address GCC 15's -Wunterminated-string-initialization warnings, adds support for three new tables (MRRM, ERDT, RIMT), extends support for two tables (RAS2, DMAR), and fixes some issues. On top of the above, there is a new parser for the MRRM table, more changes related to GCC 15's -Wunterminated-string-initialization warnings, a CPPC library update including functions related to autonomous CPU performance state selection, a couple of new quirks, some assorted fixes and some code cleanups. Specifics: - Fix two ACPICA SLAB cache leaks (Seunghun Han) - Add EINJv2 get error type action and define Error Injection Actions in hex values to avoid inconsistencies between the specification and the code (Zaid Alali) - Fix typo in comments for SRAT structures (Adam Lackorzynski) - Prevent possible loss of data in ACPICA because of u32 to u8 conversions (Saket Dumbre) - Fix reading FFixedHW operation regions in ACPICA (Daniil Tatianin) - Add support for printing AML arguments when the ACPICA debug level is ACPI_LV_TRACE_POINT (Mario Limonciello) - Drop a stale comment about the file content from actbl2.h (Sudeep Holla) - Apply pack(1) to union aml_resource (Tamir Duberstein) - Fix overflow check in the ACPICA version of vsnprintf() (gldrk) - Interpret SIDP structures in DMAR added revision 3.4 of the VT-d specification (Alexey Neyman) - Add typedef and other definitions related to MRRM to ACPICA (Tony Luck) - Add definitions for RIMT to ACPICA (Sunil V L) - Fix spelling mistake "Incremement" -> "Increment" in the ACPICA utilities code (Colin Ian King) - Add typedef and other definitions for ERDT to ACPICA (Tony Luck) - Introduce ACPI_NONSTRING and use it (Kees Cook, Ahmed Salem) - Rename structure and field names of the RAS2 table in actbl2.h (Shiju Jose) - Fix up whitespace in acpica/utcache.c (Zhe Qiao) - Avoid sequence overread in a call to strncmp() in ap_get_table_length() and replace strncpy() with memcpy() in ACPICA in some places (Ahmed Salem) - Update copyright year in all ACPICA files (Saket Dumbre) - Add __nonstring annotations for unterminated strings in the static ACPI tables parsing code (Kees Cook) - Add support for parsing the MRRM ACPI table and sysfs files to describe memory regions listed in it (Tony Luck, Anil Keshavamurthy) - Remove an (explicitly) unused header file include from the VIOT ACPI table parser file (Andy Shevchenko) - Improve logging around acpi_initialize_tables() (Bartosz Szczepanek) - Clean up the initialization of CPU data structures in the ACPI processor driver (Zhang Rui) - Remove an obsolete comment regarding the C-states handling in the ACPI processor driver (Giovanni Gherdovich) - Simplify PCC shared memory region handling (Sudeep Holla) - Rework and extend functions for reading CPPC register values and for updating CPPC registers (Lifeng Zheng) - Add three functions related to autonomous CPU performance state selection to the CPPC library (Lifeng Zheng) - Turn the acpi_pci_root_remap_iospace() fwnode_handle parameter into a const pointer (Pei Xiao) - Round battery capacity percengate in the ACPI battery driver to the closest integer to avoid user confusion (shitao) - Make the ACPI battery driver report the current as a negative number to the power supply framework when the battery is discharging as documented (Peter Marheine) - Add TUXEDO InfinityBook Pro AMD Gen9 to the acpi_ec_no_wakeup[] list to prevent spurious wakeups from suspend-to-idle (Werner Sembach) - Convert the APEI EINJ driver to a faux device one (Sudeep Holla, Jon Hunter) - Remove redundant calls to einj_get_available_error_type() from the APEI EINJ driver (Zaid Alali) - Fix a typo for MECHREVO in irq1_edge_low_force_override[] (Mingcong Bai) - Add an LPS0 check() callback to the AMD pinctrl driver and fix up config symbol dependencies in it (Mario Limonciello, Rafael Wysocki) - Avoid initializing the ACPI platform profile driver on non-ACPI platforms (Alexandre Ghiti) - Document that references to ACPI data (non-device) nodes should use string-only references in hierarchical data node packages (Sakari Ailus) - Fail the ACPI bus registration if acpi_kobj registration fails (Armin Wolf)" * tag 'acpi-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) ACPI: MRRM: Fix default max memory region ACPI: bus: Bail out if acpi_kobj registration fails ACPI: platform_profile: Avoid initializing on non-ACPI platforms pinctrl: amd: Fix hibernation support with CONFIG_SUSPEND unset ACPI: tables: Improve logging around acpi_initialize_tables() ACPI: VIOT: Remove (explicitly) unused header ACPI: Add documentation for exposing MRRM data ACPI: MRRM: Add /sys files to describe memory ranges ACPI: MRRM: Minimal parse of ACPI MRRM table ACPICA: Update copyright year ACPICA: Logfile: Changes for version 20250404 ACPICA: Replace strncpy() with memcpy() ACPICA: Apply ACPI_NONSTRING in more places ACPICA: Avoid sequence overread in call to strncmp() ACPICA: Adjust the position of code lines ACPICA: actbl2.h: ACPI 6.5: RAS2: Rename structure and field names of the RAS2 table ACPICA: Apply ACPI_NONSTRING ACPICA: Introduce ACPI_NONSTRING ACPICA: actbl2.h: ERDT: Add typedef and other definitions ACPICA: infrastructure: Add new DMT_BUF types and shorten a long name ...
2025-05-27Merge tag 'gpio-updates-for-v6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We have three new drivers, some refactoring in the GPIO core, lots of various changes across many drivers, new configfs interface for the virtual gpio-aggregator module and DT-bindings updates. The treewide conversion of GPIO drivers to using the new value setter callbacks is ongoing with another round of GPIO drivers updated. You will also see these commits coming in from other subsystems as with the relevant changes merged into mainline last cycle, I've started converting GPIO providers located elsewhere than drivers/gpio/. GPIO core: - use more lock guards where applicable - refactor GPIO ACPI code and shrink it in the process by 8% - move GPIO ACPI quirks into a separate file - remove unneeded #ifdef - convert GPIO devres helpers to using devm_add_action() where applicable which shrinks and simplifies the code - refactor GPIO descriptor validation in GPIO consumer interfaces - don't allow setting values on input lines in the GPIO core which will take off the burden from GPIO drivers of checking this down the line - provide gpiod_is_equal() as a way of safely comparing two GPIO descriptors (the only current user is in regulator core) New drivers: - add the GPIO module for the max77759 multifunction device - add the GPIO driver for the VeriSilicon BLZP1600 GPIO controller - add the GPIO driver for the Spacemit K1 SoC Driver improvements: - convert more drivers to using the new GPIO line value setter callbacks - convert more drivers to making the irq_chip immutable as is recommended by the interrupt subsystem - extend build testing coverage by enabling more modules to be built with COMPILE_TEST=y - extend the gpio-aggregator module with a configfs interface that makes the setup easier for user-space than the existing driver-level sysfs attributes and also adds more advanced configuration features (such as referring to aggregated lines by their original names or modifying their names as exposed by the aggregated chip) - add a missing mutex_destroy() in gpio-imx-scu - add an OF polarity quirk for s5m8767 - allow building gpio-vf610 as a loadable module - make gpio-mxc not hardcode its GPIO base number with GPIO SYSFS interface disabled (another small step towards getting rid of the global GPIO numberspace) - add support for level-triggered interrupts to gpio-pca953x - don't double-check the ngpios property in gpio-ds4520 as GPIO core already does it - don't double-check the number of GPIOs in gpio-imx-scu as GPIO core already does it - remove unused callbacks from gpio-max3191x DT bindings: - add device-tree bindings for max77759, spacemit,k1 and blzp1600 (new drivers added this cycle) - document more properties for gpio-vf610 and gpio-tegra186 - document a new pca95xx variant - fix style of examples in several GPIO DT-binding documents Misc: - TODO list updates" * tag 'gpio-updates-for-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (123 commits) gpio: timberdale: select GPIOLIB_IRQCHIP gpio: lpc18xx: select GPIOLIB_IRQCHIP gpio: grgpio: select GPIOLIB_IRQCHIP gpio: bcm-kona: select GPIOLIB_IRQCHIP dt-bindings: gpio: vf610: add ngpios and gpio-reserved-ranges gpio: davinci: select GPIOLIB_IRQCHIP gpiolib-acpi: Update file references in the Documentation and MAINTAINERS gpiolib: acpi: Move quirks to a separate file gpiolib: acpi: Add acpi_gpio_need_run_edge_events_on_boot() getter gpiolib: acpi: Handle deferred list via new API gpiolib: acpi: Make sure we fill struct acpi_gpio_info gpiolib: acpi: Switch to use enum in acpi_gpio_in_ignore_list() gpiolib: acpi: Use temporary variable for struct acpi_gpio_info gpiolib: remove unneeded #ifdef gpio: mpc8xxx: select GPIOLIB_IRQCHIP gpio: pxa: select GPIOLIB_IRQCHIP gpio: pxa: Make irq_chip immutable gpio: timberdale: Make irq_chip immutable gpio: xgene-sb: Make irq_chip immutable gpio: davinci: Make irq_chip immutable ...
2025-05-27selftests/bpf: Add tests with stack ptr register in conditional jmpYonghong Song
Add two tests: - one test has 'rX <op> r10' where rX is not r10, and - another test has 'rX <op> rY' where rX and rY are not r10 but there is an early insn 'rX = r10'. Without previous verifier change, both tests will fail. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250524041340.4046304-1-yonghong.song@linux.dev
2025-05-27perf test: Add AMD IBS sw filter testNamhyung Kim
The kernel v6.14 added 'swfilt' to support privilege filtering in software so that IBS can be used by regular users. Add a test case in x86 to verify the behavior. $ sudo perf test -vv 'IBS software filter' 113: AMD IBS software filtering: --- start --- test child forked, pid 178826 check availability of IBS swfilt run perf record with modifier and swfilt [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.000 MB /dev/null ] [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.000 MB /dev/null ] [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.000 MB /dev/null ] [ perf record: Woken up 0 times to write data ] [ perf record: Captured and wrote 0.000 MB /dev/null ] check number of samples with swfilt [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.037 MB - ] [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.041 MB - ] ---- end(0) ---- 113: AMD IBS software filtering : Ok Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> # On a 9950x3d Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20250524002754.1266681-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-27perf mem: Count L2 HITM for c2c statisticYicong Yang
L2 HITM is not counted in c2c statistic decoding. Count it for lcl_hitm like how we handle L2 Peer snoop. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: CaiJingtao <caijingtao@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Yushan Wang <wangyushan12@huawei.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Cc: xueshan2@huawei.com Link: https://lore.kernel.org/r/20250425033845.57671-4-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-27perf arm-spe: Add support for SPE Data Source packet on HiSilicon HIP12Yicong Yang
Add data source encoding for HiSilicon HIP12 and coresponding mapping to the perf's memory data source. This will help to synthesize the data and support upper layer tools like perf-mem and perf-c2c. Reviewed-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: CaiJingtao <caijingtao@huawei.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Yushan Wang <wangyushan12@huawei.com> Cc: Zeng Tao <prime.zeng@hisilicon.com> Cc: xueshan2@huawei.com Link: https://lore.kernel.org/r/20250425033845.57671-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2025-05-27Merge tag 'nolibc-20250526-for-6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc Pull nolibc updates from Thomas Weißschuh: - New supported architectures: m68k, SPARC (32 and 64 bit) - Compatibility with kselftest_harness.h - A more robust mechanism to include all of nolibc from each header - Split existing features into new headers to simplify adoption - Compatibility with UBSAN and it is used in the testsuite - Many small new features focussing on usage in kselftests * tag 'nolibc-20250526-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (83 commits) selftests: harness: Stop using setjmp()/longjmp() selftests: harness: Add "variant" and "self" to test metadata selftests: harness: Add teardown callback to test metadata selftests: harness: Move teardown conditional into test metadata selftests: harness: Don't set setup_completed for fixtureless tests selftests: harness: Implement test timeouts through pidfd selftests: harness: Remove dependency on libatomic selftests: harness: Remove inline qualifier for wrappers selftests: harness: Mark functions without prototypes static selftests: harness: Ignore unused variant argument warning selftests: harness: Use C89 comment style selftests: harness: Add kselftest harness selftest selftests/nolibc: drop include guards around standard headers tools/nolibc: move NULL and offsetof() to sys/stddef.h tools/nolibc: move uname() and friends to sys/utsname.h tools/nolibc: move makedev() and friends to sys/sysmacros.h tools/nolibc: move getrlimit() and friends to sys/resource.h tools/nolibc: move reboot() to sys/reboot.h tools/nolibc: move prctl() to sys/prctl.h tools/nolibc: move mount() to sys/mount.h ...
2025-05-27Merge tag 'docs-6.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...
2025-05-27Merge tag 'lkmm.2025.05.25a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull lkmm updates from Paul McKenney: "Update LKMM documentation: - Cross-references, typos, broken URLs (Akira Yokosawa) - Clarify SRCU explanation (Uladzislau Rezki)" * tag 'lkmm.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/memory-model/Documentation: Fix SRCU section in explanation.txt tools/memory-model: docs/references: Remove broken link to imgtec.com tools/memory-model: docs/ordering: Fix trivial typos tools/memory-model: docs/simple.txt: Fix trivial typos tools/memory-model: docs/README: Update introduction of locking.txt
2025-05-27selftests/bpf: enable many-args tests for arm64Alexis Lothoré (eBPF Foundation)
Now that support for up to 12 args is enabled for tracing programs on ARM64, enable the existing tests for this feature on this architecture. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-2-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpftool: Add support for custom BTF path in prog load/loadallJiayuan Chen
This patch exposes the btf_custom_path feature to bpftool, allowing users to specify a custom BTF file when loading BPF programs using prog load or prog loadall commands. The argument 'btf_custom_path' in libbpf is used for those kernels that don't have CONFIG_DEBUG_INFO_BTF enabled but still want to perform CO-RE relocations. Suggested-by: Quentin Monnet <qmo@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250516144708.298652-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add unit tests with __bpf_trap() kfuncYonghong Song
Add some inline-asm tests and C tests where __bpf_trap() or __builtin_trap() is used in the code. The __builtin_trap() test is guarded with llvm21 ([1]) since otherwise the compilation failure will happen. [1] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for open coded dmabuf_iterT.J. Mercier
Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for dmabuf_iterT.J. Mercier
This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. - Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests changes for 6.16: - Add support for SNP to the various SEV selftests. - Add a selftest to verify fastops instructions via forced emulation. - Add MGLRU support to the access tracking perf test.
2025-05-27Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.16: - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. - Advertise support to userspace for WRMSRNS and PREFETCHI. - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. - Add a module param to control and enumerate support for device posted interrupts. - Misc cleanups.
2025-05-27vsock/test: Add test for an unexpectedly lingering close()Michal Luczaj
There was an issue with SO_LINGER: instead of blocking until all queued messages for the socket have been successfully sent (or the linger timeout has been reached), close() would block until packets were handled by the peer. Add a test to alert on close() lingering when it should not. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-5-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/test: Introduce enable_so_linger() helperMichal Luczaj
Add a helper function that sets SO_LINGER. Adapt the caller. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-4-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27vsock/test: Introduce vsock_wait_sent() helperMichal Luczaj
Distill the virtio_vsock_sock::bytes_unsent checking loop (ioctl SIOCOUTQ) and move it to utils. Tweak the comment. Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20250522-vsock-linger-v6-3-2ad00b0e447e@rbox.co Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: selftests: specify -std=gnu17 for bashJason A. Donenfeld
GCC 15 defaults to C23, which bash can't compile under, so specify gnu17 explicitly. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-6-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: allowedips: add WGALLOWEDIP_F_REMOVE_ME flagJordan Rife
The current netlink API for WireGuard does not directly support removal of allowed ips from a peer. A user can remove an allowed ip from a peer in one of two ways: 1. By using the WGPEER_F_REPLACE_ALLOWEDIPS flag and providing a new list of allowed ips which omits the allowed ip that is to be removed. 2. By reassigning an allowed ip to a "dummy" peer then removing that peer with WGPEER_F_REMOVE_ME. With the first approach, the driver completely rebuilds the allowed ip list for a peer. If my current configuration is such that a peer has allowed ips 192.168.0.2 and 192.168.0.3 and I want to remove 192.168.0.2 the actual transition looks like this. [192.168.0.2, 192.168.0.3] <-- Initial state [] <-- Step 1: Allowed ips removed for peer [192.168.0.3] <-- Step 2: Allowed ips added back for peer This is true even if the allowed ip list is small and the update does not need to be batched into multiple WG_CMD_SET_DEVICE requests, as the removal and subsequent addition of ips is non-atomic within a single request. Consequently, wg_allowedips_lookup_dst and wg_allowedips_lookup_src may return NULL while reconfiguring a peer even for packets bound for ips a user did not intend to remove leading to unintended interruptions in connectivity. This presents in userspace as failed calls to sendto and sendmsg for UDP sockets. In my case, I ran netperf while repeatedly reconfiguring the allowed ips for a peer with wg. /usr/local/bin/netperf -H 10.102.73.72 -l 10m -t UDP_STREAM -- -R 1 -m 1024 send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host While this may not be of particular concern for environments where peers and allowed ips are mostly static, systems like Cilium manage peers and allowed ips in a dynamic environment where peers (i.e. Kubernetes nodes) and allowed ips (i.e. pods running on those nodes) can frequently change making WGPEER_F_REPLACE_ALLOWEDIPS problematic. The second approach avoids any possible connectivity interruptions but is hacky and less direct, requiring the creation of a temporary peer just to dispose of an allowed ip. Introduce a new flag called WGALLOWEDIP_F_REMOVE_ME which in the same way that WGPEER_F_REMOVE_ME allows a user to remove a single peer from a WireGuard device's configuration allows a user to remove an ip from a peer's set of allowed ips. This enables incremental updates to a device's configuration without any connectivity blips or messy workarounds. A corresponding patch for wg extends the existing `wg set` interface to leverage this feature. $ wg set wg0 peer <PUBKEY> allowed-ips +192.168.88.0/24,-192.168.0.1/32 When '+' or '-' is prepended to any ip in the list, wg clears WGPEER_F_REPLACE_ALLOWEDIPS and sets the WGALLOWEDIP_F_REMOVE_ME flag on any ip prefixed with '-'. Signed-off-by: Jordan Rife <jordan@jrife.io> [Jason: minor style nits, fixes to selftest, bump of wireguard-tools version] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-5-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-27wireguard: selftests: cleanup CONFIG_UBSAN_SANITIZE_ALLWangYuli
Commit 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL") removed the CONFIG_UBSAN_SANITIZE_ALL configuration option. Eliminate invalid configurations to improve code readability. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://patch.msgid.link/20250521212707.1767879-2-Jason@zx2c4.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-05-26Merge tag 'perf-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: "Core & generic-arch updates: - Add support for dynamic constraints and propagate it to the Intel driver (Kan Liang) - Fix & enhance driver-specific throttling support (Kan Liang) - Record sample last_period before updating on the x86 and PowerPC platforms (Mark Barnett) - Make perf_pmu_unregister() usable (Peter Zijlstra) - Unify perf_event_free_task() / perf_event_exit_task_context() (Peter Zijlstra) - Simplify perf_event_release_kernel() and perf_event_free_task() (Peter Zijlstra) - Allocate non-contiguous AUX pages by default (Yabin Cui) Uprobes updates: - Add support to emulate NOP instructions (Jiri Olsa) - selftests/bpf: Add 5-byte NOP uprobe trigger benchmark (Jiri Olsa) x86 Intel PMU enhancements: - Support Intel Auto Counter Reload [ACR] (Kan Liang) - Add PMU support for Clearwater Forest (Dapeng Mi) - Arch-PEBS preparatory changes: (Dapeng Mi) - Parse CPUID archPerfmonExt leaves for non-hybrid CPUs - Decouple BTS initialization from PEBS initialization - Introduce pairs of PEBS static calls x86 AMD PMU enhancements: - Use hrtimer for handling overflows in the AMD uncore driver (Sandipan Das) - Prevent UMC counters from saturating (Sandipan Das) Fixes and cleanups: - Fix put_ctx() ordering (Frederic Weisbecker) - Fix irq work dereferencing garbage (Frederic Weisbecker) - Misc fixes and cleanups (Changbin Du, Frederic Weisbecker, Ian Rogers, Ingo Molnar, Kan Liang, Peter Zijlstra, Qing Wang, Sandipan Das, Thorsten Blum)" * tag 'perf-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) perf/headers: Clean up <linux/perf_event.h> a bit perf/uapi: Clean up <uapi/linux/perf_event.h> a bit perf/uapi: Fix PERF_RECORD_SAMPLE comments in <uapi/linux/perf_event.h> mips/perf: Remove driver-specific throttle support xtensa/perf: Remove driver-specific throttle support sparc/perf: Remove driver-specific throttle support loongarch/perf: Remove driver-specific throttle support csky/perf: Remove driver-specific throttle support arc/perf: Remove driver-specific throttle support alpha/perf: Remove driver-specific throttle support perf/apple_m1: Remove driver-specific throttle support perf/arm: Remove driver-specific throttle support s390/perf: Remove driver-specific throttle support powerpc/perf: Remove driver-specific throttle support perf/x86/zhaoxin: Remove driver-specific throttle support perf/x86/amd: Remove driver-specific throttle support perf/x86/intel: Remove driver-specific throttle support perf: Only dump the throttle log for the leader perf: Fix the throttle logic for a group perf/core: Add the is_event_in_freq_mode() helper to simplify the code ...
2025-05-26Merge tag 'objtool-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: - Speed up SHT_GROUP reindexing (Josh Poimboeuf) - Fix up st_info in COMDAT group section (Rong Xu) * tag 'objtool-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Speed up SHT_GROUP reindexing objtool: Fix up st_info in COMDAT group section
2025-05-26Merge tag 'locking-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Futexes: - Add support for task local hash maps (Sebastian Andrzej Siewior, Peter Zijlstra) - Implement the FUTEX2_NUMA ABI, which feature extends the futex interface to be NUMA-aware. On NUMA-aware futexes a second u32 word containing the NUMA node is added to after the u32 futex value word (Peter Zijlstra) - Implement the FUTEX2_MPOL ABI, which feature extends the futex interface to be mempolicy-aware as well, to further refine futex node mappings and lookups (Peter Zijlstra) Locking primitives: - Misc cleanups (Andy Shevchenko, Borislav Petkov, Colin Ian King, Ingo Molnar, Nam Cao, Peter Zijlstra) Lockdep: - Prevent abuse of lockdep subclasses (Waiman Long) - Add number of dynamic keys to /proc/lockdep_stats (Waiman Long) Plus misc cleanups and fixes" * tag 'locking-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) selftests/futex: Fix spelling mistake "unitiliazed" -> "uninitialized" futex: Correct the kernedoc return value for futex_wait_setup(). tools headers: Synchronize prctl.h ABI header futex: Use RCU_INIT_POINTER() in futex_mm_init(). selftests/futex: Use TAP output in futex_numa_mpol selftests/futex: Use TAP output in futex_priv_hash futex: Fix kernel-doc comments futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init() futex: Fix outdated comment in struct restart_block locking/lockdep: Add number of dynamic keys to /proc/lockdep_stats locking/lockdep: Prevent abuse of lockdep subclass locking/lockdep: Move hlock_equal() to the respective #ifdeffery futex,selftests: Add another FUTEX2_NUMA selftest selftests/futex: Add futex_numa_mpol selftests/futex: Add futex_priv_hash selftests/futex: Build without headers nonsense tools/perf: Allow to select the number of hash buckets tools headers: Synchronize prctl.h ABI header futex: Implement FUTEX2_MPOL futex: Implement FUTEX2_NUMA ...
2025-05-26Merge tag 'linux_kselftest-kunit-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit updates from Shuah Khan: - Enable qemu_config for riscv32, sparc 64-bit, PowerPC 32-bit BE and 64-bit LE - Enable CONFIG_SPARC32 to clearly differentiate between sparc 32-bit and 64-bit configurations - Enable CONFIG_CPU_BIG_ENDIAN to clearly differentiate between powerpc LE and BE configurations - Add feature to list available architectures to kunit tool - Fixes to bugs and changes to documentation * tag 'linux_kselftest-kunit-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: Fix wrong parameter to kunit_deactivate_static_stub() kunit: tool: add test counts to JSON output Documentation: kunit: improve example on testing static functions kunit: executor: Remove const from kunit_filter_suites() allocation type kunit: qemu_configs: Disable faulting tests on 32-bit SPARC kunit: qemu_configs: Add 64-bit SPARC configuration kunit: qemu_configs: sparc: Explicitly enable CONFIG_SPARC32=y kunit: qemu_configs: Add PowerPC 32-bit BE and 64-bit LE kunit: qemu_configs: powerpc: Explicitly enable CONFIG_CPU_BIG_ENDIAN=y kunit: tool: Implement listing of available architectures kunit: qemu_configs: Add riscv32 config kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests
2025-05-26Merge tag 'linux_kselftest-next-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest updates from Shuah Khan: "Fixes: - cpufreq test to not double suspend in rtcwake case - compile error in pid_namespace test - run_kselftest.sh to use readlink if realpath is not available - cpufreq basic read and update testcases - ftrace to add poll to a gen_file so test can find it at run-time - spelling errors in perf_events test" * tag 'linux_kselftest-next-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/run_kselftest.sh: Use readlink if realpath is not available selftests/timens: timerfd: Use correct clockid type in tclock_gettime() selftests/timens: Make run_tests() functions static selftests/timens: Print TAP headers selftests: pid_namespace: add missing sys/mount.h include in pid_max.c kselftest: cpufreq: Get rid of double suspend in rtcwake case selftests/cpufreq: Fix cpufreq basic read and update testcases selftests/ftrace: Convert poll to a gen_file selftests/perf_events: Fix spelling mistake "sycnhronize" -> "synchronize"
2025-05-26Merge tag 'next.2025.05.17a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Joel Fernandes: - Removed swake_up_one_online() workaround - Reverted an incorrect rcuog wake-up fix from offline softirq - Rust RCU Guard methods marked as inline - Updated MAINTAINERS with Joel’s and Zqiang's new email address - Replaced magic constant in rcu_seq_done_exact() with named constant - Added warning mechanism to validate rcu_seq_done_exact() - Switched SRCU polling API to use rcu_seq_done_exact() - Commented on redundant delta check in rcu_seq_done_exact() - Made ->gpwrap tests in rcutorture more frequent - Fixed reuse of ARM64 images in rcutorture - rcutorture improved to check Kconfig and reader conflict handling - Extracted logic from rcu_torture_one_read() for clarity - Updated LWN RCU API documentation links - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT - Added tests for SRCU up/down reader primitives - Added comments and delays checks in rcutorture - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch - Added --do-normal and --do-no-normal to torture.sh - Added RCU Rust binding tests to torture.sh - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01 - Replaced kmalloc() with kcalloc() in rcuscale - Refined listRCU example code for stale data elimination - Fixed hardirq count bug for x86 in cpu_stall_cputime - Added safety checks in rcu/nocb for offloaded rdp access - Other miscellaneous changes * tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (27 commits) rcutorture: Fix issue with re-using old images on ARM64 rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01 rcutorture: Reduce TREE01 CPU overcommit torture: Check for "Call trace:" as well as "Call Trace:" rcutorture: Perform more frequent testing of ->gpwrap torture: Add testing of RCU's Rust bindings to torture.sh torture: Add --do-{,no-}normal to torture.sh checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite() rcutorture: Comment invocations of tick_dep_set_task() rcu/nocb: Add Safe checks for access offloaded rdp rcuscale: using kcalloc() to relpace kmalloc() doc/RCU/listRCU: refine example code for eliminating stale data doc: Update LWN RCU API links in whatisRCU.rst Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" rust: sync: rcu: Mark Guard methods as inline rcu/cpu_stall_cputime: fix the hardirq count for x86 architecture rcu: Remove swake_up_one_online() bandaid MAINTAINERS: Update Zqiang's email address rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT srcu: Use rcu_seq_done_exact() for polling API ...
2025-05-26Merge tag 'v6.16-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Fix memcpy_sglist to handle partially overlapping SG lists - Use memcpy_sglist to replace null skcipher - Rename CRYPTO_TESTS to CRYPTO_BENCHMARK - Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS - Hide CRYPTO_MANAGER - Add delayed freeing of driver crypto_alg structures Compression: - Allocate large buffers on first use instead of initialisation in scomp - Drop destination linearisation buffer in scomp - Move scomp stream allocation into acomp - Add acomp scatter-gather walker - Remove request chaining - Add optional async request allocation Hashing: - Remove request chaining - Add optional async request allocation - Move partial block handling into API - Add ahash support to hmac - Fix shash documentation to disallow usage in hard IRQs Algorithms: - Remove unnecessary SIMD fallback code on x86 and arm/arm64 - Drop avx10_256 xts(aes)/ctr(aes) on x86 - Improve avx-512 optimisations for xts(aes) - Move chacha arch implementations into lib/crypto - Move poly1305 into lib/crypto and drop unused Crypto API algorithm - Disable powerpc/poly1305 as it has no SIMD fallback - Move sha256 arch implementations into lib/crypto - Convert deflate to acomp - Set block size correctly in cbcmac Drivers: - Do not use sg_dma_len before mapping in sun8i-ss - Fix warm-reboot failure by making shutdown do more work in qat - Add locking in zynqmp-sha - Remove cavium/zip - Add support for PCI device 0x17D8 to ccp - Add qat_6xxx support in qat - Add support for RK3576 in rockchip-rng - Add support for i.MX8QM in caam Others: - Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up - Add new SEV/SNP platform shutdown API in ccp" * tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits) x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining crypto: qat - add missing header inclusion crypto: api - Redo lookup on EEXIST Revert "crypto: testmgr - Add hash export format testing" crypto: marvell/cesa - Do not chain submitted requests crypto: powerpc/poly1305 - add depends on BROKEN for now Revert "crypto: powerpc/poly1305 - Add SIMD fallback" crypto: ccp - Add missing tee info reg for teev2 crypto: ccp - Add missing bootloader info reg for pspv5 crypto: sun8i-ce - move fallback ahash_request to the end of the struct crypto: octeontx2 - Use dynamic allocated memory region for lmtst crypto: octeontx2 - Initialize cptlfs device info once crypto: xts - Only add ecb if it is not already there crypto: lrw - Only add ecb if it is not already there crypto: testmgr - Add hash export format testing crypto: testmgr - Use ahash for generic tfm crypto: hmac - Add ahash support crypto: testmgr - Ignore EEXIST on shash allocation crypto: algapi - Add driver template support to crypto_inst_setname crypto: shash - Set reqsize in shash_alg ...
2025-05-26Merge tag 'kvm-riscv-6.16-1' of https://github.com/kvm-riscv/linux into HEADPaolo Bonzini
KVM/riscv changes for 6.16 - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl
2025-05-26Merge tag 'kvmarm-6.16' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.16 * New features: - Add large stage-2 mapping support for non-protected pKVM guests, clawing back some performance. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Enable nested virtualisation support on systems that support it (yes, it has been a long time coming), though it is disabled by default. * Improvements, fixes and cleanups: - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. This ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups.
2025-05-26Merge branch 'pm-tools'Rafael J. Wysocki
Merge a cpupower utility update for 6.16-rc1 that adds a systemd service to run cpupower and changes binding's Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli). * pm-tools: cpupower: do not install files to /etc/default/ cpupower: do not call systemctl at install time cpupower: do not write DESTDIR to cpupower.service cpupower: change binding's makefile to use -lcpupower cpupower: add a systemd service to run cpupower
2025-05-26Merge branches 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki
Merge updates related to system sleep handling and runtime PM for 6.16-rc1: - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla). - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki). - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás). - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum). - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki). - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel). - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang). - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu). - Add configurable pm_test delay for hibernation (Zihuan Zhang). - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter). - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki). * pm-runtime: PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() PM: sysfs: Move debug runtime PM attributes to runtime_attrs[] PM: runtime: Add new devm functions * pm-sleep: PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: sleep: Introduce pm_sleep_transition_in_progress() PM: sleep: Introduce pm_suspend_in_progress() PM: sleep: Print PM debug messages during hibernation ucsi_ccg: Disable async suspend in ucsi_ccg_probe() PM: hibernate: add configurable delay for pm_test PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks() PM: wakeup: Add missing wakeup source attribute relax_count PM: sleep: Remove unnecessary !! PM: sleep: Use two lines for "Restarting..." / "done" messages PM: sleep: Make suspend of devices more asynchronous PM: sleep: Suspend async parents after suspending children PM: sleep: Resume children after resuming the parent PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ...
2025-05-26Merge tag 'vfs-6.16-rc1.selftests' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs selftests updates from Christian Brauner: "This contains various cleanups, fixes, and extensions for out filesystem selftests" * tag 'vfs-6.16-rc1.selftests' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/fs/mount-notify: add a test variant running inside userns selftests/filesystems: create setup_userns() helper selftests/filesystems: create get_unique_mnt_id() helper selftests/fs/mount-notify: build with tools include dir selftests/mount_settattr: remove duplicate syscall definitions selftests/pidfd: move syscall definitions into wrappers.h selftests/fs/statmount: build with tools include dir selftests/filesystems: move wrapper.h out of overlayfs subdir selftests/mount_settattr: ensure that ext4 filesystem can be created selftests/mount_settattr: add missing STATX_MNT_ID_UNIQUE define selftests/mount_settattr: don't define sys_open_tree() twice
2025-05-26Merge tag 'vfs-6.16-rc1.coredump' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull coredump updates from Christian Brauner: "This adds support for sending coredumps over an AF_UNIX socket. It also makes (implicit) use of the new SO_PEERPIDFD ability to hand out pidfds for reaped peer tasks The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a saf way to handle them instead of relying on super privileged coredumping helpers This will also be significantly more lightweight since the kernel doens't have to do a fork()+exec() for each crashing process to spawn a usermodehelper. Instead the kernel just connects to the AF_UNIX socket and userspace can process it concurrently however it sees fit. Support for userspace is incoming starting with systemd-coredump There's more work coming in that direction next cycle. The rest below goes into some details and background Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd For simplicity I'm mostly ignoring (1). There's probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump In the example the core_pattern shown causes the kernel to spawn systemd-coredump as a usermode helper. There's various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (Whether or not this is a sane assumption is irrelevant) - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall which are difficult for userspace to control correctly - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping excercises in userspace to make this safe - A new usermode helper has to be spawned for each crashing process This adds a new mode: (3) Dumping into an AF_UNIX socket Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps The coredump socket must be located in the initial mount namespace. When a task coredumps it opens a client socket in the initial network namespace and connects to the coredump socket: - The coredump server uses SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. That is a huge attack vector right now - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind it's back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX socket directly - The pidfd for the crashing task will contain information how the task coredumps. The PIDFD_GET_INFO ioctl gained a new flag PIDFD_INFO_COREDUMP which can be used to retreive the coredump information If the coredump gets a new coredump client connection the kernel guarantees that PIDFD_INFO_COREDUMP information is available. Currently the following information is provided in the new @coredump_mask extension to struct pidfd_info: * PIDFD_COREDUMPED is raised if the task did actually coredump * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable) * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict access to the generated coredump to sufficiently privileged users" * tag 'vfs-6.16-rc1.coredump' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: mips, net: ensure that SOCK_COREDUMP is defined selftests/coredump: add tests for AF_UNIX coredumps selftests/pidfd: add PIDFD_INFO_COREDUMP infrastructure coredump: validate socket name as it is written coredump: show supported coredump modes pidfs, coredump: add PIDFD_INFO_COREDUMP coredump: add coredump socket coredump: reflow dump helpers a little coredump: massage do_coredump() coredump: massage format_corename()
2025-05-26Merge tag 'vfs-6.16-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: "Features: - Allow handing out pidfds for reaped tasks for AF_UNIX SO_PEERPIDFD socket option SO_PEERPIDFD is a socket option that allows to retrieve a pidfd for the process that called connect() or listen(). This is heavily used to safely authenticate clients in userspace avoiding security bugs due to pid recycling races (dbus, polkit, systemd, etc.) SO_PEERPIDFD currently doesn't support handing out pidfds if the sk->sk_peer_pid thread-group leader has already been reaped. In this case it currently returns EINVAL. Userspace still wants to get a pidfd for a reaped process to have a stable handle it can pass on. This is especially useful now that it is possible to retrieve exit information through a pidfd via the PIDFD_GET_INFO ioctl()'s PIDFD_INFO_EXIT flag Another summary has been provided by David Rheinsberg: > A pidfd can outlive the task it refers to, and thus user-space > must already be prepared that the task underlying a pidfd is > gone at the time they get their hands on the pidfd. For > instance, resolving the pidfd to a PID via the fdinfo must be > prepared to read `-1`. > > Despite user-space knowing that a pidfd might be stale, several > kernel APIs currently add another layer that checks for this. In > particular, SO_PEERPIDFD returns `EINVAL` if the peer-task was > already reaped, but returns a stale pidfd if the task is reaped > immediately after the respective alive-check. > > This has the unfortunate effect that user-space now has two ways > to check for the exact same scenario: A syscall might return > EINVAL/ESRCH/... *or* the pidfd might be stale, even though > there is no particular reason to distinguish both cases. This > also propagates through user-space APIs, which pass on pidfds. > They must be prepared to pass on `-1` *or* the pidfd, because > there is no guaranteed way to get a stale pidfd from the kernel. > > Userspace must already deal with a pidfd referring to a reaped > task as the task may exit and get reaped at any time will there > are still many pidfds referring to it In order to allow handing out reaped pidfd SO_PEERPIDFD needs to ensure that PIDFD_INFO_EXIT information is available whenever a pidfd for a reaped task is created by PIDFD_INFO_EXIT. The uapi promises that reaped pidfds are only handed out if it is guaranteed that the caller sees the exit information: TEST_F(pidfd_info, success_reaped) { struct pidfd_info info = { .mask = PIDFD_INFO_CGROUPID | PIDFD_INFO_EXIT, }; /* * Process has already been reaped and PIDFD_INFO_EXIT been set. * Verify that we can retrieve the exit status of the process. */ ASSERT_EQ(ioctl(self->child_pidfd4, PIDFD_GET_INFO, &info), 0); ASSERT_FALSE(!!(info.mask & PIDFD_INFO_CREDS)); ASSERT_TRUE(!!(info.mask & PIDFD_INFO_EXIT)); ASSERT_TRUE(WIFEXITED(info.exit_code)); ASSERT_EQ(WEXITSTATUS(info.exit_code), 0); } To hand out pidfds for reaped processes we thus allocate a pidfs entry for the relevant sk->sk_peer_pid at the time the sk->sk_peer_pid is stashed and drop it when the socket is destroyed. This guarantees that exit information will always be recorded for the sk->sk_peer_pid task and we can hand out pidfds for reaped processes - Hand a pidfd to the coredump usermode helper process Give userspace a way to instruct the kernel to install a pidfd for the crashing process into the process started as a usermode helper. There's still tricky race-windows that cannot be easily or sometimes not closed at all by userspace. There's various ways like looking at the start time of a process to make sure that the usermode helper process is started after the crashing process but it's all very very brittle and fraught with peril The crashed-but-not-reaped process can be killed by userspace before coredump processing programs like systemd-coredump have had time to manually open a PIDFD from the PID the kernel provides them, which means they can be tricked into reading from an arbitrary process, and they run with full privileges as they are usermode helper processes Even if that specific race-window wouldn't exist it's still the safest and cleanest way to let the kernel provide the pidfd directly instead of requiring userspace to do it manually. In parallel with this commit we already have systemd adding support for this in [1] When the usermode helper process is forked we install a pidfd file descriptor three into the usermode helper's file descriptor table so it's available to the exec'd program Since usermode helpers are either children of the system_unbound_wq workqueue or kthreadd we know that the file descriptor table is empty and can thus always use three as the file descriptor number Note, that we'll install a pidfd for the thread-group leader even if a subthread is calling do_coredump(). We know that task linkage hasn't been removed yet and even if this @current isn't the actual thread-group leader we know that the thread-group leader cannot be reaped until @current has exited - Allow telling when a task has not been found from finding the wrong task when creating a pidfd We currently report EINVAL whenever a struct pid has no tasked attached anymore thereby conflating two concepts: (1) The task has already been reaped (2) The caller requested a pidfd for a thread-group leader but the pid actually references a struct pid that isn't used as a thread-group leader This is causing issues for non-threaded workloads as in where they expect ESRCH to be reported, not EINVAL So allow userspace to reliably distinguish between (1) and (2) - Make it possible to detect when a pidfs entry would outlive the struct pid it pinned - Add a range of new selftests Cleanups: - Remove unneeded NULL check from pidfd_prepare() for passed struct pid - Avoid pointless reference count bump during release_task() Fixes: - Various fixes to the pidfd and coredump selftests - Fix error handling for replace_fd() when spawning coredump usermode helper" * tag 'vfs-6.16-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: detect refcount bugs coredump: hand a pidfd to the usermode coredump helper coredump: fix error handling for replace_fd() pidfs: move O_RDWR into pidfs_alloc_file() selftests: coredump: Raise timeout to 2 minutes selftests: coredump: Fix test failure for slow machines selftests: coredump: Properly initialize pointer net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid pidfs: get rid of __pidfd_prepare() net, pidfs: prepare for handing out pidfds for reaped sk->sk_peer_pid pidfs: register pid in pidfs net, pidfd: report EINVAL for ESRCH release_task: kill the no longer needed get/put_pid(thread_pid) pidfs: ensure consistent ENOENT/ESRCH reporting exit: move wake_up_all() pidfd waiters into __unhash_process() selftest/pidfd: add test for thread-group leader pidfd open for thread pidfd: improve uapi when task isn't found pidfd: remove unneeded NULL check from pidfd_prepare() selftests/pidfd: adapt to recent changes
2025-05-26Merge tag 'nf-next-25-05-23' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next, specifically 26 patches: 5 patches adding/updating selftests, 4 fixes, 3 PREEMPT_RT fixes, and 14 patches to enhance nf_tables): 1) Improve selftest coverage for pipapo 4 bit group format, from Florian Westphal. 2) Fix incorrect dependencies when compiling a kernel without legacy ip{6}tables support, also from Florian. 3) Two patches to fix nft_fib vrf issues, including selftest updates to improve coverage, also from Florian Westphal. 4) Fix incorrect nesting in nft_tunnel's GENEVE support, from Fernando F. Mancera. 5) Three patches to fix PREEMPT_RT issues with nf_dup infrastructure and nft_inner to match in inner headers, from Sebastian Andrzej Siewior. 6) Integrate conntrack information into nft trace infrastructure, from Florian Westphal. 7) A series of 13 patches to allow to specify wildcard netdevice in netdev basechain and flowtables, eg. table netdev filter { chain ingress { type filter hook ingress devices = { eth0, eth1, vlan* } priority 0; policy accept; } } This also allows for runtime hook registration on NETDEV_{UN}REGISTER event, from Phil Sutter. netfilter pull request 25-05-23 * tag 'nf-next-25-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: (26 commits) selftests: netfilter: Torture nftables netdev hooks netfilter: nf_tables: Add notifications for hook changes netfilter: nf_tables: Support wildcard netdev hook specs netfilter: nf_tables: Sort labels in nft_netdev_hook_alloc() netfilter: nf_tables: Handle NETDEV_CHANGENAME events netfilter: nf_tables: Wrap netdev notifiers netfilter: nf_tables: Respect NETDEV_REGISTER events netfilter: nf_tables: Prepare for handling NETDEV_REGISTER events netfilter: nf_tables: Have a list of nf_hook_ops in nft_hook netfilter: nf_tables: Pass nf_hook_ops to nft_unregister_flowtable_hook() netfilter: nf_tables: Introduce nft_register_flowtable_ops() netfilter: nf_tables: Introduce nft_hook_find_ops{,_rcu}() netfilter: nf_tables: Introduce functions freeing nft_hook objects netfilter: nf_tables: add packets conntrack state to debug trace info netfilter: conntrack: make nf_conntrack_id callable without a module dependency netfilter: nf_dup_netdev: Move the recursion counter struct netdev_xmit netfilter: nft_inner: Use nested-BH locking for nft_pcpu_tun_ctx netfilter: nf_dup{4, 6}: Move duplication check to task_struct netfilter: nft_tunnel: fix geneve_opt dump selftests: netfilter: nft_fib.sh: add type and oif tests with and without VRFs ... ==================== Link: https://patch.msgid.link/20250523132712.458507-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26Merge branch 'acpica'Rafael J. Wysocki
Merge ACPICA updates, including two upstream releases 20241212 and 20250404, for 6.16-rc1: - Fix two ACPICA SLAB cache leaks (Seunghun Han). - Add EINJv2 get error type action and define Error Injection Actions in hex values to avoid inconsistencies between the specification and the code (Zaid Alali). - Fix typo in comments for SRAT structures (Adam Lackorzynski). - Prevent possible loss of data in ACPICA because of u32 to u8 conversions (Saket Dumbre). - Fix reading FFixedHW operation regions in ACPICA (Daniil Tatianin). - Add support for printing AML arguments when the ACPICA debug level is ACPI_LV_TRACE_POINT (Mario Limonciello). - Drop a stale comment about the file content from actbl2.h (Sudeep Holla). - Apply pack(1) to union aml_resource (Tamir Duberstein). - Fix overflow check in the ACPICA version of vsnprintf() (gldrk). - Interpret SIDP structures in DMAR added revision 3.4 of the VT-d specification (Alexey Neyman). - Add typedef and other definitions related to MRRM to ACPICA (Tony Luck). - Add definitions for RIMT to ACPICA (Sunil V L). - Fix spelling mistake "Incremement" -> "Increment" in the ACPICA utilities code (Colin Ian King). - Add typedef and other definitions for ERDT to ACPICA (Tony Luck). - Introduce ACPI_NONSTRING and use it (Kees Cook, Ahmed Salem). - Rename structure and field names of the RAS2 table in actbl2.h (Shiju Jose). - Fix up whitespace in acpica/utcache.c (Zhe Qiao). - Avoid sequence overread in a call to strncmp() in ap_get_table_length() and replace strncpy() with memcpy() in ACPICA in some places (Ahmed Salem). - Update copyright year in all ACPICA files (Saket Dumbre). * acpica: (30 commits) ACPICA: Update copyright year ACPICA: Logfile: Changes for version 20250404 ACPICA: Replace strncpy() with memcpy() ACPICA: Apply ACPI_NONSTRING in more places ACPICA: Avoid sequence overread in call to strncmp() ACPICA: Adjust the position of code lines ACPICA: actbl2.h: ACPI 6.5: RAS2: Rename structure and field names of the RAS2 table ACPICA: Apply ACPI_NONSTRING ACPICA: Introduce ACPI_NONSTRING ACPICA: actbl2.h: ERDT: Add typedef and other definitions ACPICA: infrastructure: Add new DMT_BUF types and shorten a long name ACPICA: Utilities: Fix spelling mistake "Incremement" -> "Increment" ACPICA: MRRM: Some cleanups ACPICA: actbl2: Add definitions for RIMT ACPICA: actbl2.h: MRRM: Add typedef and other definitions ACPICA: infrastructure: Add new header and ACPI_DMT_BUF26 types ACPICA: Interpret SIDP structures in DMAR ACPICA: utilities: Fix overflow check in vsnprintf() ACPICA: Apply pack(1) to union aml_resource ACPICA: Drop stale comment about the header file content ...
2025-05-26Merge tag 'linux-can-next-for-6.16-20250522' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-05-22 this is a pull request of 22 patches for net-next/main. The series by Biju Das contains 19 patches and adds RZ/G3E CANFD support to the rcar_canfd driver. The patch by Vincent Mailhol adds a struct data_bittiming_params to group FD parameters as a preparation patch for CAN-XL support. Felix Maurer's patch imports tst-filter from can-tests into the kernel self tests and Vincent Mailhol adds support for physical CAN interfaces. linux-can-next-for-6.16-20250522 * tag 'linux-can-next-for-6.16-20250522' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits) selftests: can: test_raw_filter.sh: add support of physical interfaces selftests: can: Import tst-filter from can-tests can: dev: add struct data_bittiming_params to group FD parameters can: rcar_canfd: Add RZ/G3E support can: rcar_canfd: Enhance multi_channel_irqs handling can: rcar_canfd: Add external_clk variable to struct rcar_canfd_hw_info can: rcar_canfd: Add sh variable to struct rcar_canfd_hw_info can: rcar_canfd: Add struct rcanfd_regs variable to struct rcar_canfd_hw_info can: rcar_canfd: Add shared_can_regs variable to struct rcar_canfd_hw_info can: rcar_canfd: Add ch_interface_mode variable to struct rcar_canfd_hw_info can: rcar_canfd: Add {nom,data}_bittiming variables to struct rcar_canfd_hw_info can: rcar_canfd: Add max_cftml variable to struct rcar_canfd_hw_info can: rcar_canfd: Add max_aflpn variable to struct rcar_canfd_hw_info can: rcar_canfd: Add rnc_field_width variable to struct rcar_canfd_hw_info can: rcar_canfd: Update RCANFD_GAFLCFG macro can: rcar_canfd: Add rcar_canfd_setrnc() can: rcar_canfd: Drop the mask operation in RCANFD_GAFLCFG_SETRNC macro can: rcar_canfd: Update RCANFD_GERFL_ERR macro can: rcar_canfd: Drop RCANFD_GAFLCFG_GETRNC macro can: rcar_canfd: Use of_get_available_child_by_name() ... ==================== Link: https://patch.msgid.link/20250522084128.501049-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-26Merge tag 'vfs-6.16-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Use folios for symlinks in the page cache FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page() - Try and make a few filesystem operations killable on the VFS inode->i_mutex level - Add sysctl vfs_cache_pressure_denom for bulk file operations Some workloads need to preserve more dentries than we currently allow through out sysctl interface A HDFS servers with 12 HDDs per server, on a HDFS datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation - Avoid some jumps in inode_permission() using likely()/unlikely() - Avid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() - Add fastpath predicts for stat() and fdput() - Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode - Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock - Add proper tests for anonymous inode behavior - Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead() Cleanups: - Port pidfs to the new anon_inode_{g,s}etattr() helpers - Try to remove the uselib() system call - Add unlikely branch hint return path for poll - Add unlikely branch hint on return path for core_sys_select - Don't allow signals to interrupt getdents copying for fuse - Provide a size hint to dir_context for during readdir() - Use writeback_iter directly in mpage_writepages - Update compression and mtime descriptions in initramfs documentation - Update main netfs API document - Remove useless plus one in super_cache_scan() - Remove unnecessary NULL-check guards during setns() - Add separate separate {get,put}_cgroup_ns no-op cases Fixes: - Fix typo in root= kernel parameter description - Use KERN_INFO for infof()|info_plog()|infofc() - Correct comments of fs_validate_description() - Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() - Delete macro fsparam_u32hex() - Remove unused and problematic validate_constant_table() - Fix potential unsigned integer underflow in fs_name() - Make file-nr output the total allocated file handles" * tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits) fs: Pass a folio to page_put_link() nfs: Use a folio in nfs_get_link() fs: Convert __page_get_link() to use a folio fs/read_write: make default_llseek() killable fs/open: make do_truncate() killable fs/open: make chmod_common() and chown_common() killable include/linux/fs.h: add inode_lock_killable() readdir: supply dir_context.count as readdir buffer size hint vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations fuse: don't allow signals to interrupt getdents copying Documentation: fix typo in root= kernel parameter description include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards fs: use writeback_iter directly in mpage_writepages fs: remove useless plus one in super_cache_scan() fs: add S_ANON_INODE fs: remove uselib() system call device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs/fs_parse: Remove unused and problematic validate_constant_table() fs: touch up predicts in inode_permission() ...
2025-05-26selftests: ncdevmem: add tx test with multiple IOVsStanislav Fomichev
Use prime 3 for length to make offset slowly drift away. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Acked-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-26selftests: ncdevmem: make chunking optionalStanislav Fomichev
Add new -z argument to specify max IOV size. By default, use single large IOV. Signed-off-by: Stanislav Fomichev <stfomichev@gmail.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>