summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-05-21net: add skb_crc32c()Eric Biggers
Add skb_crc32c(), which calculates the CRC32C of a sk_buff. It will replace __skb_checksum(), which unnecessarily supports arbitrary checksums. Compared to __skb_checksum(), skb_crc32c(): - Uses the correct type for CRC32C values (u32, not __wsum). - Does not require the caller to provide a skb_checksum_ops struct. - Is faster because it does not use indirect calls and does not use the very slow crc32c_combine(). According to commit 2817a336d4d5 ("net: skb_checksum: allow custom update/combine for walking skb") which added __skb_checksum(), the original motivation for the abstraction layer was to avoid code duplication for CRC32C and other checksums in the future. However: - No additional checksums showed up after CRC32C. __skb_checksum() is only used with the "regular" net checksum and CRC32C. - Indirect calls are expensive. Commit 2544af0344ba ("net: avoid indirect calls in L4 checksum calculation") worked around this using the INDIRECT_CALL_1 macro. But that only avoided the indirect call for the net checksum, and at the cost of an extra branch. - The checksums use different types (__wsum and u32), causing casts to be needed. - It made the checksums of fragments be combined (rather than chained) for both checksums, despite this being highly counterproductive for CRC32C due to how slow crc32c_combine() is. This can clearly be seen in commit 4c2f24549644 ("sctp: linearize early if it's not GSO") which tried to work around this performance bug. With a dedicated function for each checksum, we can instead just use the proper strategy for each checksum. As shown by the following tables, the new function skb_crc32c() is faster than __skb_checksum(), with the improvement varying greatly from 5% to 2500% depending on the case. The largest improvements come from fragmented packets, mainly due to eliminating the inefficient crc32c_combine(). But linear packets are improved too, especially shorter ones, mainly due to eliminating indirect calls. These benchmarks were done on AMD Zen 5. On that CPU, Linux uses IBRS instead of retpoline; an even greater improvement might be seen with retpoline: Linear sk_buffs Length in bytes __skb_checksum cycles skb_crc32c cycles =============== ===================== ================= 64 43 18 256 94 77 1420 204 161 16384 1735 1642 Nonlinear sk_buffs (even split between head and one fragment) Length in bytes __skb_checksum cycles skb_crc32c cycles =============== ===================== ================= 64 579 22 256 829 77 1420 1506 194 16384 4365 1682 Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-21Merge tag 'samsung-drivers-6.16-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers Samsung SoC drivers for v6.16, part two Add CPU hotplug support on Google GS101 by toggling respective bits in secondary PMU intr block (Power Management Unit (PMU) Interrupt Generation) from the main PMU driver. * tag 'samsung-drivers-6.16-2' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: soc: samsung: exynos-pmu: enable CPU hotplug support for gs101 MAINTAINERS: Add google,gs101-pmu-intr-gen.yaml binding file dt-bindings: soc: samsung: exynos-pmu: gs101: add google,pmu-intr-gen phandle dt-bindings: soc: google: Add gs101-pmu-intr-gen binding documentation Link: https://lore.kernel.org/r/20250516082037.7248-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21PCI/MSI: Use bool for MSI enable state trackingHans Zhang
Convert pci_msi_enable and pci_msi_enabled() to use bool type for clarity. No functional changes, only code cleanup. Signed-off-by: Hans Zhang <hans.zhang@cixtech.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250516165223.125083-2-18255117159@163.com
2025-05-21crash_dump: retrieve dm crypt keys in kdump kernelCoiby Xu
Crash kernel will retrieve the dm crypt keys based on the dmcryptkeys command line parameter. When user space writes the key description to /sys/kernel/config/crash_dm_crypt_key/restore, the crash kernel will save the encryption keys to the user keyring. Then user space e.g. cryptsetup's --volume-key-keyring API can use it to unlock the encrypted device. Link: https://lkml.kernel.org/r/20250502011246.99238-6-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: store dm crypt keys in kdump reserved memoryCoiby Xu
When the kdump kernel image and initrd are loaded, the dm crypts keys will be read from keyring and then stored in kdump reserved memory. Assume a key won't exceed 256 bytes thus MAX_KEY_SIZE=256 according to "cryptsetup benchmark". Link: https://lkml.kernel.org/r/20250502011246.99238-4-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21kexec_file: allow to place kexec_buf randomlyCoiby Xu
Patch series "Support kdump with LUKS encryption by reusing LUKS volume keys", v9. LUKS is the standard for Linux disk encryption, widely adopted by users, and in some cases, such as Confidential VMs, it is a requirement. With kdump enabled, when the first kernel crashes, the system can boot into the kdump/crash kernel to dump the memory image (i.e., /proc/vmcore) to a specified target. However, there are two challenges when dumping vmcore to a LUKS-encrypted device: - Kdump kernel may not be able to decrypt the LUKS partition. For some machines, a system administrator may not have a chance to enter the password to decrypt the device in kdump initramfs after the 1st kernel crashes; For cloud confidential VMs, depending on the policy the kdump kernel may not be able to unseal the keys with TPM and the console virtual keyboard is untrusted. - LUKS2 by default use the memory-hard Argon2 key derivation function which is quite memory-consuming compared to the limited memory reserved for kdump. Take Fedora example, by default, only 256M is reserved for systems having memory between 4G-64G. With LUKS enabled, ~1300M needs to be reserved for kdump. Note if the memory reserved for kdump can't be used by 1st kernel i.e. an user sees ~1300M memory missing in the 1st kernel. Besides users (at least for Fedora) usually expect kdump to work out of the box i.e. no manual password input or custom crashkernel value is needed. And it doesn't make sense to derivate the keys again in kdump kernel which seems to be redundant work. This patchset addresses the above issues by making the LUKS volume keys persistent for kdump kernel with the help of cryptsetup's new APIs (--link-vk-to-keyring/--volume-key-keyring). Here is the life cycle of the kdump copies of LUKS volume keys, 1. After the 1st kernel loads the initramfs during boot, systemd use an user-input passphrase to de-crypt the LUKS volume keys or TPM-sealed key and then save the volume keys to specified keyring (using the --link-vk-to-keyring API) and the key will expire within specified time. 2. A user space tool (kdump initramfs loader like kdump-utils) create key items inside /sys/kernel/config/crash_dm_crypt_keys to inform the 1st kernel which keys are needed. 3. When the kdump initramfs is loaded by the kexec_file_load syscall, the 1st kernel will iterate created key items, save the keys to kdump reserved memory. 4. When the 1st kernel crashes and the kdump initramfs is booted, the kdump initramfs asks the kdump kernel to create a user key using the key stored in kdump reserved memory by writing yes to /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted device is unlocked with libcryptsetup's --volume-key-keyring API. 5. The system gets rebooted to the 1st kernel after dumping vmcore to the LUKS encrypted device is finished After libcryptsetup saving the LUKS volume keys to specified keyring, whoever takes this should be responsible for the safety of these copies of keys. The keys will be saved in the memory area exclusively reserved for kdump where even the 1st kernel has no direct access. And further more, two additional protections are added, - save the copy randomly in kdump reserved memory as suggested by Jan - clear the _PAGE_PRESENT flag of the page that stores the copy as suggested by Pingfan This patchset only supports x86. There will be patches to support other architectures once this patch set gets merged. This patch (of 9): Currently, kexec_buf is placed in order which means for the same machine, the info in the kexec_buf is always located at the same position each time the machine is booted. This may cause a risk for sensitive information like LUKS volume key. Now struct kexec_buf has a new field random which indicates it's supposed to be placed in a random position. Note this feature is enabled only when CONFIG_CRASH_DUMP is enabled. So it only takes effect for kdump and won't impact kexec reboot. Link: https://lkml.kernel.org/r/20250502011246.99238-1-coxu@redhat.com Link: https://lkml.kernel.org/r/20250502011246.99238-2-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Suggested-by: Jan Pazdziora <jpazdziora@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21Merge tag 'qcom-drivers-for-6.16' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.16 Allow list QSEECOM for EFI variable services on on the Asus Zenbook A14, and block list TZMEM on the SM7150 platform to avoid issues with rmtfs. Extend the last-level cache (llcc) driver to support version 6 of the hardware and enable SM8750 support. Also add socinfo for the SM8750 platform. Re-enable UCSI support on SC8280XP, now that the reported crash has been dealt with, and filter the altmode notifications to avoid spurious hotplug events being propagated to user space. Add SM7150 support to pd-mapper. * tag 'qcom-drivers-for-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: soc: qcom: llcc-qcom: Add support for SM8750 soc: qcom: llcc-qcom: Add support for LLCC V6 dt-bindings: cache: qcom,llcc: Document SM8750 LLCC block soc: qcom: socinfo: add SM8750 SoC ID dt-bindings: arm: qcom,ids: add SoC ID for SM8750 dt-bindings: soc: qcom: qcom,rpm: add missing clock/-names properties dt-bindings: soc: qcom,rpm: add missing clock-controller node soc: qcom: smem: Update max processor count firmware: qcom: tzmem: disable sm7150 platform soc: qcom: pd-mapper: Add support for SM7150 soc: qcom: pmic_glink_altmode: fix spurious DP hotplug events soc: qcom: smp2p: Fix fallback to qcom,ipc parse soc: qcom: pmic_glink: enable UCSI on sc8280xp firmware: qcom: scm: Allow QSEECOM on Asus Zenbook A14 dt-bindings: soc: qcom,rpmh-rsc: Limit power-domains requirement Link: https://lore.kernel.org/r/20250513215656.44448-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21Merge tag 'reset-for-v6.16' of git://git.pengutronix.de/pza/linux into ↵Arnd Bergmann
soc/drivers Reset controller updates for v6.16 * Add T-HEAD TH1520 and Renesas RZ/V2H(P) USB2PHY reset controller drivers. * Add devm_reset_control_array_get_exclusive_released() variant to allow using the acquire/release hand-off mechanism for exclusive reset controls bundled into reset control arrays. * Add Sophgo SG2044 reset controller to device tree bindings. * tag 'reset-for-v6.16' of git://git.pengutronix.de/pza/linux: dt-bindings: reset: sophgo: Add SG2044 bindings. MAINTAINERS: Add entry for Renesas RZ/V2H(P) USB2PHY Port Reset driver reset: Add USB2PHY port reset driver for Renesas RZ/V2H(P) dt-bindings: reset: Document RZ/V2H(P) USB2PHY reset reset: Add devm_reset_control_array_get_exclusive_released() reset: thead: Add TH1520 reset controller driver dt-bindings: reset: Add T-HEAD TH1520 SoC Reset Controller Link: https://lore.kernel.org/r/20250513092516.3331585-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-05-21mm/mempolicy: Weighted Interleave Auto-tuningJoshua Hahn
On machines with multiple memory nodes, interleaving page allocations across nodes allows for better utilization of each node's bandwidth. Previous work by Gregory Price [1] introduced weighted interleave, which allowed for pages to be allocated across nodes according to user-set ratios. Ideally, these weights should be proportional to their bandwidth, so that under bandwidth pressure, each node uses its maximal efficient bandwidth and prevents latency from increasing exponentially. Previously, weighted interleave's default weights were just 1s -- which would be equivalent to the (unweighted) interleave mempolicy, which goes through the nodes in a round-robin fashion, ignoring bandwidth information. This patch has two main goals: First, it makes weighted interleave easier to use for users who wish to relieve bandwidth pressure when using nodes with varying bandwidth (CXL). By providing a set of "real" default weights that just work out of the box, users who might not have the capability (or wish to) perform experimentation to find the most optimal weights for their system can still take advantage of bandwidth-informed weighted interleave. Second, it allows for weighted interleave to dynamically adjust to hotplugged memory with new bandwidth information. Instead of manually updating node weights every time new bandwidth information is reported or taken off, weighted interleave adjusts and provides a new set of default weights for weighted interleave to use when there is a change in bandwidth information. To meet these goals, this patch introduces an auto-configuration mode for the interleave weights that provides a reasonable set of default weights, calculated using bandwidth data reported by the system. In auto mode, weights are dynamically adjusted based on whatever the current bandwidth information reports (and responds to hotplug events). This patch still supports users manually writing weights into the nodeN sysfs interface by entering into manual mode. When a user enters manual mode, the system stops dynamically updating any of the node weights, even during hotplug events that shift the optimal weight distribution. A new sysfs interface "auto" is introduced, which allows users to switch between the auto (writing 1 or Y) and manual (writing 0 or N) modes. The system also automatically enters manual mode when a nodeN interface is manually written to. There is one functional change that this patch makes to the existing weighted_interleave ABI: previously, writing 0 directly to a nodeN interface was said to reset the weight to the system default. Before this patch, the default for all weights were 1, which meant that writing 0 and 1 were functionally equivalent. With this patch, writing 0 is invalid. Link: https://lkml.kernel.org/r/20250520141236.2987309-1-joshua.hahnjy@gmail.com [joshua.hahnjy@gmail.com: wordsmithing changes, simplification, fixes] Link: https://lkml.kernel.org/r/20250511025840.2410154-1-joshua.hahnjy@gmail.com [joshua.hahnjy@gmail.com: remove auto_kobj_attr field from struct sysfs_wi_group] Link: https://lkml.kernel.org/r/20250512142511.3959833-1-joshua.hahnjy@gmail.com https://lore.kernel.org/linux-mm/20240202170238.90004-1-gregory.price@memverge.com/ [1] Link: https://lkml.kernel.org/r/20250505182328.4148265-1-joshua.hahnjy@gmail.com Co-developed-by: Gregory Price <gourry@gourry.net> Signed-off-by: Gregory Price <gourry@gourry.net> Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Suggested-by: Yunjeong Mun <yunjeong.mun@sk.com> Suggested-by: Oscar Salvador <osalvador@suse.de> Suggested-by: Ying Huang <ying.huang@linux.alibaba.com> Suggested-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Huang Ying <ying.huang@linux.alibaba.com> Reviewed-by: Honggyu Kim <honggyu.kim@sk.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21include: pe.h: Fix PE definitionsPali Rohár
* Rename constants to their standard PE names: - MZ_MAGIC -> IMAGE_DOS_SIGNATURE - PE_MAGIC -> IMAGE_NT_SIGNATURE - PE_OPT_MAGIC_PE32_ROM -> IMAGE_ROM_OPTIONAL_HDR_MAGIC - PE_OPT_MAGIC_PE32 -> IMAGE_NT_OPTIONAL_HDR32_MAGIC - PE_OPT_MAGIC_PE32PLUS -> IMAGE_NT_OPTIONAL_HDR64_MAGIC - IMAGE_DLL_CHARACTERISTICS_NX_COMPAT -> IMAGE_DLLCHARACTERISTICS_NX_COMPAT * Import constants and their description from readpe and file projects which contains current up-to-date information: - IMAGE_FILE_MACHINE_* - IMAGE_FILE_* - IMAGE_SUBSYSTEM_* - IMAGE_DLLCHARACTERISTICS_* - IMAGE_DLLCHARACTERISTICS_EX_* - IMAGE_DEBUG_TYPE_* * Add missing IMAGE_SCN_* constants and update their incorrect description * Fix incorrect value of IMAGE_SCN_MEM_PURGEABLE constant * Add description for win32_version and loader_flags PE fields Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-05-21can: dev: add struct data_bittiming_params to group FD parametersVincent Mailhol
This is a preparation patch for the introduction of CAN XL. CAN FD and CAN XL uses similar bittiming parameters. Add one level of nesting for all the CAN FD parameters. Typically: priv->can.data_bittiming; becomes: priv->can.fd.data_bittiming; This way, the CAN XL equivalent (to be introduced later) would be: priv->can.xl.data_bittiming; Add the new struct data_bittiming_params which contains all the data bittiming parameters, including the TDC and the callback functions. This done, update all the CAN FD drivers to make use of the new layout. Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Link: https://patch.msgid.link/20250501171213.2161572-2-mailhol.vincent@wanadoo.fr [mkl: fix rcar_canfd] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-21pidfs, coredump: add PIDFD_INFO_COREDUMPChristian Brauner
Extend the PIDFD_INFO_COREDUMP ioctl() with the new PIDFD_INFO_COREDUMP mask flag. This adds the @coredump_mask field to struct pidfd_info. When a task coredumps the kernel will provide the following information to userspace in @coredump_mask: * PIDFD_COREDUMPED is raised if the task did actually coredump. * PIDFD_COREDUMP_SKIP is raised if the task skipped coredumping (e.g., undumpable). * PIDFD_COREDUMP_USER is raised if this is a regular coredump and doesn't need special care by the coredump server. * PIDFD_COREDUMP_ROOT is raised if the generated coredump should be treated as sensitive and the coredump server should restrict to the generated coredump to sufficiently privileged users. The kernel guarantees that by the time the connection is made the all PIDFD_INFO_COREDUMP info is available. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-5-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21coredump: add coredump socketChristian Brauner
Coredumping currently supports two modes: (1) Dumping directly into a file somewhere on the filesystem. (2) Dumping into a pipe connected to a usermode helper process spawned as a child of the system_unbound_wq or kthreadd. For simplicity I'm mostly ignoring (1). There's probably still some users of (1) out there but processing coredumps in this way can be considered adventurous especially in the face of set*id binaries. The most common option should be (2) by now. It works by allowing userspace to put a string into /proc/sys/kernel/core_pattern like: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h The "|" at the beginning indicates to the kernel that a pipe must be used. The path following the pipe indicator is a path to a binary that will be spawned as a usermode helper process. Any additional parameters pass information about the task that is generating the coredump to the binary that processes the coredump. In the example core_pattern shown above systemd-coredump is spawned as a usermode helper. There's various conceptual consequences of this (non-exhaustive list): - systemd-coredump is spawned with file descriptor number 0 (stdin) connected to the read-end of the pipe. All other file descriptors are closed. That specifically includes 1 (stdout) and 2 (stderr). This has already caused bugs because userspace assumed that this cannot happen (Whether or not this is a sane assumption is irrelevant.). - systemd-coredump will be spawned as a child of system_unbound_wq. So it is not a child of any userspace process and specifically not a child of PID 1. It cannot be waited upon and is in a weird hybrid upcall which are difficult for userspace to control correctly. - systemd-coredump is spawned with full kernel privileges. This necessitates all kinds of weird privilege dropping excercises in userspace to make this safe. - A new usermode helper has to be spawned for each crashing process. This series adds a new mode: (3) Dumping into an AF_UNIX socket. Userspace can set /proc/sys/kernel/core_pattern to: @/path/to/coredump.socket The "@" at the beginning indicates to the kernel that an AF_UNIX coredump socket will be used to process coredumps. The coredump socket must be located in the initial mount namespace. When a task coredumps it opens a client socket in the initial network namespace and connects to the coredump socket. - The coredump server uses SO_PEERPIDFD to get a stable handle on the connected crashing task. The retrieved pidfd will provide a stable reference even if the crashing task gets SIGKILLed while generating the coredump. - By setting core_pipe_limit non-zero userspace can guarantee that the crashing task cannot be reaped behind it's back and thus process all necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to detect whether /proc/<pid> still refers to the same process. The core_pipe_limit isn't used to rate-limit connections to the socket. This can simply be done via AF_UNIX sockets directly. - The pidfd for the crashing task will grow new information how the task coredumps. - The coredump server should mark itself as non-dumpable. - A container coredump server in a separate network namespace can simply bind to another well-know address and systemd-coredump fowards coredumps to the container. - Coredumps could in the future also be handled via per-user/session coredump servers that run only with that users privileges. The coredump server listens on the coredump socket and accepts a new coredump connection. It then retrieves SO_PEERPIDFD for the client, inspects uid/gid and hands the accepted client to the users own coredump handler which runs with the users privileges only (It must of coure pay close attention to not forward crashing suid binaries.). The new coredump socket will allow userspace to not have to rely on usermode helpers for processing coredumps and provides a safer way to handle them instead of relying on super privileged coredumping helpers that have and continue to cause significant CVEs. This will also be significantly more lightweight since no fork()+exec() for the usermodehelper is required for each crashing process. The coredump server in userspace can e.g., just keep a worker pool. Link: https://lore.kernel.org/20250516-work-coredump-socket-v8-4-664f3caf2516@kernel.org Acked-by: Luca Boccassi <luca.boccassi@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-21futex: Use RCU_INIT_POINTER() in futex_mm_init().Sebastian Andrzej Siewior
There is no need for an explicit NULL pointer initialisation plus a comment why it is okay. RCU_INIT_POINTER() can be used for NULL initialisations and it is documented. This has been build tested with gcc version 9.3.0 (Debian 9.3.0-22) on a x86-64 defconfig. Fixes: 094ac8cff7858 ("futex: Relax the rcu_assign_pointer() assignment of mm->futex_phash in futex_mm_init()") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250517151455.1065363-4-bigeasy@linutronix.de
2025-05-21iommu/arm-smmu-qcom: Make set_stall work when the device is onConnor Abbott
Up until now we have only called the set_stall callback during initialization when the device is off. But we will soon start calling it to temporarily disable stall-on-fault when the device is on, so handle that by checking if the device is on and writing SCTLR. Signed-off-by: Connor Abbott <cwabbott0@gmail.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Link: https://lore.kernel.org/r/20250520-msm-gpu-fault-fixes-next-v8-3-fce6ee218787@gmail.com [will: Fix "mixed declarations and code" warning from sparse] Signed-off-by: Will Deacon <will@kernel.org>
2025-05-21Merge tag 'intel-gpio-v6.16-1' of ↵Bartosz Golaszewski
git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel into gpio/for-next intel-gpio for v6.16-1 * Split GPIO ACPI quirks to its own file * Refactored GPIO ACPI library to shrink the code The following is an automated git shortlog grouped by driver: gpiolib: - acpi: Update file references in the Documentation and MAINTAINERS - acpi: Move quirks to a separate file - acpi: Add acpi_gpio_need_run_edge_events_on_boot() getter - acpi: Handle deferred list via new API - acpi: Make sure we fill struct acpi_gpio_info - acpi: Switch to use enum in acpi_gpio_in_ignore_list() - acpi: Use temporary variable for struct acpi_gpio_info - acpi: Deduplicate some code in __acpi_find_gpio() - acpi: Reuse struct acpi_gpio_params in struct acpi_gpio_lookup - acpi: Rename par to params for better readability - acpi: Reduce memory footprint for struct acpi_gpio_params - acpi: Remove index parameter from acpi_gpio_property_lookup() - acpi: Improve struct acpi_gpio_info memory footprint
2025-05-21pinctrl: core: add devm_pinctrl_register_mappings()Thomas Richard
Using devm_pinctrl_register_mappings(), the core can automatically unregister pinctrl mappings. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Link: https://lore.kernel.org/20250520-aaeon-up-board-pinctrl-support-v6-3-dcb3756be3c6@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-05-21pinctrl: remove extern specifier for functions in machine.hThomas Richard
Extern is the default specifier for a function, no need to define it. Suggested-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Link: https://lore.kernel.org/20250520-aaeon-up-board-pinctrl-support-v6-2-dcb3756be3c6@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2025-05-21Merge tag 'v6.15-rc7' into x86/core, to pick up fixesIngo Molnar
Pick up build fixes from upstream to make this tree more testable. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-20highmem: add folio_test_partial_kmap()Matthew Wilcox (Oracle)
In commit c749d9b7ebbc ("iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP"), Hugh correctly noted that if KMAP_LOCAL_FORCE_MAP is enabled, we must limit ourselves to PAGE_SIZE bytes per call to kmap_local(). The same problem exists in memcpy_from_folio(), memcpy_to_folio(), folio_zero_tail(), folio_fill_tail() and memcpy_from_file_folio(), so add folio_test_partial_kmap() to do this more succinctly. Link: https://lkml.kernel.org/r/20250514170607.3000994-2-willy@infradead.org Fixes: 00cdf76012ab ("mm: add memcpy_from_file_folio()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-20mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=yFlorent Revest
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38. On configs with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by CONFIG_ARM64 when CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38. This bit being shared by two different VMA flags could lead to all sorts of unintended behaviors. Presumably, a process could maybe call into userfaultfd in a way that disables the shadow stack vma flag. I can't think of any attack where this would help (presumably, if an attacker tries to disable shadow stacks, they are trying to hijack control flow so can't arbitrarily call into userfaultfd yet anyway) but this still feels somewhat scary. Link: https://lkml.kernel.org/r/20250507131000.1204175-2-revest@chromium.org Fixes: ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS") Signed-off-by: Florent Revest <revest@chromium.org> Reviewed-by: Mark Brown <broonie@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Brendan Jackman <jackmanb@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Florent Revest <revest@chromium.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-20mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabledIgnacio Moreno Gonzalez
commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps the mmap option MAP_STACK to VM_NOHUGEPAGE. This is also done if CONFIG_TRANSPARENT_HUGEPAGE is not defined. But in that case, the VM_NOHUGEPAGE does not make sense. I discovered this issue when trying to use the tool CRIU to checkpoint and restore a container. Our running kernel is compiled without CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps and saves the "nh" flag. When trying to restore the container, CRIU fails to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE always returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not defined. Link: https://lkml.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-v5-1-c6c38cfefd6e@kuka.com Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez@kuka.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-20net: phy: fixed_phy: constify status argument where possibleHeiner Kallweit
Constify the passed struct fixed_phy_status *status where possible. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/d1764b62-8538-408b-a4e3-b63715481a38@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20net: phy: fixed_phy: remove irq argument from fixed_phy_registerHeiner Kallweit
All callers pass PHY_POLL, therefore remove irq argument from fixed_phy_register(). Note: I keep the irq argument in fixed_phy_add_gpiod() for now, for the case that somebody may want to use a GPIO interrupt in the future, by e.g. adding a call to fwnode_irq_get() to fixed_phy_get_gpiod(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/31cdb232-a5e9-4997-a285-cb9a7d208124@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20net: phy: fixed_phy: remove irq argument from fixed_phy_addHeiner Kallweit
All callers pass PHY_POLL, therefore remove irq argument from fixed_phy_add(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Link: https://patch.msgid.link/b3b9b3bc-c310-4a54-b376-c909c83575de@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-20i2c: remove 'of_node' member from i2c_boardinfoWolfram Sang
There is no user of this member anymore. We can remove it. Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2025-05-20jbd2: remove journal_t argument from jbd2_chksum()Eric Biggers
Since jbd2_chksum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folioZhang Yi
jbd2_journal_blocks_per_page() returns the number of blocks in a single page. Rename it to jbd2_journal_blocks_per_folio() and make it returns the number of blocks in the largest folio, preparing for the calculation of journal credits blocks when allocating blocks within a large folio in the writeback path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()Rob Clark
In situations where mapping/unmapping sequence can be controlled by userspace, attempting to map over a region that has not yet been unmapped is an error. But not something that should spam dmesg. Now that there is a quirk, we can also drop the selftest_running flag, and use the quirk instead for selftests. Acked-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Rob Clark <robdclark@chromium.org> Link: https://lore.kernel.org/r/20250519175348.11924-6-robdclark@gmail.com [will: Rename quirk to IO_PGTABLE_QUIRK_NO_WARN per Robin's suggestion] Signed-off-by: Will Deacon <will@kernel.org>
2025-05-20net: phy: make mdio consumer / device layer a separate moduleHeiner Kallweit
After having factored out the provider part from mdio_bus.c, we can make the mdio consumer / device layer a separate module. This also allows to remove Kconfig symbol MDIO_DEVICE. The module init / exit functions from mdio_bus.c no longer have to be called from phy_device.c. The link order defined in drivers/net/phy/Makefile ensures that init / exit functions are called in the right order. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/dba6b156-5748-44ce-b5e2-e8dc2fcee5a7@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-20Add sound card support for QCS9100 and QCS9075Mark Brown
Merge series from Mohammad Rafi Shaik <mohammad.rafi.shaik@oss.qualcomm.com>: This patchset adds support for sound card on Qualcomm QCS9100 and QCS9075 boards.
2025-05-20dmapool: add NUMA affinity supportKeith Busch
Introduce dma_pool_create_node(), like dma_pool_create() but taking an additional NUMA node argument. Allocate struct dma_pool on the desired node, and store the node on dma_pool for allocating struct dma_page. Make dma_pool_create() an alias for dma_pool_create_node() with node set to NUMA_NO_NODE. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2025-05-19net: netlink: reduce extack cookie sizeJohannes Berg
Seems like the extack cookie hasn't found any users outside of wireless, which always uses nl_set_extack_cookie_u64(). Thus, allocating 20 bytes for it is pointless, reduce that to 8 bytes, and add a BUILD_BUG_ON() to ensure it's enough (obviously it is, for a u64, but in case it changes again.) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250516115927.38209-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-19fanotify: support watching filesystems and mounts inside usernsAmir Goldstein
An unprivileged user is allowed to create an fanotify group and add inode marks, but not filesystem, mntns and mount marks. Add limited support for setting up filesystem, mntns and mount marks by an unprivileged user under the following conditions: 1. User has CAP_SYS_ADMIN in the user ns where the group was created 2.a. User has CAP_SYS_ADMIN in the user ns where the sb was created OR (in case setting up a mntns mark) 2.b. User has CAP_SYS_ADMIN in the user ns associated with the mntns Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250516192803.838659-3-amir73il@gmail.com
2025-05-19PCI: Add Intel Wildcat Lake audio Device IDPeter Ujfalusi
Add Wildcat Lake (WCL) audio Device ID. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev> Acked-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250519080855.16977-2-peter.ujfalusi@linux.intel.com
2025-05-19cgroup: document the rstat per-cpu initializationJP Kobryn
The calls to css_rstat_init() occur at different places depending on the context. Document the conditions that determine which point of initialization is used. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: use subsystem-specific rstat locks to avoid contentionJP Kobryn
It is possible to eliminate contention between subsystems when updating/flushing stats by using subsystem-specific locks. Let the existing rstat locks be dedicated to the cgroup base stats and rename them to reflect that. Add similar locks to the cgroup_subsys struct for use with individual subsystems. Lock initialization is done in the new function ss_rstat_init(ss) which replaces cgroup_rstat_boot(void). If NULL is passed to this function, the global base stat locks will be initialized. Otherwise, the subsystem locks will be initialized. Change the existing lock helper functions to accept a reference to a css. Then within these functions, conditionally select the appropriate locks based on the subsystem affiliation of the given css. Add helper functions for this selection routine to avoid repeated code. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: use separate rstat trees for each subsystemJP Kobryn
Different subsystems may call cgroup_rstat_updated() within the same cgroup, resulting in a tree of pending updates from multiple subsystems. When one of these subsystems is flushed via cgroup_rstat_flushed(), all other subsystems with pending updates on the tree will also be flushed. Change the paradigm of having a single rstat tree for all subsystems to having separate trees for each subsystem. This separation allows for subsystems to perform flushes without the side effects of other subsystems. As an example, flushing the cpu stats will no longer cause the memory stats to be flushed and vice versa. In order to achieve subsystem-specific trees, change the tree node type from cgroup to cgroup_subsys_state pointer. Then remove those pointers from the cgroup and instead place them on the css. Finally, change update/flush functions to make use of the different node type (css). These changes allow a specific subsystem to be associated with an update or flush. Separate rstat trees will now exist for each unique subsystem. Since updating/flushing will now be done at the subsystem level, there is no longer a need to keep track of updated css nodes at the cgroup level. The list management of these nodes done within the cgroup (rstat_css_list and related) has been removed accordingly. Conditional guards for checking validity of a given css were placed within css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc usage in bpf programs. Guards were also placed within css_rstat_init/exit() in order to help consolidate calls to them. At call sites for all four functions, the existing guards were removed. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19i2c: smbus: introduce Write Disable-aware SPD instantiating functionsYo-Jung (Leo) Lin
Some SMBus controllers may restrict writes to addresses where SPD sensors may reside. This may lead to some SPD sensors not functioning correctly, and might need extra handling. Introduce new SPD-instantiating functions that are aware of this, and use them instead. Signed-off-by: Yo-Jung Lin (Leo) <leo.lin@canonical.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20250430-for-upstream-i801-spd5118-no-instantiate-v2-1-2f54d91ae2c7@canonical.com Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2025-05-19cgroup: compare css to cgroup::self in helper for distingushing cssJP Kobryn
Adjust the implementation of css_is_cgroup() so that it compares the given css to cgroup::self. Rename the function to css_is_self() in order to reflect that. Change the existing css->ss NULL check to a warning in the true branch. Finally, adjust call sites to use the new function name. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19bpf: WARN_ONCE on verifier bugsPaul Chaignon
Throughout the verifier's logic, there are multiple checks for inconsistent states that should never happen and would indicate a verifier bug. These bugs are typically logged in the verifier logs and sometimes preceded by a WARN_ONCE. This patch reworks these checks to consistently emit a verifier log AND a warning when CONFIG_DEBUG_KERNEL is enabled. The consistent use of WARN_ONCE should help fuzzers (ex. syzkaller) expose any situation where they are actually able to reach one of those buggy verifier states. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/aCs1nYvNNMq8dAWP@mail.gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-19spi: sh-msiof: Move register definitions to <linux/spi/sh_msiof.h>Geert Uytterhoeven
Move the MSIOF register and register bit definitions from the MSIOF SPI driver to the existing header file <linux/spi/sh_msiof.h>, so they can be shared with the MSIOF I2S driver. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/066d1086973eb309006258484e9fe8138807e565.1747401908.git.geert+renesas@glider.be Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-19regulator: max8952: Correct Samsung "Electronics" spelling in copyright headersSumanth Gavini
Fix the misspelling of 'Electronics' in max8952 driver copyright headers. Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com> Link: https://patch.msgid.link/20250518085734.88890-7-sumanth.gavini@yahoo.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-17io_uring: pass in struct io_big_cqe to io_alloc_ocqe()Jens Axboe
Rather than pass extra1/extra2 separately, just pass in the (now) named io_big_cqe struct instead. The callers that don't use/support CQE32 will now just pass a single NULL, rather than two seperate mystery zero values. Move the clearing of the big_cqe elements into io_alloc_ocqe() as well, so it can get moved out of the generic code. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-17Merge tag 'mm-hotfixes-stable-2025-05-17-09-41' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Nine singleton hotfixes, all MM. Four are cc:stable" * tag 'mm-hotfixes-stable-2025-05-17-09-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: userfaultfd: correct dirty flags set for both present and swap pte zsmalloc: don't underflow size calculation in zs_obj_write() mm/page_alloc: fix race condition in unaccepted memory handling mm/page_alloc: ensure try_alloc_pages() plays well with unaccepted memory MAINTAINERS: add mm GUP section mm/codetag: move tag retrieval back upfront in __free_pages() mm/memory: fix mapcount / refcount sanity check for mTHP reuse kernel/fork: only call untrack_pfn_clear() on VMAs duplicated for fork() mm: hugetlb: fix incorrect fallback for subpool
2025-05-16mr: consolidate the ipmr_can_free_table() checks.Paolo Abeni
Guoyu Yin reported a splat in the ipmr netns cleanup path: WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_free_table net/ipv4/ipmr.c:440 [inline] WARNING: CPU: 2 PID: 14564 at net/ipv4/ipmr.c:440 ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Modules linked in: CPU: 2 UID: 0 PID: 14564 Comm: syz.4.838 Not tainted 6.14.0 #1 Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:ipmr_free_table net/ipv4/ipmr.c:440 [inline] RIP: 0010:ipmr_rules_exit+0x135/0x1c0 net/ipv4/ipmr.c:361 Code: ff df 48 c1 ea 03 80 3c 02 00 75 7d 48 c7 83 60 05 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e e9 71 67 7f 00 e8 4c 2d 8a fd 90 <0f> 0b 90 eb 93 e8 41 2d 8a fd 0f b6 2d 80 54 ea 01 31 ff 89 ee e8 RSP: 0018:ffff888109547c58 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff888108c12dc0 RCX: ffffffff83e09868 RDX: ffff8881022b3300 RSI: ffffffff83e098d4 RDI: 0000000000000005 RBP: ffff888104288000 R08: 0000000000000000 R09: ffffed10211825c9 R10: 0000000000000001 R11: ffff88801816c4a0 R12: 0000000000000001 R13: ffff888108c13320 R14: ffff888108c12dc0 R15: fffffbfff0b74058 FS: 00007f84f39316c0(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f84f3930f98 CR3: 0000000113b56000 CR4: 0000000000350ef0 Call Trace: <TASK> ipmr_net_exit_batch+0x50/0x90 net/ipv4/ipmr.c:3160 ops_exit_list+0x10c/0x160 net/core/net_namespace.c:177 setup_net+0x47d/0x8e0 net/core/net_namespace.c:394 copy_net_ns+0x25d/0x410 net/core/net_namespace.c:516 create_new_namespaces+0x3f6/0xaf0 kernel/nsproxy.c:110 unshare_nsproxy_namespaces+0xc3/0x180 kernel/nsproxy.c:228 ksys_unshare+0x78d/0x9a0 kernel/fork.c:3342 __do_sys_unshare kernel/fork.c:3413 [inline] __se_sys_unshare kernel/fork.c:3411 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:3411 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xa6/0x1a0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f84f532cc29 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f84f3931038 EFLAGS: 00000246 ORIG_RAX: 0000000000000110 RAX: ffffffffffffffda RBX: 00007f84f5615fa0 RCX: 00007f84f532cc29 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000040000400 RBP: 00007f84f53fba18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f84f5615fa0 R15: 00007fff51c5f328 </TASK> The running kernel has CONFIG_IP_MROUTE_MULTIPLE_TABLES disabled, and the sanity check for such build is still too loose. Address the issue consolidating the relevant sanity check in a single helper regardless of the kernel configuration. Also share it between the ipv4 and ipv6 code. Reported-by: Guoyu Yin <y04609127@gmail.com> Fixes: 50b94204446e ("ipmr: tune the ipmr_can_free_table() checks.") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Link: https://patch.msgid.link/372dc261e1bf12742276e1b984fc5a071b7fc5a8.1747321903.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16net: phy: fixed_phy: remove fixed_phy_register_with_gpiodHeiner Kallweit
Since its introduction 6 yrs ago this functions has never had a user. So remove it. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/ccbeef28-65ae-4e28-b1db-816c44338dee@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-16Merge tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: - NFS: Fix a couple of missed handlers for the ENETDOWN and ENETUNREACH transport errors - NFS: Handle Oopsable failure of nfs_get_lock_context in the unlock path - NFSv4: Fix a race in nfs_local_open_fh() - NFSv4/pNFS: Fix a couple of layout segment leaks in layoutreturn - NFSv4/pNFS Avoid sharing pNFS DS connections between net namespaces since IP addresses are not guaranteed to refer to the same nodes - NFS: Don't flush file data while holding multiple directory locks in nfs_rename() * tag 'nfs-for-6.15-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Avoid flushing data while holding directory locks in nfs_rename() NFS/pnfs: Fix the error path in pnfs_layoutreturn_retry_later_locked() NFSv4/pnfs: Reset the layout state after a layoutreturn NFS/localio: Fix a race in nfs_local_open_fh() nfs: nfs3acl: drop useless assignment in nfs3_get_acl() nfs: direct: drop useless initializer in nfs_direct_write_completion() nfs: move the nfs4_data_server_cache into struct nfs_net nfs: don't share pNFS DS connections between net namespaces nfs: handle failure of nfs_get_lock_context in unlock path pNFS/flexfiles: Record the RPC errors in the I/O tracepoints NFSv4/pnfs: Layoutreturn on close must handle fatal networking errors NFSv4: Handle fatal ENETDOWN and ENETUNREACH errors
2025-05-16NFS: Avoid flushing data while holding directory locks in nfs_rename()Trond Myklebust
The Linux client assumes that all filehandles are non-volatile for renames within the same directory (otherwise sillyrename cannot work). However, the existence of the Linux 'subtree_check' export option has meant that nfs_rename() has always assumed it needs to flush writes before attempting to rename. Since NFSv4 does allow the client to query whether or not the server exhibits this behaviour, and since knfsd does actually set the appropriate flag when 'subtree_check' is enabled on an export, it should be OK to optimise away the write flushing behaviour in the cases where it is clearly not needed. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2025-05-16genirq/msi: Add helper for creating MSI-parent irq domainsMarc Zyngier
Creating an irq domain that serves as an MSI parent requires a substantial amount of esoteric boiler-plate code, some of which is often provided twice (such as the bus token). To make things a bit simpler for the unsuspecting MSI tinkerer, provide a helper that does it for them, and serves as documentation of what needs to be provided. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250513172819.2216709-3-maz@kernel.org