path: root/include/linux
Age | Commit message | Author
2025-09-23 | fs: Create sb_encoding() helper | André Almeida
Filesystems that need to deal with the super block encoding need to wrap it in #if IS_ENABLED(CONFIG_UNICODE), because this struct member is not declared otherwise. In order to move these #if/#endif guards out of the filesystem code and make it simpler, create a new function that returns the s_encoding member of struct super_block if Unicode is enabled, and returns NULL otherwise. Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
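The guard-in-one-place idea can be sketched in userspace C. This is only a model under stated assumptions: MODEL_CONFIG_UNICODE stands in for the Kconfig symbol, the structs are reduced stand-ins, and model_sb_is_casefolded() is a hypothetical caller, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CONFIG_UNICODE; in the kernel this comes from Kconfig. */
#define MODEL_CONFIG_UNICODE 1

struct unicode_map { int version; };    /* reduced stand-in type */

struct super_block {
#if MODEL_CONFIG_UNICODE
	struct unicode_map *s_encoding; /* only declared when Unicode is on */
#endif
	int s_flags;
};

/* The preprocessor guard lives here once, so callers need none. */
static inline struct unicode_map *sb_encoding(const struct super_block *sb)
{
	assert(sb != NULL);
#if MODEL_CONFIG_UNICODE
	return sb->s_encoding;
#else
	return NULL;
#endif
}

/* Hypothetical caller that compiles in both configurations. */
static inline int model_sb_is_casefolded(const struct super_block *sb)
{
	return sb_encoding(sb) != NULL;
}
```

With the helper in place, filesystem code only tests the returned pointer against NULL.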
2025-09-23 | dibs: Move event handling to dibs layer | Julian Ruess
Add defines for all event types and subtypes an ism device is known to produce, as they can be helpful for debugging purposes. Introduce a generic 'struct dibs_event' and adapt the ism device driver and smc-d client accordingly. Tolerate and ignore other type and subtype values to enable future device extensions. SMC-D and ISM are now independent. struct ism_dev can be moved to drivers/s390/net/ism.h. Note that in smc, the term 'ism' is still used. Future patches could replace that with 'dibs' or 'smc-d' as appropriate. Signed-off-by: Julian Ruess <julianr@linux.ibm.com> Co-developed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-15-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Move data path to dibs layer | Alexandra Winter
Use struct dibs_dmb instead of struct smc_dmb and move the corresponding client tables to dibs_dev. Leave driver specific implementation details like sba in the device drivers. Register and unregister dmbs via dibs_dev_ops. A dmb is dedicated to a single client, but a dibs device can have dmbs for more than one client. Trigger dibs clients via dibs_client_ops->handle_irq(), when data is received into a dmb. For dibs_loopback replace scheduling an smcd receive tasklet with calling dibs_client_ops->handle_irq(). For loopback devices attach_dmb(), detach_dmb() and move_data() need to access the dmb tables, so move those to dibs_dev_ops in this patch as well. Remove remaining definitions of smc_loopback as they are no longer required, now that everything is in dibs_loopback. Note that struct ism_client and struct ism_dev are still required in smc until a follow-on patch moves event handling to dibs. (Loopback does not use events). Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-14-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Move query_remote_gid() to dibs_dev_ops | Alexandra Winter
Provide the dibs_dev_ops->query_remote_gid() in ism and dibs_loopback dibs_devices. And call it in smc dibs_client. Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-13-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Move vlan support to dibs_dev_ops | Alexandra Winter
It can be debated how much benefit definition of vlan ids for dibs devices brings, as the dmbs are accessible only by a single peer anyhow. But ism provides vlan support and smcd exploits it, so move it to dibs layer as an optional feature. smcd_loopback simply ignores all vlan settings, do the same in dibs_loopback. SMC-D and ISM have a method to use the invalid VLAN ID 1FFF (ISM_RESERVED_VLANID), to indicate that both communication peers support routable SMC-Dv2. Tolerate it in dibs, but move it to SMC only. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-12-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Local gid for dibs devices | Alexandra Winter
Define a uuid_t GID attribute to identify a dibs device. SMC uses 64 Bit and 128 Bit Global Identifiers (GIDs) per device, which need to be sent via the SMC protocol. Because the smc code uses integers, network endianness and host endianness need to be considered. Avoid this in the dibs layer by using uuid_t byte arrays. Future patches could change SMC to use uuid_t. For now conversion helper functions are introduced. ISM devices provide 64 Bit GIDs. Map them to dibs uuid_t GIDs like this:
 _________________________________________
| 64 Bit ISM-vPCI GID | 00000000_00000000 |
 -----------------------------------------
If interpreted as a UUID [1], this would be the UUID variant that is reserved for NCS backward compatibility. So it will not collide with UUIDs that were generated according to the standard. smc_loopback already uses version 4 UUIDs as 128 Bit GIDs, move that to dibs loopback. A temporary change to smc_lo_query_rgid() is required, which will be moved to dibs_loopback with a follow-on patch. Provide the gid of a dibs device as a read-only sysfs attribute. Link: https://datatracker.ietf.org/doc/html/rfc4122 [1] Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-11-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
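A minimal userspace sketch of the mapping in the diagram follows. The type and function names are hypothetical, and the byte order (most significant GID byte first) is an assumption made for illustration; the log only states that the 64 Bit GID occupies the first half and zeros fill the rest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16

/* Stand-in for the kernel's uuid_t byte array. */
typedef struct { uint8_t b[MODEL_UUID_SIZE]; } model_uuid_t;

/* Place the 64-bit GID in bytes 0..7 (assumed big-endian), zero bytes 8..15. */
static inline void model_ism_gid_to_uuid(uint64_t gid, model_uuid_t *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	for (int i = 0; i < 8; i++)
		out->b[i] = (uint8_t)(gid >> (56 - 8 * i));
}

/*
 * RFC 4122 stores the variant in the top bits of byte 8; a leading 0 bit
 * marks the variant reserved for NCS backward compatibility.
 */
static inline int model_uuid_is_ncs_variant(const model_uuid_t *u)
{
	assert(u != NULL);
	return (u->b[8] & 0x80) == 0;
}
```

Because byte 8 is always zero in this layout, every mapped GID falls into the NCS-reserved variant, which is why it cannot collide with standard version 4 UUIDs.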
2025-09-23 | dibs: Move struct device to dibs_dev | Julian Ruess
Move struct device from ism_dev and smc_lo_dev to dibs_dev, and define a corresponding release function. Free ism_dev in ism_remove() and smc_lo_dev in smc_lo_dev_remove(). Replace smcd->ops->get_dev(smcd) by using dibs->dev directly. An alternative design would be to embed dibs_dev as a field in ism_dev and do the same for other dibs device driver specific structs. However that would have the disadvantage that each dibs device driver needs to allocate dibs_dev and each dibs device driver needs a different device release function. The advantage would be that ism_dev and other device driver specific structs would be covered by device reference counts. Signed-off-by: Julian Ruess <julianr@linux.ibm.com> Co-developed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-9-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Define dibs_client_ops and dibs_dev_ops | Alexandra Winter
Move the device add() and remove() functions from ism_client to dibs_client_ops and call add_dev()/del_dev() for ism devices and dibs_loopback devices. dibs_client_ops->add_dev() = smcd_register_dev() for the smc_dibs_client. This is the first step to handle ism and loopback devices alike (as dibs devices) in the smc dibs client. Define dibs_dev->ops and move smcd_ops->get_chid to dibs_dev_ops->get_fabric_id() for ism and loopback devices. See below for why this needs to be in the same patch as dibs_client_ops->add_dev(). The following changes contain intermediate steps, that will be obsoleted by follow-on patches, once more functionality has been moved to dibs: Use different smcd_ops and max_dmbs for ism and loopback. Follow-on patches will change SMC-D to directly use dibs_ops instead of smcd_ops. In smcd_register_dev() it is now necessary to identify a dibs_loopback device before smcd_dev and smcd_ops->get_chid() are available. So provide dibs_dev_ops->get_fabric_id() in this patch and evaluate it in smc_ism_is_loopback(). Call smc_loopback_init() in smcd_register_dev() and call smc_loopback_exit() in smcd_unregister_dev() to handle the functionality that is still in smc_loopback. Follow-on patches will move all smc_loopback code to dibs_loopback. In smcd_[un]register_dev() use only ism device name, this will be replaced by dibs device name by a follow-on patch. End of changes with intermediate parts. Allocate an smcd event workqueue for all dibs devices, although dibs_loopback does not generate events. Use kernel memory instead of devres memory for smcd_dev and smcd->conn. Since commit a72178cfe855 ("net/smc: Fix dependency of SMC on ISM") an ism device and its driver can have a longer lifetime than the smc module, so smc should not rely on devres to free its resources [1]. It is now the responsibility of the smc client to free smcd and smcd->conn for all dibs devices, ism devices as well as loopback. 
Call client->ops->del_dev() for all existing dibs devices in dibs_unregister_client(), so all device related structures can be freed in the client. When dibs_unregister_client() is called in the context of smc_exit() or smc_core_reboot_event(), these functions have already called smc_lgrs_shutdown() which calls smc_smcd_terminate_all(smcd) and sets going_away. This is done a second time in smcd_unregister_dev(). This is analogous to how smcr is handled in these functions, by calling first smc_lgrs_shutdown() and then smc_ib_unregister_client() > smc_ib_remove_dev(), so leave it that way. It may be worth investigating, whether smc_lgrs_shutdown() is still required or useful. Remove CONFIG_SMC_LO. CONFIG_DIBS_LO now controls whether a dibs loopback device exists or not. Link: https://www.kernel.org/doc/Documentation/driver-model/devres.txt [1] Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-8-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Register ism as dibs device | Alexandra Winter
Register ism devices with the dibs layer. Follow-on patches will move functionality to the dibs layer. As DIBS is only a shim layer without any dependencies, we can depend ISM on DIBS without adding indirect dependencies. A follow-on patch will remove implication of SMC by ISM. Define struct dibs_dev. Follow-on patches will move more content into dibs_dev. The goal of follow-on patches is that ism_dev will only contain fields that are special for this device driver. The same concept will apply to other dibs device drivers. Define dibs_dev_alloc(), dibs_dev_add() and dibs_dev_del() to be called by dibs device drivers and call them from ism_drv.c Use ism_dev.dibs for a pointer to dibs_dev. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-6-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Register smc as dibs_client | Alexandra Winter
Formally register smc as dibs client. Functionality will be moved by follow-on patches from ism_client to dibs_client until eventually ism_client can be removed. As DIBS is only a shim layer without any dependencies, we can depend SMC on DIBS without adding indirect dependencies. A follow-on patch will remove dependency of SMC on ISM. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-5-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | dibs: Create drivers/dibs | Alexandra Winter
Create the file structure for a 'DIBS - Direct Internal Buffer Sharing' shim layer that will provide generic functionality and declarations for dibs device drivers and dibs clients. Following patches will add functionality. Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-4-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-23 | Merge branch 'srcu-next.2025.08.21a' into HEAD | Paul E. McKenney
SRCU updates:
* Create srcu_read_{,un}lock_fast_notrace()
* Add srcu_read_lock_fast_notrace() and srcu_read_unlock_fast_notrace()
* Add guards for notrace variants of SRCU-fast readers
* Document srcu_read_{,un}lock_fast() use of implicit RCU readers
* Document srcu_flip() memory-barrier D relation to SRCU-fast
* Remove preempt_disable/enable() in Tiny SRCU srcu_gp_start_if_needed()
2025-09-23 | genirq/msi: Remove msi_post_free() | Nam Cao
The only user of msi_post_free() - powerpc/pseries - has been changed to use msi_teardown(). Remove this unused callback. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250916061007.964005-1-namcao@linutronix.de
2025-09-23 | spi: rpc-if: Add resume support for RZ/G3E | Mark Brown
Merge series from Biju Das <biju.das.jz@bp.renesas.com>: On RZ/G3E using PSCI, s2ram powers down the SoC. After resume, reinitialize the hardware for SPI operations. Also replace the SIMPLE_DEV_PM_OPS macro with DEFINE_SIMPLE_DEV_PM_OPS and use pm_sleep_ptr(). This lets us drop the check for CONFIG_PM_SLEEP and the __maybe_unused attribute from the PM functions.
2025-09-23 | Support reading Subsystem ID from Device Tree | Mark Brown
Merge series from Stefan Binding <sbinding@opensource.cirrus.com>: In PC systems using ACPI, the driver is able to read back an SSID from the _SUB property. This SSID uniquely identifies the system, which enables the driver to read the correct firmware and tuning for that system from linux-firmware. Currently there is no way of reading this property from device tree. Add an equivalent property in device tree to perform the same role.
2025-09-23 | can: dev: turn can_set_static_ctrlmode() into a non-inline function | Vincent Mailhol
can_set_static_ctrlmode() is declared as a static inline. But it is only called in the probe function of the devices and so does not really benefit from any kind of optimization. Transform it into a "normal" function by moving it to drivers/net/can/dev/dev.c Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/20250923-can-fix-mtu-v3-2-581bde113f52@kernel.org Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-09-23 | io_uring/cmd: drop unused res2 param from io_uring_cmd_done() | Caleb Sander Mateos
Commit 79525b51acc1 ("io_uring: fix nvme's 32b cqes on mixed cq") split out a separate io_uring_cmd_done32() helper for ->uring_cmd() implementations that return 32-byte CQEs. The res2 value passed to io_uring_cmd_done() is now unused because __io_uring_cmd_done() ignores it when is_cqe32 is passed as false. So drop the parameter from io_uring_cmd_done() to simplify the callers and clarify that it's not possible to return an extra value beyond the 32-bit CQE result. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-09-22 | mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare() | Lorenzo Stoakes
In commit bb666b7c2707 ("mm: add mmap_prepare() compatibility layer for nested file systems") we introduced the ability for stacked drivers and file systems to correctly invoke the f_op->mmap_prepare() handler from an f_op->mmap() handler via a compatibility layer implemented in compat_vma_mmap_prepare(). This populates vm_area_desc fields according to those found in the (not yet fully initialised) VMA passed to f_op->mmap(). However this function implicitly assumes that the struct file which we are operating upon is equal to vma->vm_file. This is not a safe assumption in all cases. The only really sane situation in which this matters would be something like e.g. i915_gem_dmabuf_mmap() which invokes vfs_mmap() against obj->base.filp:
	ret = vfs_mmap(obj->base.filp, vma);
	if (ret)
		return ret;
And then sets the VMA's file to this, should the mmap operation succeed:
	vma_set_file(vma, obj->base.filp);
That is - it is the file that is intended to back the VMA mapping. This is not an issue currently, as so far we have only implemented f_op->mmap_prepare() handlers for some file systems and internal mm uses, and the only stacked f_op->mmap() operations that can be performed upon these are those in backing_file_mmap() and coda_file_mmap(), both of which use vma->vm_file. However, moving forward, as we convert drivers to using f_op->mmap_prepare(), this will become a problem. Resolve this issue by explicitly setting desc->file to the provided file parameter and update callers accordingly. Callers are expected to read desc->file and update desc->vm_file - the former will be the file provided by the caller (if stacked, this may differ from vma->vm_file). If the caller needs to differentiate between the two they therefore now can. While we are here, also provide a variant of compat_vma_mmap_prepare() that operates against a pointer to any file_operations struct and does not assume that the file_operations struct we are interested in is file->f_op. This function is __compat_vma_mmap_prepare() and we invoke it from compat_vma_mmap_prepare() so that we share code between the two functions. This is important, because some drivers provide hooks in a separate struct, for instance struct drm_device provides an fops field for this purpose. Also update the VMA selftests accordingly. Link: https://lkml.kernel.org/r/dd0c72df8a33e8ffaa243eeb9b01010b670610e9.1756920635.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-22 | mm: specify separate file and vm_file params in vm_area_desc | Lorenzo Stoakes
Patch series "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()", v2. As part of the efforts to eliminate the problematic f_op->mmap callback, a new callback - f_op->mmap_prepare was provided. While we are converting these callbacks, we must deal with 'stacked' filesystems and drivers - those which in their own f_op->mmap callback invoke an inner f_op->mmap callback. To accommodate this, a compatibility layer is provided that, via vfs_mmap(), detects if f_op->mmap_prepare is provided and if so, generates a vm_area_desc containing the VMA's metadata and invokes the call. So far, we have provided desc->file equal to vma->vm_file. However this is not necessarily valid, especially in the case of stacked drivers which wish to assign a new file after the inner hook is invoked. To account for this, we adjust vm_area_desc to have both file and vm_file fields. The .vm_file field is strictly set to vma->vm_file (or in the case of a new mapping, what will become vma->vm_file). However, .file is set to whichever file vfs_mmap() is invoked with when using the compatibility layer. Therefore, if the VMA's file needs to be updated in .mmap_prepare, desc->vm_file should be assigned, whilst desc->file should be read. No current f_op->mmap_prepare users assign desc->file so this is safe to do. This makes the .mmap_prepare callback in the context of a stacked filesystem or driver completely consistent with the existing .mmap implementations. While we're here, we do a few small cleanups, and ensure that we const-ify things correctly in the vm_area_desc struct to avoid hooks accidentally trying to assign fields they should not. This patch (of 2): Stacked filesystems and drivers may invoke mmap hooks with a struct file pointer that differs from the overlying file. We will make this functionality possible in a subsequent patch. In order to prepare for this, let's update vm_area_desc to separately provide desc->file and desc->vm_file parameters.
The desc->file parameter is the file that the hook is expected to operate upon, and is not assignable (though the hook may wish to e.g. update the file's accessed time). The desc->vm_file defaults to what will become vma->vm_file and is what the hook must reassign should it wish to change the VMA's vma->vm_file. For now we keep desc->file, vm_file the same to remain consistent. No f_op->mmap_prepare() callback sets a new vma->vm_file currently, so this is safe to change. While we're here, make the mm_struct that desc->mm points at immutable, as well as the desc->mm field itself. As part of this change, also update the single hook which this would otherwise break - mlock_future_ok(), invoked by secretmem_mmap_prepare(). We additionally update set_vma_from_desc() to compare fields in a more logical fashion, checking the (possibly) user-modified fields as the first operand against the existing value as the second one. Additionally, update VMA tests to accommodate changes. Link: https://lkml.kernel.org/r/cover.1756920635.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3fa15a861bb7419f033d22970598aa61850ea267.1756920635.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pedro Falcato <pfalcato@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
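The read-one, assign-the-other contract can be illustrated with a reduced userspace model. The struct below is a hypothetical two-field stand-in for vm_area_desc (the real struct carries more state), and model_mmap_prepare() is an invented hook used only to show which field a stacked caller may write.

```c
#include <assert.h>
#include <stddef.h>

struct file { int id; };   /* reduced stand-in for struct file */

/* Hypothetical two-field model of vm_area_desc. */
struct model_vm_area_desc {
	struct file *const file; /* file the hook operates on; pointer is read-only */
	struct file *vm_file;    /* what will become vma->vm_file; assignable */
};

/* A stacked hook that wants the mapping backed by a different file. */
static inline int model_mmap_prepare(struct model_vm_area_desc *desc,
				     struct file *backing)
{
	assert(desc != NULL && desc->file != NULL);
	/* "desc->file = backing;" would be rejected by the compiler (const). */
	desc->vm_file = backing;
	return 0;
}
```

Making the file pointer const turns an accidental assignment into a compile-time error rather than a runtime surprise, which matches the const-ification goal stated in the cover letter.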
2025-09-22 | sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock | Oleg Nesterov
The ancient comment above task_lock() states that it can be nested outside of read_lock(&tasklist_lock), but this is no longer true:
	CPU_0                  CPU_1                  CPU_2
	task_lock()            read_lock(tasklist)
	                                              write_lock_irq(tasklist)
	read_lock(tasklist)    task_lock()
Unless CPU_0 calls read_lock() in IRQ context, queued_read_lock_slowpath() won't get the lock immediately, it will spin waiting for the pending writer on CPU_2, resulting in a deadlock. Link: https://lkml.kernel.org/r/20250914110908.GA18769@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-22 | bpf: Implement signature verification for BPF programs | KP Singh
This patch extends the BPF_PROG_LOAD command by adding three new fields to `union bpf_attr` in the user-space API:
- signature: A pointer to the signature blob.
- signature_size: The size of the signature blob.
- keyring_id: The serial number of a loaded kernel keyring (e.g., the user or session keyring) containing the trusted public keys.
When a BPF program is loaded with a signature, the kernel:
1. Retrieves the trusted keyring using the provided `keyring_id`.
2. Verifies the supplied signature against the BPF program's instruction buffer.
3. If the signature is valid and was generated by a key in the trusted keyring, the program load proceeds.
4. If no signature is provided, the load proceeds as before, allowing for backward compatibility. LSMs can choose to restrict unsigned programs and implement a security policy.
5. If signature verification fails for any reason, the program is not loaded.
Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: KP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/20250921160120.9711-2-kpsingh@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
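The decision flow in the numbered steps can be sketched as a small userspace model. Everything here is hypothetical: the struct only mirrors the three field names from the commit text (the real uapi layout is not shown in this log), and the verifier is a caller-supplied stub rather than the kernel's keyring and signature code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_load_result { MODEL_LOAD_OK = 0, MODEL_LOAD_REJECTED = -1 };

/* Hypothetical mirror of the three new fields named in the commit. */
struct model_sig_attr {
	const uint8_t *signature;
	uint32_t signature_size;
	int32_t keyring_id;
};

/* Returns 0 when the signature verifies against the keyring (steps 1-3). */
typedef int (*model_verify_fn)(const uint8_t *insns, size_t insn_len,
			       const struct model_sig_attr *attr);

/* Stub verifiers, for illustration only. */
static inline int model_verify_accept(const uint8_t *insns, size_t insn_len,
				      const struct model_sig_attr *attr)
{
	(void)insns; (void)insn_len; (void)attr;
	return 0;
}

static inline int model_verify_reject(const uint8_t *insns, size_t insn_len,
				      const struct model_sig_attr *attr)
{
	(void)insns; (void)insn_len; (void)attr;
	return -1;
}

static inline enum model_load_result
model_prog_load(const uint8_t *insns, size_t insn_len,
		const struct model_sig_attr *attr, model_verify_fn verify)
{
	assert(attr != NULL && verify != NULL);
	/* Step 4: no signature supplied, load as before. */
	if (attr->signature == NULL)
		return MODEL_LOAD_OK;
	/* Step 5: any verification failure blocks the load. */
	if (verify(insns, insn_len, attr) != 0)
		return MODEL_LOAD_REJECTED;
	/* Step 3: valid signature from a trusted key. */
	return MODEL_LOAD_OK;
}
```

Note that the model never consults the verifier for unsigned programs; per the commit text, restricting those is left to LSM policy.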
2025-09-22 | tcp: move mtu_info to remove two 32bit holes | Eric Dumazet
This removes 8 bytes of waste on 64-bit builds. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250919204856.2977245-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
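How reordering members closes alignment holes can be shown with two toy structs. These are not the tcp_sock layout, only a hypothetical illustration; the exact sizes depend on the ABI, and on a typical LP64 target the difference would be expected to match the two 4-byte holes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each 32-bit member is followed by a 64-bit one, leaving a hole after it
 * on ABIs that align uint64_t to 8 bytes. */
struct model_with_holes {
	uint32_t a;
	uint64_t p;
	uint32_t b;
	uint64_t q;
};

/* Same members; the two 32-bit fields now share one 8-byte slot. */
struct model_reordered {
	uint32_t a;
	uint32_t b;
	uint64_t p;
	uint64_t q;
};

/* Bytes recovered by the reordering on the current ABI (may be 0). */
static inline size_t model_bytes_saved(void)
{
	assert(sizeof(struct model_reordered) <= sizeof(struct model_with_holes));
	return sizeof(struct model_with_holes) - sizeof(struct model_reordered);
}
```

In the kernel, tools such as pahole are commonly used to spot such holes; here sizeof and offsetof are enough to see the effect.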
2025-09-22 | tcp: move tcp_clean_acked to tcp_sock_read_tx group | Eric Dumazet
tp->tcp_clean_acked is fetched in tx path when snd_una is updated. This field thus belongs to tcp_sock_read_tx group. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250919204856.2977245-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | tcp: move recvmsg_inq to tcp_sock_read_txrx | Eric Dumazet
Fill a hole in tcp_sock_read_txrx, instead of possibly wasting a cache line. Note that tcp_recvmsg_locked() is also reading tp->repair, so this removes one cache line miss in tcp recvmsg(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250919204856.2977245-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | tcp: move tcp->rcv_tstamp to tcp_sock_write_txrx group | Eric Dumazet
tcp_ack() writes this field, it belongs to tcp_sock_write_txrx. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250919204856.2977245-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | Merge tag 'mlx5-next-counters' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Jakub Kicinski
Tariq Toukan says:
====================
mlx5-next updates 2025-09-21
* tag 'mlx5-next-counters' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Add uar access and odp page fault counters
====================
Link: https://patch.msgid.link/1758443940-708689-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | net: sfp: remove old sfp_parse_* functions | Russell King (Oracle)
Remove the old sfp_parse_*() functions that are now no longer used. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uydVz-000000061Wj-13Yd@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | net: sfp: provide sfp_get_module_caps() | Russell King (Oracle)
Provide a function to retrieve the current sfp_module_caps structure so that upstreams can get the entire module support in one go. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uydVj-000000061WQ-3q47@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | net: sfp: pre-parse the module support | Russell King (Oracle)
Pre-parse the module support on insert rather than when the upstream requests the data. This will allow more flexible and extensible parsing. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uydVZ-000000061WE-2pXD@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | net: phy: add phy_interface_copy() | Russell King (Oracle)
Add a helper for copying PHY interface bitmasks. This will be used by the SFP bus code, which will then be moved to phylink in the subsequent patches. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1uydVU-000000061W8-2IDT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-22 | workqueue: fix texinfodocs warning for WQ_* flags reference | Kriish Sharma
Sphinx emitted a warning during make texinfodocs: WARNING: Inline literal start-string without end-string. This was caused by the trailing '*' in "%WQ_*" being parsed as reStructuredText markup in the kernel-doc comment. Escape the '*' in the comment so that Sphinx treats it as a literal character, resolving the warning. Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-22 | ns: simplify ns_common_init() further | Christian Brauner
Simply derive the ns operations from the namespace type. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-22 | cgroup: add missing ns_common include | Christian Brauner
Add the missing include of the ns_common header. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-22 | spi: rename SPI_CS_CNT_MAX => SPI_DEVICE_CS_CNT_MAX | Jonas Gorski
Rename SPI_CS_CNT_MAX to SPI_DEVICE_CS_CNT_MAX to make it more obvious that this is the max number of CS per device supported, not per controller. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Link: https://patch.msgid.link/20250915183725.219473-8-jonas.gorski@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-22 | spi: reduce device chip select limit again | Jonas Gorski
The spi chipselect limit SPI_CS_CNT_MAX was raised with commit 2f8c7c3715f2 ("spi: Raise limit on number of chip selects") from 4 to 16 to accommodate spi controllers with more than 4 chip selects, and then later to 24 with commit 96893cdd4760 ("spi: Raise limit on number of chip selects to 24"). Now that we removed SPI_CS_CNT_MAX limiting the chip selects of controllers, we can reduce the amount of chip selects per device again to 4, the original value. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Link: https://patch.msgid.link/20250915183725.219473-7-jonas.gorski@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-22 | spi: keep track of number of chipselects in spi_device | Jonas Gorski
There are several places where we need to iterate over a device's chipselect. To be able to do it efficiently, store the number of chipselects in spi_device, like we do for controllers. Since we now use a device supplied value, add a check to make sure it isn't more than we can support. Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Link: https://patch.msgid.link/20250915183725.219473-3-jonas.gorski@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-09-21 | SUNRPC: Move the svc_rpcb_cleanup() call sites | Chuck Lever
Clean up: because svc_rpcb_cleanup() and svc_xprt_destroy_all() are always invoked in pairs, we can deduplicate code by moving the svc_rpcb_cleanup() call sites into svc_xprt_destroy_all(). Tested-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 | nfsd: unregister with rpcbind when deleting a transport | Olga Kornievskaia
When a listener is added, a part of creation of transport also registers program/port with rpcbind. However, when the listener is removed, while transport goes away, rpcbind still has the entry for that port/type. When deleting the transport, unregister with rpcbind when appropriate. ---v2 created a new xpt_flag XPT_RPCB_UNREG to mark TCP and UDP transport and at xprt destroy send rpcbind unregister if flag set. Suggested-by: Chuck Lever <chuck.lever@oracle.com> Fixes: d093c9089260 ("nfsd: fix management of listener transports") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 | vfs: add ATTR_CTIME_SET flag | Jeff Layton
When ATTR_ATIME_SET and ATTR_MTIME_SET are set in the ia_valid mask, the notify_change() logic takes that to mean that the request should set those values explicitly, and not override them with "now". With the advent of delegated timestamps, similar functionality is needed for the ctime. Add an ATTR_CTIME_SET flag, and use that to indicate that the ctime should be accepted as-is. Also, clean up the if statements to eliminate the extra negatives. In setattr_copy() and setattr_copy_mgtime() use inode_set_ctime_deleg() when ATTR_CTIME_SET is set, instead of basing the decision on ATTR_DELEG. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-21 | sunrpc: Change ret code of xdr_stream_decode_opaque_fixed | Sergey Bashirov
Since the opaque is fixed in size, the caller already knows how many bytes were decoded, on success. Thus, xdr_stream_decode_opaque_fixed() doesn't need to return that value. And, xdr_stream_decode_u32 and _u64 both return zero on success. This patch simplifies the caller's error checking to avoid potential integer promotion issues. Suggested-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
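The integer promotion hazard mentioned above can be reproduced in userspace. The decoders here are hypothetical stand-ins, not the sunrpc functions, and the example assumes ssize_t and size_t have the same width, as on common Linux targets.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define MODEL_FIXED_LEN 8

/* Old style: number of bytes decoded on success, negative errno on error. */
static inline ssize_t model_decode_returns_len(int fail)
{
	return fail ? -(ssize_t)EMSGSIZE : (ssize_t)MODEL_FIXED_LEN;
}

/* New style: zero on success, negative errno on error. */
static inline int model_decode_returns_zero(int fail)
{
	return fail ? -EMSGSIZE : 0;
}

/*
 * Fragile caller: comparing the result with a size_t converts it to
 * unsigned, so a negative error becomes a huge value and is missed.
 * The cast spells out the conversion the compiler would apply implicitly.
 */
static inline int model_fragile_caller_sees_error(int fail)
{
	ssize_t ret = model_decode_returns_len(fail);
	return (size_t)ret < (size_t)MODEL_FIXED_LEN;
}

/* Simple caller: a plain sign test cannot be defeated by promotion. */
static inline int model_simple_caller_sees_error(int fail)
{
	assert(fail == 0 || fail == 1);
	return model_decode_returns_zero(fail) < 0;
}
```

Returning zero on success removes the temptation to compare the result against a length at all, which is the simplification the commit describes.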
2025-09-21nfsd: discard nfsd_file_get_local()NeilBrown
This interface was deprecated by commit e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file") and is now unused. So let's remove it. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-09-22Merge tag 'drm-xe-next-2025-09-19' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Drop L3 bank mask reporting from the media GT on Xe3 and later. Only do that for the primary GT. No userspace needs or uses it for media and some platforms may report bogus values. - Add SLPC power_profile sysfs interface with support for base and power_saving modes (Vinay Belgaumkar, Rodrigo Vivi) - Add configfs attributes to add post/mid context-switch commands (Lucas De Marchi) Cross-subsystem Changes: - Fix hmm_pfn_to_map_order() usage in gpusvm and refactor APIs to align with pieces previous handled by xe_hmm (Matthew Auld) Core Changes: - Add MEI driver for Late Binding Firmware Update/Upload (Alexander Usyskin) Driver Changes: - Fix GuC CT teardown wrt TLB invalidation (Satyanarayana) - Fix CCS save/restore on VF (Satyanarayana) - Increase default GuC crash buffer size (Zhanjun) - Allow to clear GT stats in debugfs to aid debugging (Matthew Brost) - Add more SVM GT stats to debugfs (Matthew Brost) - Fix error handling in VMA attr query (Himal) - Move sa_info in debugfs to be per tile (Michal Wajdeczko) - Limit number of retries upon receiving NO_RESPONSE_RETRY from GuC to avoid endless loop (Michal Wajdeczko) - Fix configfs handling for survivability_mode undoing user choice when unbinding the module (Michal Wajdeczko) - Refactor configfs attribute visibility to future-proof it and stop exposing survivability_mode if not applicable (Michal Wajdeczko) - Constify some functions (Harish Chegondi, Michal Wajdeczko) - Add/extend more HW workarounds for Xe2 and Xe3 (Harish Chegondi, Tangudu Tilak Tirumalesh) - Replace xe_hmm with gpusvm (Matthew Auld) - Improve fake pci and WA kunit handling for testing new platforms (Michal Wajdeczko) - Reduce unnecessary PTE writes when migrating (Sanjay Yadav) - Cleanup GuC interface definitions and log message (John Harrison) - Small improvements around VF CCS (Michal Wajdeczko) - Enable bus mastering for the I2C controller (Raag Jadav) - Prefer 
devm_mutex over hand rolling it (Christophe JAILLET) - Drop sysfs and debugfs attributes not available for VF (Michal Wajdeczko) - GuC CT devm actions improvements (Michal Wajdeczko) - Recommend new GuC versions for PTL and BMG (Julia Filipchuk) - Improve driver handling for exhaustive eviction using new xe_validation wrapper around drm_exec (Thomas Hellström) - Add and use printk wrappers for tile and device (Michal Wajdeczko) - Better document workaround handling in Xe (Lucas De Marchi) - Improvements on ARRAY_SIZE and ERR_CAST usage (Lucas De Marchi, Fushuai Wang) - Align CSS firmware headers with the GuC APIs (John Harrison) - Test GuC to GuC (G2G) communication to aid debug in pre-production firmware (John Harrison) - Bail out driver probing if GuC fails to load (John Harrison) - Allow error injection in xe_pxp_exec_queue_add() (Daniele Ceraolo Spurio) - Minor refactors in xe_svm (Shuicheng Lin) - Fix madvise ioctl error handling (Shuicheng Lin) - Use attribute groups to simplify sysfs registration (Michal Wajdeczko) - Add Late Binding Firmware implementation in Xe to work together with the MEI component (Badal Nilawar, Daniele Ceraolo Spurio, Rodrigo Vivi) - Fix build with CONFIG_MODULES=n (Lucas De Marchi) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/c2et6dnkst2apsgt46dklej4nprqdukjosb55grpaknf3pvcxy@t7gtn3hqtp6n
2025-09-21virtio_config: clarify output parametersAlyssa Ross
This was ambiguous enough for a broken patch (206cc44588f7 ("virtio: reject shm region if length is zero")) to make it into the kernel, so make it clearer. Link: https://lore.kernel.org/r/20250816071600-mutt-send-email-mst@kernel.org/ Signed-off-by: Alyssa Ross <hi@alyssa.is> Message-Id: <20250829150944.233505-1-hi@alyssa.is> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-09-21mm/damon/core: implement damon_initialized() functionSeongJae Park
Patch series "mm/damon: define and use DAMON initialization check function". DAMON is initialized at subsystem initialization time, by damon_init(). If DAMON API functions are called before the initialization, the system could crash. Such issues actually happened and were fixed [1] in the past. For the fix, DAMON API callers have been updated to check whether DAMON is initialized, using their own hacks. The hacks are unnecessarily duplicated in every DAMON API caller and would therefore be difficult to maintain reliably in the long term. Make it reliable and easy to maintain. For this, implement a new DAMON core layer API function that returns whether DAMON is successfully initialized. If it returns true, DAMON API functions are safe to use. After the introduction of the new API, update DAMON API callers to use the new function instead of their own hacks. This patch (of 7): If a caller tries to use DAMON before it has been successfully initialized, the caller could crash. The DAMON core layer does not provide a reliable way to see whether it is successfully initialized and therefore ready to be used, though. As a result, DAMON API callers are implementing their own hacks to check it. The hacks simply assume DAMON should be ready at module init time. That is not reliable, as DAMON initialization can indeed fail if KMEM_CACHE() fails, and it is difficult to maintain because the hacks are duplicates. Implement a core layer API function for better reliability and maintainability, to replace the hacks with followup commits. Link: https://lkml.kernel.org/r/20250916033511.116366-2-sj@kernel.org Link: https://lore.kernel.org/20250909022238.2989-1-sj@kernel.org [1] Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
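The pattern this series describes, one shared "is the subsystem ready" predicate instead of per-caller guesses, can be sketched in userspace C as follows. All demo_* names are hypothetical, and checking a cache pointer is an assumption made for the sketch rather than a statement about how damon_initialized() is implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state that subsystem init sets up. */
static void *demo_region_cache;

/* Stand-in for the init routine; it fails when no cache can be created. */
static int demo_subsys_init(void *cache)
{
	demo_region_cache = cache;
	return cache ? 0 : -1;
}

/* Single shared check, in the spirit of damon_initialized(). */
static bool demo_subsys_initialized(void)
{
	return demo_region_cache != NULL;
}

/* A caller gates its own setup on the shared check instead of guessing. */
static int demo_module_init(void)
{
	if (!demo_subsys_initialized())
		return -1;  /* refuse to use the API before it is ready */
	return 0;
}
```

Because every caller asks the same function, a failed initialization is handled in one place rather than in each module's own workaround.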
2025-09-21alloc_tag: mark inaccurate allocation counters in /proc/allocinfo outputSuren Baghdasaryan
While rare, memory allocation profiling can contain inaccurate counters if slab object extension vector allocation fails. That allocation might succeed later but prior to that, slab allocations that would have used that object extension vector will not be accounted for. To indicate incorrect counters, "accurate:no" marker is appended to the call site line in the /proc/allocinfo output. Bump up /proc/allocinfo version to reflect the change in the file format and update documentation. Example output with invalid counters:
allocinfo - version: 2.0
0 0 arch/x86/kernel/kdebugfs.c:105 func:create_setup_data_nodes
0 0 arch/x86/kernel/alternative.c:2090 func:alternatives_smp_module_add
0 0 arch/x86/kernel/alternative.c:127 func:__its_alloc accurate:no
0 0 arch/x86/kernel/fpu/regset.c:160 func:xstateregs_set
0 0 arch/x86/kernel/fpu/xstate.c:1590 func:fpstate_realloc
0 0 arch/x86/kernel/cpu/aperfmperf.c:379 func:arch_enable_hybrid_capacity_scale
0 0 arch/x86/kernel/cpu/amd_cache_disable.c:258 func:init_amd_l3_attrs
49152 48 arch/x86/kernel/cpu/mce/core.c:2709 func:mce_device_create accurate:no
32768 1 arch/x86/kernel/cpu/mce/genpool.c:132 func:mce_gen_pool_create
0 0 arch/x86/kernel/cpu/mce/amd.c:1341 func:mce_threshold_create_device
[surenb@google.com: document new "accurate:no" marker] Fixes: 39d117e04d15 ("alloc_tag: mark inaccurate allocation counters in /proc/allocinfo output") [akpm@linux-foundation.org: simplification per Usama, reflow text] [akpm@linux-foundation.org: add newline to prevent docs warning, per Randy] Link: https://lkml.kernel.org/r/20250915230224.4115531-1-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Cc: David Wang <00107082@163.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Pasha 
Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sourav Panda <souravpanda@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
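A consumer of /proc/allocinfo might pick up the new marker roughly as follows. This is a sketch that assumes call-site lines look exactly like the example rows in the message above (size, call count, file:line, func:name, optional "accurate:no"); struct demo_allocinfo_row and demo_parse_allocinfo_line() are hypothetical names, and lines with additional fields are not handled.

```c
#include <stdio.h>
#include <string.h>

struct demo_allocinfo_row {
	unsigned long long bytes;
	unsigned long long calls;
	char site[256];  /* file:line of the call site */
	char func[128];  /* function name after "func:" */
	int accurate;    /* 0 when the "accurate:no" marker is present */
};

/* Parse one call-site line; returns 0 on success, -1 on any other line. */
static int demo_parse_allocinfo_line(const char *line,
				     struct demo_allocinfo_row *row)
{
	if (sscanf(line, "%llu %llu %255s func:%127s",
		   &row->bytes, &row->calls, row->site, row->func) != 4)
		return -1;
	/* Counters are treated as accurate unless the marker is appended. */
	row->accurate = strstr(line, " accurate:no") == NULL;
	return 0;
}
```

Header lines such as "allocinfo - version: 2.0" do not start with two numbers, so the parser rejects them and a caller can skip them.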
2025-09-21mm/oom_kill: thaw the entire OOM victim processzhongjinji
Patch series "Improvements to Victim Process Thawing and OOM Reaper Traversal Order", v10. This patch series focuses on optimizing victim process thawing and refining the traversal order of the OOM reaper. Since __thaw_task() is used to thaw a single thread of the victim, thawing only one thread cannot guarantee the exit of the OOM victim when it is frozen. Patch 1 thaws the entire process of the OOM victim to ensure that OOM victims are able to terminate themselves. Even if the oom_reaper is delayed, patch 2 is still beneficial for reaping processes with a large address space footprint, and it also greatly improves process_mrelease. This patch (of 10): The OOM killer is a mechanism that selects and kills processes when the system runs out of memory, to reclaim resources and keep the system stable. But the OOM victim cannot terminate on its own when it is frozen, even if the OOM victim task is thawed through __thaw_task(). This is because __thaw_task() can only thaw a single OOM victim thread, and cannot thaw the entire OOM victim process. In addition, freezing_slow_path() determines whether a task is an OOM victim by checking the task's TIF_MEMDIE flag. When a task is identified as an OOM victim, the freezer bypasses both PM freezing and cgroup freezing states to thaw it. Historically, TIF_MEMDIE was a "this is the oom victim & it has access to memory reserves" flag. It has thread vs. process problems, and tsk_is_oom_victim() was introduced later to get rid of them and other issues, as well as to guarantee that the OOM victim's mm can be identified reliably for the oom_reaper. Therefore, thaw_process() is introduced to unfreeze all threads within the OOM victim process, ensuring that every thread is properly thawed. The freezer now uses tsk_is_oom_victim() to determine OOM victim status, allowing all victim threads to be unfrozen as necessary. 
With this change, the entire OOM victim process will be thawed when an OOM event occurs, ensuring that the victim can terminate on its own. Link: https://lkml.kernel.org/r/20250915162946.5515-1-zhongjinji@honor.com Link: https://lkml.kernel.org/r/20250915162946.5515-2-zhongjinji@honor.com Signed-off-by: zhongjinji <zhongjinji@honor.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Len Brown <lenb@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
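The difference between thawing one thread and thawing the whole process can be illustrated with a toy model. The structures and demo_* functions below are hypothetical and only mirror the behaviour this message attributes to __thaw_task() and thaw_process(); they are not the freezer's data structures.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_thread {
	bool frozen;
};

struct demo_process {
	struct demo_thread *threads;
	size_t nr_threads;
};

/* Thaw one thread only, as the message says of __thaw_task(). */
static void demo_thaw_thread(struct demo_thread *t)
{
	t->frozen = false;
}

/* Thaw every thread of the process, as the message says of thaw_process(). */
static void demo_thaw_process(struct demo_process *p)
{
	for (size_t i = 0; i < p->nr_threads; i++)
		demo_thaw_thread(&p->threads[i]);
}

/* In this model the process can exit only once no thread is frozen. */
static bool demo_can_exit(const struct demo_process *p)
{
	for (size_t i = 0; i < p->nr_threads; i++)
		if (p->threads[i].frozen)
			return false;
	return true;
}
```

Thawing a single thread of a three-thread victim leaves demo_can_exit() false; only the per-process loop makes it true.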
2025-09-21include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to ↵Andrew Morton
static inlines For all the usual reasons, plus a new one. Calling (void)arch_enter_lazy_mmu_mode(); deservedly blows up. Cc: Balbir Singh <balbirs@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
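The failure mode mentioned here can be reproduced in miniature. The sketch assumes the old stub was an empty do { } while (0) macro, which is a common kernel idiom but is not spelled out in the message; demo_enter_macro() and demo_enter_inline() are hypothetical names.

```c
/*
 * Macro-style stub (assumed form). It only works as a bare statement:
 * "(void)demo_enter_macro();" would expand to "(void)do { } while (0);",
 * which does not compile.
 */
#define demo_enter_macro() do { } while (0)

/* Static inline stub: type-checked, and valid wherever a call is valid. */
static inline void demo_enter_inline(void)
{
}

static int demo_use_stubs(void)
{
	demo_enter_macro();         /* fine: used as a statement */
	(void)demo_enter_inline();  /* fine: an ordinary function call */
	return 0;
}
```

Among the usual reasons for preferring static inlines are argument type checking and the fact that arguments are evaluated exactly once.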
2025-09-21mm: remove page->orderMatthew Wilcox (Oracle)
We already use page->private for storing the order of a page while it's in the buddy allocator system; extend that to also storing the order while it's in the pcp_llist. Link: https://lkml.kernel.org/r/20250910142923.2465470-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-21mm: constify compound_order() and page_size()Matthew Wilcox (Oracle)
Patch series "Small cleanups". These small cleanups can be applied now to reduce conflicts during the next merge window. They're all from various efforts to split struct page from other memdescs. Thanks to Vlastimil for the suggestion. This patch (of 3): These functions do not modify their arguments. Telling the compiler this may improve code generation, and allows us to pass const arguments from other functions. Link: https://lkml.kernel.org/r/20250910142923.2465470-1-willy@infradead.org Link: https://lkml.kernel.org/r/20250910142923.2465470-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
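What const-qualifying such accessors buys callers can be shown with a small stand-in. struct demo_page, the 4096-byte base page size and the demo_* helpers are assumptions for the sketch, not the kernel's compound_order() or page_size().

```c
#include <stddef.h>

struct demo_page {
	unsigned int order;
};

/* Takes a const pointer: the function promises not to modify the page. */
static inline unsigned int demo_order(const struct demo_page *page)
{
	return page->order;
}

/* Size in bytes, assuming a 4096-byte base page for this sketch. */
static inline size_t demo_size(const struct demo_page *page)
{
	return (size_t)4096 << demo_order(page);
}

/* A caller holding only a const pointer can use both helpers without casts. */
static size_t demo_report(const struct demo_page *page)
{
	return demo_size(page);
}
```

Without the const qualifiers on the helpers, demo_report() would need a cast that discards const, or would have to give up its own const parameter.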
2025-09-21mm: make folio page count functions return unsignedAristeu Rozanski
As raised by Andrew [1], a folio/compound page never spans a negative number of pages. Consequently, let's use "unsigned long" instead of "long" consistently for folio_nr_pages(), folio_large_nr_pages() and compound_nr(). Using "unsigned long" as return value is fine, because even "(long)-folio_nr_pages()" will keep on working as expected. Using "unsigned int" instead would actually break these use cases. This patch takes the first step changing these to return unsigned long (and making drm_gem_get_pages() use the new types instead of replacing min()). In the future, we might want to make more callers of these functions to consistently use "unsigned long". Link: https://lore.kernel.org/linux-mm/20250503182858.5a02729fcffd6d4723afcfc2@linux-foundation.org/ Link: https://lkml.kernel.org/r/20250826153721.GA23292@cathedrallabs.org Link: https://lore.kernel.org/linux-mm/20250503182858.5a02729fcffd6d4723afcfc2@linux-foundation.org/ [1] Signed-off-by: Aristeu Rozanski <aris@ruivo.org> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
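The remark about "(long)-folio_nr_pages()" can be checked with two small helpers. The demo_* functions are hypothetical; the sketch assumes a platform where long is wider than unsigned int (as on 64-bit Linux) and relies on the usual two's-complement behaviour when converting an out-of-range unsigned value to long.

```c
/* Negate an unsigned long count, then view the result as signed. */
static long demo_neg_from_ulong(unsigned long nr)
{
	return (long)-nr;
}

/*
 * Same with unsigned int: the negation wraps at the width of unsigned int
 * first, and only then is the (large positive) result widened to long.
 */
static long demo_neg_from_uint(unsigned int nr)
{
	return (long)-nr;
}
```

Under those assumptions, negating a count of 4 through the unsigned long helper gives -4, while the unsigned int variant yields a large positive number instead.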