summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-04-18Merge branch 'pm-docs'Rafael J. Wysocki
Merge a runtime PM documentation correction for 6.15-rc3. * pm-docs: Documentation: PM: runtime: Fix a reference to pm_runtime_autosuspend()
2025-04-18Merge tag 'riscv-for-linus-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix for an issue where C instructions ended up in non-C builds, due to some broken inline assembly in the KGDB breakpoint insertion code - A fix to avoid spurious printk messages about misaligned access performance probing - A fix for a handful of issues with /proc/iomem's reserved region handling - A pair of fixes for module relocation processing - A few build-time fixes * tag 'riscv-for-linus-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: KGDB: Remove ".option norvc/.option rvc" for kgdb_compiled_break riscv: KGDB: Do not inline arch_kgdb_breakpoint() riscv: Avoid fortify warning in syscall_get_arguments() riscv: Provide all alternative macros all the time riscv: module: Allocate PLT entries for R_RISCV_PLT32 riscv: module: Fix out-of-bounds relocation access riscv: Properly export reserved regions in /proc/iomem riscv: Fix unaligned access info messages riscv: Avoid fortify warning in syscall_get_arguments() Documentation: riscv: Fix typo MIMPLID -> MIMPID riscv: Use kvmalloc_array on relocation_hashtable
2025-04-18Merge tag 'linux_kselftest-kunit-fixes-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit fix from Shuah Khan: "Fixes arch sh kunit qemu_configs script sh.py to honor kunit cmdline" * tag 'linux_kselftest-kunit-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: qemu_configs: SH: Respect kunit cmdline
2025-04-18Merge tag 'linux_kselftest-fixes-6.15-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "Fixes dynevent_limitations.tc test failure on dash by detecting and handling bash and dash differences in evaluating \\" * tag 'linux_kselftest-fixes-6.15-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: Differentiate bash and dash in dynevent_limitations.tc
2025-04-18Merge tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix integer overflow in server disconnect deadtime calculation - Three fixes for potential use after frees: one for oplocks, and one for leases and one for kerberos authentication - Fix to prevent attempted write to directory - Fix locking warning for durable scavenger thread * tag 'v6.15-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: Prevent integer overflow in calculation of deadtime ksmbd: fix the warning from __kernel_write_iter ksmbd: fix use-after-free in smb_break_all_levII_oplock() ksmbd: fix use-after-free in __smb2_lease_break_noti() ksmbd: fix WARNING "do not call blocking ops when !TASK_RUNNING" ksmbd: Fix dangling pointer in krb_authenticate
2025-04-18cxl/feature: Update out_len in set feature failure caseLi Ming
CXL subsystem supports userspace to configure features via fwctl interface, it will configure features by using Set Feature command. Whatever Set Feature succeeds or fails, CXL driver always needs to return a structure fwctl_rpc_cxl_out to caller, and returned size is updated in a out_len parameter. The out_len should be updated not only when the set feature succeeds, but also when the set feature fails. Fixes: eb5dfcb9e36d ("cxl: Add support to handle user feature commands for set feature") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20250410024521.514095-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-04-18cxl: Fix devm host device for CXL fwctl initializationDave Jiang
Testing revealed the following error message for a CXL memdev that has Feature support: [ 56.690430] cxl mem0: Resources present before probing Attach the allocation of cxl_fwctl to the parent device of cxl_memdev. devm_add_* calls for cxl_memdev should not happen before the memdev probe function or outside the scope of the memdev driver. cxl_test missed this bug because cxl_test always arranges for the cxl_mem driver to be loaded before cxl_mock_mem runs. So the driver core always finds the devres list idle in that case. [DJ: Updated subject title and added commit log suggestion from djbw] Fixes: 858ce2f56b52 ("cxl: Add FWCTL support to CXL") Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/linux-cxl/6801aea053466_71fe2944c@dwillia2-xfh.jf.intel.com.notmuch/ Link: https://patch.msgid.link/20250418002933.406439-1-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-04-18Merge tag 'block-6.15-20250417' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - MD pull via Yu: - fix raid10 missing discard IO accounting (Yu Kuai) - fix bitmap stats for bitmap file (Zheng Qixing) - fix oops while reading all member disks failed during check/repair (Meir Elisha) - NVMe pull via Christoph: - fix scan failure for non-ANA multipath controllers (Hannes Reinecke) - fix multipath sysfs links creation for some cases (Hannes Reinecke) - PCIe endpoint fixes (Damien Le Moal) - use NULL instead of 0 in the auth code (Damien Le Moal) - Various ublk fixes: - Slew of selftest additions - Improvements and fixes for IO cancelation - Tweak to Kconfig verbiage - Fix for page dirtying for blk integrity mapped pages - loop fixes: - buffered IO fix - uevent fixes - request priority inheritance fix - Various little fixes * tag 'block-6.15-20250417' of git://git.kernel.dk/linux: (38 commits) selftests: ublk: add generic_06 for covering fault inject ublk: simplify aborting ublk request ublk: remove __ublk_quiesce_dev() ublk: improve detection and handling of ublk server exit ublk: move device reset into ublk_ch_release() ublk: rely on ->canceling for dealing with ublk_nosrv_dev_should_queue_io ublk: add ublk_force_abort_dev() ublk: properly serialize all FETCH_REQs selftests: ublk: move creating UBLK_TMP into _prep_test() selftests: ublk: add test_stress_05.sh selftests: ublk: support user recovery selftests: ublk: support target specific command line selftests: ublk: increase max nr_queues and queue depth selftests: ublk: set queue pthread's cpu affinity selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN selftests: ublk: add two stress tests for zero copy feature selftests: ublk: run stress tests in parallel selftests: ublk: make sure _add_ublk_dev can return in sub-shell selftests: ublk: cleanup backfile automatically selftests: ublk: add io_uring uapi header ...
2025-04-18Merge tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Correctly cap iov_iter->nr_segs for imports of registered buffers, both kbuf and normal ones. Three cleanups to make it saner first, then two fixes for each of the buffer types. This fixes a performance regression where partial buffer usage doesn't trim the tail number of segments, leading the block layer to iterate the IOs to check if it needs splitting. - Two patches tweaking the newly introduced zero-copy rx API, mostly to keep the API consistent once we add multiple interface queues per ring support in the 6.16 release. - zc rx unmapping fix for a dead device * tag 'io_uring-6.15-20250418' of git://git.kernel.dk/linux: io_uring/zcrx: fix late dma unmap for a dead dev io_uring/rsrc: ensure segments counts are correct on kbuf buffers io_uring/rsrc: send exact nr_segs for fixed buffer io_uring/rsrc: refactor io_import_fixed io_uring/rsrc: separate kbuf offset adjustments io_uring/rsrc: don't skip offset calculation io_uring/zcrx: add pp to ifq conversion helper io_uring/zcrx: return ifq id to the user
2025-04-18tracing: selftests: Add testing a user string to filtersSteven Rostedt
Running the following commands was broken: # cd /sys/kernel/tracing # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter # echo 1 > events/syscalls/sys_enter_openat/enable # ls /proc/$$/maps # cat trace And would produce nothing when it should have produced something like: ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0) Add a test to check this case so that it will be caught if it breaks again. Link: https://lore.kernel.org/linux-trace-kernel/20250417183003.505835fb@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/20250418101208.38dc81f5@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-18staging: iio: adc: ad7816: Correct conditional logic for store modeGabriel Shahrouzi
The mode setting logic in ad7816_store_mode was reversed due to incorrect handling of the strcmp return value. strcmp returns 0 on match, so the `if (strcmp(buf, "full"))` block executed when the input was not "full". This resulted in "full" setting the mode to AD7816_PD (power-down) and other inputs setting it to AD7816_FULL. Fix this by checking it against 0 to correctly check for "full" and "power-down", mapping them to AD7816_FULL and AD7816_PD respectively. Fixes: 7924425db04a ("staging: iio: adc: new driver for AD7816 devices") Cc: stable@vger.kernel.org Signed-off-by: Gabriel Shahrouzi <gshahrouzi@gmail.com> Acked-by: Nuno Sá <nuno.sa@analog.com> Link: https://lore.kernel.org/stable/20250414152920.467505-1-gshahrouzi%40gmail.com Link: https://patch.msgid.link/20250414154050.469482-1-gshahrouzi@gmail.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: adc: ad7266: Fix potential timestamp alignment issue.Jonathan Cameron
On architectures where an s64 is only 32-bit aligned insufficient padding would be left between the earlier elements and the timestamp. Use aligned_s64 to enforce the correct placement and ensure the storage is large enough. Fixes: 54e018da3141 ("iio:ad7266: Mark transfer buffer as __be16") # aligned_s64 is much newer. Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-2-jic23@kernel.org Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: adc: ad7768-1: Fix insufficient alignment of timestamp.Jonathan Cameron
On architectures where an s64 is not 64-bit aligned, this may result insufficient alignment of the timestamp and the structure being too small. Use aligned_s64 to force the alignment. Fixes: a1caeebab07e ("iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()") # aligned_s64 newer Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-3-jic23@kernel.org Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: adc: dln2: Use aligned_s64 for timestampJonathan Cameron
Here the lack of marking allows the overall structure to not be sufficiently aligned resulting in misplacement of the timestamp in iio_push_to_buffers_with_timestamp(). Use aligned_s64 to force the alignment on all architectures. Fixes: 7c0299e879dd ("iio: adc: Add support for DLN2 ADC") Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-4-jic23@kernel.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64Jonathan Cameron
The IIO ABI requires 64-bit aligned timestamps. In this case insufficient padding would have been added on architectures where an s64 is only 32-bit aligned. Use aligned_s64 to enforce the correct alignment. Fixes: 327a0eaf19d5 ("iio: accel: adxl355: Add triggered buffer support") Reported-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-5-jic23@kernel.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer.Jonathan Cameron
The trick of using __aligned(IIO_DMA_MINALIGN) ensures that there is no overlap between buffers used for DMA and those used for driver state storage that are before the marking. It doesn't ensure anything above state variables found after the marking. Hence move this particular bit of state earlier in the structure. Fixes: 10897f34309b ("iio: temp: maxim_thermocouple: Fix alignment for DMA safety") Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250413103443.2420727-14-jic23@kernel.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: chemical: pms7003: use aligned_s64 for timestampDavid Lechner
Follow the pattern of other drivers and use aligned_s64 for the timestamp. This will ensure that the timestamp is correctly aligned on all architectures. Also move the unaligned.h header while touching this since it was the only one not in alphabetical order. Fixes: 13e945631c2f ("iio:chemical:pms7003: Fix timestamp alignment and prevent data leak.") Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://patch.msgid.link/20250417-iio-more-timestamp-alignment-v1-4-eafac1e22318@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: chemical: sps30: use aligned_s64 for timestampDavid Lechner
Follow the pattern of other drivers and use aligned_s64 for the timestamp. This will ensure that the timestamp is correctly aligned on all architectures. Fixes: a5bf6fdd19c3 ("iio:chemical:sps30: Fix timestamp alignment") Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Nuno Sá <nuno.sa@analog.com> Link: https://patch.msgid.link/20250417-iio-more-timestamp-alignment-v1-5-eafac1e22318@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18iio: imu: inv_mpu6050: align buffer for timestampDavid Lechner
Align the buffer used with iio_push_to_buffers_with_timestamp() to ensure the s64 timestamp is aligned to 8 bytes. Fixes: 0829edc43e0a ("iio: imu: inv_mpu6050: read the full fifo when processing data") Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250417-iio-more-timestamp-alignment-v1-7-eafac1e22318@baylibre.com Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-04-18vhost-scsi: Fix vhost_scsi_send_status()Dongli Zhang
Although the support of VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 was signaled by the commit 664ed90e621c ("vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits"), vhost_scsi_send_bad_target() still assumes the response in a single descriptor. Similar issue in vhost_scsi_send_bad_target() has been fixed in previous commit. In addition, similar issue for vhost_scsi_complete_cmd_work() has been fixed by the commit 6dd88fd59da8 ("vhost-scsi: unbreak any layout for response"). Fixes: 3ca51662f818 ("vhost-scsi: Add better resource allocation failure handling") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20250403063028.16045-4-dongli.zhang@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18vhost-scsi: Fix vhost_scsi_send_bad_target()Dongli Zhang
Although the support of VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 was signaled by the commit 664ed90e621c ("vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits"), vhost_scsi_send_bad_target() still assumes the response in a single descriptor. In addition, although vhost_scsi_send_bad_target() is used by both I/O queue and control queue, the response header is always virtio_scsi_cmd_resp. It is required to use virtio_scsi_ctrl_tmf_resp or virtio_scsi_ctrl_an_resp for control queue. Fixes: 664ed90e621c ("vhost/scsi: Set VIRTIO_F_ANY_LAYOUT + VIRTIO_F_VERSION_1 feature bits") Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20250403063028.16045-3-dongli.zhang@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18vhost-scsi: protect vq->log_used with vq->mutexDongli Zhang
The vhost-scsi completion path may access vq->log_base when vq->log_used is already set to false. vhost-thread QEMU-thread vhost_scsi_complete_cmd_work() -> vhost_add_used() -> vhost_add_used_n() if (unlikely(vq->log_used)) QEMU disables vq->log_used via VHOST_SET_VRING_ADDR. mutex_lock(&vq->mutex); vq->log_used = false now! mutex_unlock(&vq->mutex); QEMU gfree(vq->log_base) log_used() -> log_write(vq->log_base) Assuming the VMM is QEMU. The vq->log_base is from QEMU userpace and can be reclaimed via gfree(). As a result, this causes invalid memory writes to QEMU userspace. The control queue path has the same issue. Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20250403063028.16045-2-dongli.zhang@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18vhost_task: fix vhost_task_create() documentationStefano Garzarella
Commit cb380909ae3b ("vhost: return task creation error instead of NULL") changed the return value of vhost_task_create(), but did not update the documentation. Reflect the change in the documentation: on an error, vhost_task_create() returns an ERR_PTR() and no longer NULL. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20250327124435.142831-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18virtio_console: fix order of fields cols and rowsMaximilian Immanuel Brandtner
According to section 5.3.6.2 (Multiport Device Operation) of the virtio spec(version 1.2) a control buffer with the event VIRTIO_CONSOLE_RESIZE is followed by a virtio_console_resize struct containing cols then rows. The kernel implements this the wrong way around (rows then cols) resulting in the two values being swapped. Signed-off-by: Maximilian Immanuel Brandtner <maxbr@linux.ibm.com> Message-Id: <20250324144300.905535-1-maxbr@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18virtio_console: fix missing byte order handling for cols and rowsHalil Pasic
As per virtio spec the fields cols and rows are specified as little endian. Although there is no legacy interface requirement that would state that cols and rows need to be handled as native endian when legacy interface is used, unlike for the fields of the adjacent struct virtio_console_control, I decided to err on the side of caution based on some non-conclusive virtio spec repo archaeology and opt for using virtio16_to_cpu() much like for virtio_console_control.event. Strictly by the letter of the spec virtio_le_to_cpu() would have been sufficient. But when the legacy interface is not used, it boils down to the same. And when using the legacy interface, the device formatting these as little endian when the guest is big endian would surprise me more than it using guest native byte order (which would make it compatible with the current implementation). Nevertheless somebody trying to implement the spec following it to the letter could end up forcing little endian byte order when the legacy interface is in use. So IMHO this ultimately needs a judgement call by the maintainers. Fixes: 8345adbf96fc1 ("virtio: console: Accept console size along with resize control message") Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Cc: stable@vger.kernel.org # v2.6.35+ Message-Id: <20250322002954.3129282-1-pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-04-18virtgpu: don't reset on shutdownMichael S. Tsirkin
It looks like GPUs are used after shutdown is invoked. Thus, breaking virtio gpu in the shutdown callback is not a good idea - guest hangs attempting to finish console drawing, with these warnings: [ 20.504464] WARNING: CPU: 0 PID: 568 at drivers/gpu/drm/virtio/virtgpu_vq.c:358 virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.505685] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink vfat fat intel_rapl_msr intel_rapl_common intel_uncore_frequency_common nfit libnvdimm kvm_intel kvm rapl iTCO_wdt iTCO_vendor_support virtio_gpu virtio_dma_buf pcspkr drm_shmem_helper i2c_i801 drm_kms_helper lpc_ich i2c_smbus virtio_balloon joydev drm fuse xfs libcrc32c ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata virtio_net ghash_clmulni_intel net_failover virtio_blk failover serio_raw dm_mirror dm_region_hash dm_log dm_mod [ 20.511847] CPU: 0 PID: 568 Comm: kworker/0:3 Kdump: loaded Tainted: G W ------- --- 5.14.0-578.6675_1757216455.el9.x86_64 #1 [ 20.513157] Hardware name: Red Hat KVM/RHEL, BIOS edk2-20241117-3.el9 11/17/2024 [ 20.513918] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper] [ 20.514626] RIP: 0010:virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.515332] Code: 00 00 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 51 50 00 00 65 ff 0d 42 e3 74 3f 0f 85 69 ff ff ff 0f 1f 44 00 00 e9 5f ff ff ff <0f> 0b e9 3f ff ff ff 48 83 3c 24 00 74 0e 49 8b 7f 40 48 85 ff 74 [ 20.517272] RSP: 0018:ff34f0a8c0787ad8 EFLAGS: 00010282 [ 20.517820] RAX: 00000000fffffffb RBX: 0000000000000000 RCX: 0000000000000820 [ 20.518565] RDX: 0000000000000000 RSI: ff34f0a8c0787be0 RDI: ff218bef03a26300 [ 20.519308] RBP: ff218bef03a26300 R08: 0000000000000001 R09: ff218bef07224360 [ 20.520059] R10: 0000000000008dc0 R11: 0000000000000002 R12: ff218bef02630028 [ 20.520806] R13: ff218bef0263fb48 R14: ff218bef00cb8000 R15: ff218bef07224360 [ 20.521555] FS: 0000000000000000(0000) GS:ff218bef7ba00000(0000) knlGS:0000000000000000 [ 20.522397] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 20.522996] CR2: 000055ac4f7871c0 CR3: 000000010b9f2002 CR4: 0000000000771ef0 [ 20.523740] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 20.524477] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 20.525223] PKRU: 55555554 [ 20.525515] Call Trace: [ 20.525777] <TASK> [ 20.526003] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526464] ? show_trace_log_lvl+0x1c4/0x2df [ 20.526925] ? virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.527643] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.528282] ? __warn+0x7e/0xd0 [ 20.528621] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.529256] ? report_bug+0x100/0x140 [ 20.529643] ? handle_bug+0x3c/0x70 [ 20.530010] ? exc_invalid_op+0x14/0x70 [ 20.530421] ? asm_exc_invalid_op+0x16/0x20 [ 20.530862] ? virtio_gpu_queue_ctrl_sgs+0x236/0x290 [virtio_gpu] [ 20.531506] ? virtio_gpu_queue_ctrl_sgs+0x174/0x290 [virtio_gpu] [ 20.532148] virtio_gpu_queue_fenced_ctrl_buffer+0x82/0x2c0 [virtio_gpu] [ 20.532843] virtio_gpu_primary_plane_update+0x3e2/0x460 [virtio_gpu] [ 20.533520] drm_atomic_helper_commit_planes+0x108/0x320 [drm_kms_helper] [ 20.534233] drm_atomic_helper_commit_tail+0x45/0x80 [drm_kms_helper] [ 20.534914] commit_tail+0xd2/0x130 [drm_kms_helper] [ 20.535446] drm_atomic_helper_commit+0x11b/0x140 [drm_kms_helper] [ 20.536097] drm_atomic_commit+0xa4/0xe0 [drm] [ 20.536588] ? __pfx___drm_printfn_info+0x10/0x10 [drm] [ 20.537162] drm_atomic_helper_dirtyfb+0x192/0x270 [drm_kms_helper] [ 20.537823] drm_fbdev_shmem_helper_fb_dirty+0x43/0xa0 [drm_shmem_helper] [ 20.538536] drm_fb_helper_damage_work+0x87/0x160 [drm_kms_helper] [ 20.539188] process_one_work+0x194/0x380 [ 20.539612] worker_thread+0x2fe/0x410 [ 20.540007] ? __pfx_worker_thread+0x10/0x10 [ 20.540456] kthread+0xdd/0x100 [ 20.540791] ? __pfx_kthread+0x10/0x10 [ 20.541190] ret_from_fork+0x29/0x50 [ 20.541566] </TASK> [ 20.541802] ---[ end trace 0000000000000000 ]--- It looks like the shutdown is called in the middle of console drawing, so we should either wait for it to finish, or let drm handle the shutdown. This patch implements this second option: Add an option for drivers to bypass the common break+reset handling. As DRM is careful to flush/synchronize outstanding buffers, it looks like GPU can just have a NOP there. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Fixes: 8bd2fa086a04 ("virtio: break and reset virtio devices on device_shutdown()") Cc: Eric Auger <eauger@redhat.com> Cc: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <8490dbeb6f79ed039e6c11d121002618972538a3.1744293540.git.mst@redhat.com>
2025-04-18selftests/pcie_bwctrl: Fix test progs listIlpo Järvinen
Commit df6f8c4d72ae ("selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS") added set_pcie_speed.sh into TEST_PROGS but that script is a helper that is only being called by set_pcie_cooling_state.sh, not a test case itself. When set_pcie_speed.sh is in TEST_PROGS, selftest harness will execute also it leading to bwctrl selftest errors: # selftests: pcie_bwctrl: set_pcie_speed.sh # cat: /cur_state: No such file or directory not ok 2 selftests: pcie_bwctrl: set_pcie_speed.sh # exit=1 Place set_pcie_speed.sh into TEST_FILES instead to have it included into installed test files but not execute it from the test harness. Fixes: df6f8c4d72ae ("selftests/pcie_bwctrl: Add 'set_pcie_speed.sh' to TEST_PROGS") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20250417124529.11391-1-ilpo.jarvinen@linux.intel.com
2025-04-18PCI: Restore assigned resources fully after releaseIlpo Järvinen
PCI resource fitting code in __assign_resources_sorted() runs in multiple steps. A resource that was successfully assigned may have to be released before the next step attempts assignment again. The assign+release cycle is destructive to a start-aligned struct resource (bridge window or IOV resource) because the start field is overwritten with the real address when the resource got assigned. One symptom: pci 0002:00:00.0: bridge window [mem size 0x00100000]: can't assign; bogus alignment Properly restore the resource after releasing it. The start, end, and flags fields must be stored into the related struct pci_dev_resource in order to be able to restore the resource to its original state. Fixes: 96336ec70264 ("PCI: Perform reset_resource() and build fail list in sync") Reported-by: Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/r/01eb7d40-f5b5-4ec5-b390-a5c042c30aff@roeck-us.net/ Reported-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Closes: https://lore.kernel.org/r/3578030.5fSG56mABF@workhorse Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Ondrej Jirman <megi@xff.cz> Link: https://patch.msgid.link/20250403093137.1481-1-ilpo.jarvinen@linux.intel.com
2025-04-18x86/boot/sev: Avoid shared GHCB page for early memory acceptanceArd Biesheuvel
Communicating with the hypervisor using the shared GHCB page requires clearing the C bit in the mapping of that page. When executing in the context of the EFI boot services, the page tables are owned by the firmware, and this manipulation is not possible. So switch to a different API for accepting memory in SEV-SNP guests, one which is actually supported at the point during boot where the EFI stub may need to accept memory, but the SEV-SNP init code has not executed yet. For simplicity, also switch the memory acceptance carried out by the decompressor when not booting via EFI - this only involves the allocation for the decompressed kernel, and is generally only called after kexec, as normal boot will jump straight into the kernel from the EFI stub. Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-efi@vger.kernel.org Link: https://lore.kernel.org/r/20250404082921.2767593-8-ardb+git@google.com # discussion thread #1 Link: https://lore.kernel.org/r/20250410132850.3708703-2-ardb+git@google.com # discussion thread #2 Link: https://lore.kernel.org/r/20250417202120.1002102-2-ardb+git@google.com # final submission
2025-04-18x86/cpu/amd: Fix workaround for erratum 1054Sandipan Das
Erratum 1054 affects AMD Zen processors that are a part of Family 17h Models 00-2Fh and the workaround is to not set HWCR[IRPerfEn]. However, when X86_FEATURE_ZEN1 was introduced, the condition to detect unaffected processors was incorrectly changed in a way that the IRPerfEn bit gets set only for unaffected Zen 1 processors. Ensure that HWCR[IRPerfEn] is set for all unaffected processors. This includes a subset of Zen 1 (Family 17h Models 30h and above) and all later processors. Also clear X86_FEATURE_IRPERF on affected processors so that the IRPerfCount register is not used by other entities like the MSR PMU driver. Fixes: 232afb557835 ("x86/CPU/AMD: Add X86_FEATURE_ZEN1") Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Borislav Petkov <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/caa057a9d6f8ad579e2f1abaa71efbd5bd4eaf6d.1744956467.git.sandipan.das@amd.com
2025-04-18io_uring/zcrx: fix late dma unmap for a dead devPavel Begunkov
There is a problem with page pools not dma-unmapping immediately when the device is going down, and delaying it until the page pool is destroyed, which is not allowed (see links). That just got fixed for normal page pools, and we need to address memory providers as well. Unmap pages in the memory provider uninstall callback, and protect it with a new lock. There is also a gap between when a dma mapping is created and the mp is installed, so if the device is killed in between, io_uring would be holding on to dma mappings to a dead device with no one to call ->uninstall. Move it to page pool init and rely on ->is_mapped to make sure it's only done once. Link: https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/ Link: https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/ Fixes: 34a3e60821ab9 ("io_uring/zcrx: implement zerocopy receive pp memory provider") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ef9b7db249b14f6e0b570a1bb77ff177389f881c.1744965853.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-04-18Merge tag 'usb-serial-6.15-rc3' of ↵Greg Kroah-Hartman
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus Johan writes: USB-serial device ids for 6.15-rc3 Here's a new simple driver for Owon oscilloscopes and a couple of new new modem and smart meter device ids. All have been in linux-next with no reported issues. * tag 'usb-serial-6.15-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial: USB: serial: simple: add OWON HDS200 series oscilloscope support USB: serial: ftdi_sio: add support for Abacus Electrics Optical Probe USB: serial: option: add Sierra Wireless EM9291
2025-04-17MAINTAINERS: add section for locking of mm's and VMAsLorenzo Stoakes
We place this under memory mapping as related to memory mapping abstractions in the form of mm_struct and vm_area_struct (VMA). Now we have separated out mmap/vma locking logic into the mmap_lock.c and mmap_lock.h files, so this should encapsulate the majority of the mm locking logic in the kernel. Suren is best placed to maintain this logic as the core architect of VMA locking as a whole. Link: https://lkml.kernel.org/r/e6ed679a184ca444b20dfa77af96913fd8b5efa0.1744799282.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: vmscan: fix kswapd exit condition in defrag_modeJohannes Weiner
Vlastimil points out an issue with kswapd in defrag_mode not waking up kcompactd reliably. Background: When kswapd is woken for any higher-order request, it initially checks those high-order watermarks to decide if work is necesary. However, it cannot (efficiently) meet the contiguity goal of such a request by itself. So once it has reclaimed a compaction gap, it adjusts the request down to check for free order-0 pages, then wakes kcompactd to coalesce them into larger blocks. In defrag_mode, the initial watermark check needs to be analogously against free pageblocks. However, once kswapd drops the high-order to hand off contiguity work, it also needs to fall back to base page watermarks - otherwise it'll keep reclaiming until blocks are freed. While it appears kcompactd is woken up frequently enough to do most of the compaction work, kswapd ends up overreclaiming by quite a bit: DEFRAGMODE DEFRAGMODE-thispatch Hugealloc Time mean 79381.34 ( +0.00%) 88126.12 ( +11.02%) Hugealloc Time stddev 85852.16 ( +0.00%) 135366.75 ( +57.67%) Kbuild Real time 249.35 ( +0.00%) 226.71 ( -9.04%) Kbuild User time 1249.16 ( +0.00%) 1249.37 ( +0.02%) Kbuild System time 171.76 ( +0.00%) 166.93 ( -2.79%) THP fault alloc 51666.87 ( +0.00%) 52685.60 ( +1.97%) THP fault fallback 16970.00 ( +0.00%) 15951.87 ( -6.00%) Direct compact fail 166.53 ( +0.00%) 178.93 ( +7.40%) Direct compact success 17.13 ( +0.00%) 4.13 ( -71.69%) Compact daemon scanned migrate 3095413.33 ( +0.00%) 9231239.53 ( +198.22%) Compact daemon scanned free 2155966.53 ( +0.00%) 7053692.87 ( +227.17%) Compact direct scanned migrate 265642.47 ( +0.00%) 68388.33 ( -74.26%) Compact direct scanned free 130252.60 ( +0.00%) 55634.87 ( -57.29%) Compact total migrate scanned 3361055.80 ( +0.00%) 9299627.87 ( +176.69%) Compact total free scanned 2286219.13 ( +0.00%) 7109327.73 ( +210.96%) Alloc stall 1890.80 ( +0.00%) 6297.60 ( +232.94%) Pages kswapd scanned 9043558.80 ( +0.00%) 5952576.73 ( -34.18%) Pages kswapd reclaimed 1891708.67 ( +0.00%) 1030645.00 ( -45.52%) Pages direct scanned 1017090.60 ( +0.00%) 2688047.60 ( +164.29%) Pages direct reclaimed 92682.60 ( +0.00%) 309770.53 ( +234.22%) Pages total scanned 10060649.40 ( +0.00%) 8640624.33 ( -14.11%) Pages total reclaimed 1984391.27 ( +0.00%) 1340415.53 ( -32.45%) Swap out 884585.73 ( +0.00%) 417781.93 ( -52.77%) Swap in 287106.27 ( +0.00%) 95589.73 ( -66.71%) File refaults 551697.60 ( +0.00%) 426474.80 ( -22.70%) Link: https://lkml.kernel.org/r/20250416135142.778933-3-hannes@cmpxchg.org Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: vmscan: restore high-cpu watermark safety in kswapdJohannes Weiner
Vlastimil points out that commit a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") switched kswapd from zone_watermark_ok_safe() to the standard, percpu-cached version of reading free pages, thus dropping the watermark safety precautions for systems with high CPU counts (e.g. >212 cpus on 64G). Restore them. Since zone_watermark_ok_safe() is no longer the right interface, and this was the last caller of the function anyway, open-code the zone_page_state_snapshot() conditional and delete the function. Link: https://lkml.kernel.org/r/20250416135142.778933-2-hannes@cmpxchg.org Fixes: a211c6550efc ("mm: page_alloc: defrag_mode kswapd/kcompactd watermarks") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add Pedro as reviewer to the MEMORY MAPPING sectionLorenzo Stoakes
Pedro has offered to review memory mapping code. He has good experience in this area and has provided excellent feedback on memory mapping series in the past so I feel he'll be a great addition. Link: https://lkml.kernel.org/r/20250416135301.43513-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/memory: move sanity checks in do_wp_page() after mapcount vs. refcount ↵David Hildenbrand
stabilization In __folio_remove_rmap() for RMAP_LEVEL_PMD/RMAP_LEVEL_PUD and with CONFIG_PAGE_MAPCOUNT we first decrement the folio mapcount (and recompute mapped shared vs. mapped exclusively) to then adjust the entire mapcount. This means that another process might stumble in do_wp_page() over a PTE-mapped PMD folio that is indicated as "exclusively mapped", but still has an entire mapcount (PMD mapping), because it is racing with the process that is unmapping the folio (PMD mapping). Note that do_wp_page() will back off once it detects the remaining folio reference from the process that is in the process of unmapping the folio. This will trigger the early VM_WARN_ON_ONCE(folio_entire_mapcount(folio)) check in do_wp_page(), that can easily be reproduced by looping a couple of times over allocating a PMD THP, forking a child where we immediately unmap it again, and writing in the parent concurrently to the THP. [ 252.738129][T16470] ------------[ cut here ]------------ [ 252.739267][T16470] WARNING: CPU: 3 PID: 16470 at mm/memory.c:3738 do_wp_page+0x2a75/0x2c00 [ 252.740968][T16470] Modules linked in: [ 252.741958][T16470] CPU: 3 UID: 0 PID: 16470 Comm: ... ... [ 252.765841][T16470] <TASK> [ 252.766419][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.767558][T16470] ? rcu_is_watching+0x12/0x60 [ 252.768525][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.769645][T16470] ? srso_alias_return_thunk+0x5/0xfbef5 [ 252.770778][T16470] ? lock_acquire+0x33/0x80 [ 252.771697][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.772735][T16470] ? __handle_mm_fault+0x5e8/0x3e40 [ 252.773781][T16470] __handle_mm_fault+0x1869/0x3e40 [ 252.774839][T16470] handle_mm_fault+0x22a/0x640 [ 252.775808][T16470] do_user_addr_fault+0x618/0x1000 [ 252.776847][T16470] exc_page_fault+0x68/0xd0 [ 252.777775][T16470] asm_exc_page_fault+0x26/0x30 While we could adjust the sequence in __folio_remove_rmap(), let's rater move the mapcount sanity checks after the mapcount vs. refcount stabilization phase. With this fix, a simple reproducer is happy. While at it, convert the two VM_WARN_ON_ONCE() we are moving to VM_WARN_ON_ONCE_FOLIO(). Link: https://lkml.kernel.org/r/20250415095007.569836-1-david@redhat.com Fixes: 1da190f4d0a6 ("mm: Copy-on-Write (COW) reuse support for PTE-mapped THP") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: syzbot+5e8feb543ca8e12e0ede@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/67fab4fe.050a0220.2c5fcf.0011.GAE@google.com Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm, hugetlb: increment the number of pages to be reset on HVOOscar Salvador
commit 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") shifted hugetlb specific stuff, and now mapping overlaps _hugetlb_cgroup field. Upon restoring the vmemmap for HVO, only the first two tail pages are reset, and this causes the check in free_tail_page_prepare() to fail as it finds an unexpected mapping value in some tails. Increment the number of pages to be reset to 4 (head + 3 tail pages) Link: https://lkml.kernel.org/r/20250415111859.376302-1-osalvador@suse.de Fixes: 4eeec8c89a0c ("mm: move hugetlb specific things in folio to page[3]") Signed-off-by: Oscar Salvador <osalvador@suse.de> Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17writeback: fix false warning in inode_to_wb()Andreas Gruenbacher
inode_to_wb() is used also for filesystems that don't support cgroup writeback. For these filesystems inode->i_wb is stable during the lifetime of the inode (it points to bdi->wb) and there's no need to hold locks protecting the inode->i_wb dereference. Improve the warning in inode_to_wb() to not trigger for these filesystems. Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17docs: ABI: replace mcroce@microsoft.com with new Meta addressAhmad Fatoum
The Microsoft email address is bouncing: 550 5.4.1 Recipient address rejected: Access denied. So let's replace it with Matteo's current mail address. Link: https://lkml.kernel.org/r/20250414-fix-mcroce-mail-bounce-v3-1-0aed2d71f3d7@pengutronix.de Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de> Acked-by: Matteo Croce <teknoraver@meta.com> Link: https://lore.kernel.org/all/BYAPR15MB2504E4B02DFFB1E55871955DA1062@BYAPR15MB2504.namprd15.prod.outlook.com/ Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matteo Croce <teknoraver@meta.com> Cc: Sascha Hauer <kernel@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()Baoquan He
Not like fault_in_readable() or fault_in_writeable(), in fault_in_safe_writeable() local variable 'start' is increased page by page to loop till the whole address range is handled. However, it mistakenly calculates the size of the handled range with 'uaddr - start'. Fix it here. Andreas said: : In gfs2, fault_in_iov_iter_writeable() is used in : gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially : affects buffered as well as direct reads. This bug could cause those : gfs2 functions to spin in a loop. Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com Signed-off-by: Baoquan He <bhe@redhat.com> Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()") Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Yanjun.Zhu <yanjun.zhu@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add memory advice sectionLorenzo Stoakes
The madvise code straddles both VMA and page table manipulation. As a result, separate it out into its own section and add maintainers/reviewers as appropriate. We additionally include the mman-common.h file as this contains the shared madvise flags and it is important we maintain this alongside madvise.c. Link: https://lkml.kernel.org/r/20250411072724.10841-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Jann Horn <jannh@google.com> Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add mmap trace events to MEMORY MAPPINGLiam R. Howlett
MEMORY MAPPING does not list the mmap.h trace point file, but does list the mmap.c file. Couple the trace points with the users and authors of the trace points for notifications of updates. Link: https://lkml.kernel.org/r/20250411173328.8172-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm: memcontrol: fix swap counter leak from offline cgroupMuchun Song
commit 73f839b6d2ed addressed an issue regarding the swap counter leak that occurred from an offline cgroup. However, commit 89ce924f0bd4 modified the parameter from @swap_memcg to @memcg (presumably this alteration was introduced while resolving conflicts). Fix this problem by reverting this minor change. Link: https://lkml.kernel.org/r/20250410081812.10073-1-songmuchun@bytedance.com Fixes: 89ce924f0bd4 ("mm: memcontrol: move memsw charge callbacks to v1") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: add MM subsection for the page allocatorVlastimil Babka
Add a subsection for the page allocator, including compaction as it's crucial for high-order allocations and works together with the anti-fragmentation features. Add reviewers (including myself) who voluteered. Link: https://lkml.kernel.org/r/20250410090021.72296-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Brendan Jackman <jackmanb@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Christoph Lameter (Ampere) <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Harry Yoo <harry.yoo@oracle.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17MAINTAINERS: update SLAB ALLOCATOR maintainersVlastimil Babka
With permission, reduce the number of maintainers. Create a CREDITS entry for Joonsoo (Pekka already has one). Thanks for all the work! Link: https://lkml.kernel.org/r/20250410090021.72296-3-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Christoph Lameter (Ampere) <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17fs/dax: fix folio splitting issue by resetting old folio order + _nr_pagesDavid Hildenbrand
Alison reports an issue with fsdax when large extends end up using large ZONE_DEVICE folios: [ 417.796271] BUG: kernel NULL pointer dereference, address: 0000000000000b00 [ 417.796982] #PF: supervisor read access in kernel mode [ 417.797540] #PF: error_code(0x0000) - not-present page [ 417.798123] PGD 2a5c5067 P4D 2a5c5067 PUD 2a5c6067 PMD 0 [ 417.798690] Oops: Oops: 0000 [#1] SMP NOPTI [ 417.799178] CPU: 5 UID: 0 PID: 1515 Comm: mmap Tainted: ... [ 417.800150] Tainted: [O]=OOT_MODULE [ 417.800583] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 417.801358] RIP: 0010:__lruvec_stat_mod_folio+0x7e/0x250 [ 417.801948] Code: ... [ 417.803662] RSP: 0000:ffffc90002be3a08 EFLAGS: 00010206 [ 417.804234] RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000002 [ 417.804984] RDX: ffffffff815652d7 RSI: 0000000000000000 RDI: ffffffff82a2beae [ 417.805689] RBP: ffffc90002be3a28 R08: 0000000000000000 R09: 0000000000000000 [ 417.806384] R10: ffffea0007000040 R11: ffff888376ffe000 R12: 0000000000000001 [ 417.807099] R13: 0000000000000012 R14: ffff88807fe4ab40 R15: ffff888029210580 [ 417.807801] FS: 00007f339fa7a740(0000) GS:ffff8881fa9b9000(0000) knlGS:0000000000000000 [ 417.808570] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 417.809193] CR2: 0000000000000b00 CR3: 000000002a4f0004 CR4: 0000000000370ef0 [ 417.809925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 417.810622] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 417.811353] Call Trace: [ 417.811709] <TASK> [ 417.812038] folio_add_file_rmap_ptes+0x143/0x230 [ 417.812566] insert_page_into_pte_locked+0x1ee/0x3c0 [ 417.813132] insert_page+0x78/0xf0 [ 417.813558] vmf_insert_page_mkwrite+0x55/0xa0 [ 417.814088] dax_fault_iter+0x484/0x7b0 [ 417.814542] dax_iomap_pte_fault+0x1ca/0x620 [ 417.815055] dax_iomap_fault+0x39/0x40 [ 417.815499] __xfs_write_fault+0x139/0x380 [ 417.815995] ? __handle_mm_fault+0x5e5/0x1a60 [ 417.816483] xfs_write_fault+0x41/0x50 [ 417.816966] xfs_filemap_fault+0x3b/0xe0 [ 417.817424] __do_fault+0x31/0x180 [ 417.817859] __handle_mm_fault+0xee1/0x1a60 [ 417.818325] ? debug_smp_processor_id+0x17/0x20 [ 417.818844] handle_mm_fault+0xe1/0x2b0 [...] The issue is that when we split a large ZONE_DEVICE folio to order-0 ones, we don't reset the order/_nr_pages. As folio->_nr_pages overlays page[1]->memcg_data, once page[1] is a folio, it suddenly looks like it has folio->memcg_data set. And we never manually initialize folio->memcg_data in fsdax code, because we never expect it to be set at all. When __lruvec_stat_mod_folio() then stumbles over such a folio, it tries to use folio->memcg_data (because it's non-NULL) but it does not actually point at a memcg, resulting in the problem. Alison also observed that these folios sometimes have "locked" set, which is rather concerning (folios locked from the beginning ...). The reason is that the order for large folios is stored in page[1]->flags, which become the folio->flags of a new small folio. Let's fix it by adding a folio helper to clear order/_nr_pages for splitting purposes. Maybe we should reinitialize other large folio flags / folio members as well when splitting, because they might similarly cause harm once page[1] becomes a folio? At least other flags in PAGE_FLAGS_SECOND should not be set for fsdax, so at least page[1]->flags might be as expected with this fix. From a quick glimpse, initializing ->mapping, ->pgmap and ->share should re-initialize most things from a previous page[1] used by large folios that fsdax cares about. For example folio->private might not get reinitialized, but maybe that's not relevant -- no traces of it's use in fsdax code. Needs a closer look. Another thing that should be considered in the future is performing similar checks as we perform in free_tail_page_prepare() -- checking pincount etc. -- when freeing a large fsdax folio. Link: https://lkml.kernel.org/r/20250410091020.119116-1-david@redhat.com Fixes: 4996fc547f5b ("mm: let _folio_nr_pages overlay memcg_data in first tail page") Fixes: 38607c62b34b ("fs/dax: properly refcount fs dax pages") Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Alison Schofield <alison.schofield@intel.com> Closes: https://lkml.kernel.org/r/Z_W9Oeg-D9FhImf3@aschofie-mobl2.lan Tested-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: "Darrick J. Wong" <djwong@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()Kirill A. Shutemov
When the last page in the zone is accepted, __accept_page() calls static_branch_dec(). This function takes cpu_hotplug_lock, which can lead to a deadlock if the allocation occurs during CPU bringup path as _cpu_up() also takes the lock. To prevent this deadlock, defer static_branch_dec() to a workqueue. Call static_branch_dec() only when the workqueue is not yet initialized. Workqueues are initialized before CPU bring up, so this will not conflict with the first scenario. Link: https://lkml.kernel.org/r/20250329171030.3942298-1-kirill.shutemov@linux.intel.com Fixes: 55ad43e8ba0f ("mm: add a helper to accept page") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Srikanth Aithal <sraithal@amd.com> Tested-by: Srikanth Aithal <sraithal@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: David Hildenbrand <david@redhat.com> Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-17tracing: Fix filter string testingSteven Rostedt
The filter string testing uses strncpy_from_kernel/user_nofault() to retrieve the string to test the filter against. The if() statement was incorrect as it considered 0 as a fault, when it is only negative that it faulted. Running the following commands: # cd /sys/kernel/tracing # echo "filename.ustring ~ \"/proc*\"" > events/syscalls/sys_enter_openat/filter # echo 1 > events/syscalls/sys_enter_openat/enable # ls /proc/$$/maps # cat trace Would produce nothing, but with the fix it will produce something like: ls-1192 [007] ..... 8169.828333: sys_openat(dfd: ffffffffffffff9c, filename: 7efc18359904, flags: 80000, mode: 0) Link: https://lore.kernel.org/all/CAEf4BzbVPQ=BjWztmEwBPRKHUwNfKBkS3kce-Rzka6zvbQeVpg@mail.gmail.com/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250417183003.505835fb@gandalf.local.home Fixes: 77360f9bbc7e5 ("tracing: Add test for user space strings when filtering on string pointers") Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Reported-by: Mykyta Yatsenko <mykyta.yatsenko5@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-04-17MAINTAINERS: Add entry for Socfpga DWMAC ethernet glue driverMaxime Chevallier
Socfpga's DWMAC glue comes in a variety of flavours with multiple options when it comes to physical interfaces, making it not so easy to test. Having access to a Cyclone5 with RGMII as well as Lynx PCS variants, add myself as a maintainer to help with reviews and testing. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20250416125453.306029-1-maxime.chevallier@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>