summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-19fanotify: limit reporting of event with non-decodeable file handlesAmir Goldstein
Commit a95aef69a740 ("fanotify: support reporting non-decodeable file handles") merged in v6.5-rc1, added the ability to use an fanotify group with FAN_REPORT_FID mode to watch filesystems that do not support nfs export, but do know how to encode non-decodeable file handles, with the newly introduced AT_HANDLE_FID flag. At the time that this commit was merged, there were no filesystems in-tree with those traits. Commit 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles"), merged in v6.6-rc1, added this trait to overlayfs, thus allowing fanotify watching of overlayfs with FAN_REPORT_FID mode. In retrospect, allowing an fanotify filesystem/mount mark on such filesystem in FAN_REPORT_FID mode will result in getting events with file handles, without the ability to resolve the filesystem objects from those file handles (i.e. no open_by_handle_at() support). For v6.6, the safer option would be to allow this mode for inode marks only, where the caller has the opportunity to use name_to_handle_at() at the time of setting the mark. In the future we can revise this decision. Fixes: a95aef69a740 ("fanotify: support reporting non-decodeable file handles") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20231018100000.2453965-2-amir73il@gmail.com>
2023-10-19Merge branch 'net-fix-bugs-in-device-netns-move-and-rename'Paolo Abeni
Jakub Kicinski says: ==================== net: fix bugs in device netns-move and rename Daniel reported issues with the uevents generated during netdev namespace move, if the netdev is getting renamed at the same time. While the issue that he actually cares about is not fixed here, there is a bunch of seemingly obvious other bugs in this code. Fix the purely networking bugs while the discussion around the uevent fix is still ongoing. ==================== Link: https://lore.kernel.org/r/20231018013817.2391509-1-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19selftests: net: add very basic test for netdev names and namespacesJakub Kicinski
Add selftest for fixes around naming netdevs and namespaces. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: move altnames together with the netdeviceJakub Kicinski
The altname nodes are currently not moved to the new netns when netdevice itself moves: [ ~]# ip netns add test [ ~]# ip -netns test link add name eth0 type dummy [ ~]# ip -netns test link property add dev eth0 altname some-name [ ~]# ip -netns test link show dev some-name 2: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 1e:67:ed:19:3d:24 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip -netns test link set dev eth0 netns 1 [ ~]# ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff altname some-name [ ~]# ip li show dev some-name Device "some-name" does not exist. Remove them from the hash table when device is unlisted and add back when listed again. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: avoid UAF on deleted altnameJakub Kicinski
Altnames are accessed under RCU (dev_get_by_name_rcu()) but freed by kfree() with no synchronization point. Each node has one or two allocations (node and a variable-size name, sometimes the name is netdev->name). Adding rcu_heads here is a bit tedious. Besides most code which unlists the names already has rcu barriers - so take the simpler approach of adding synchronize_rcu(). Note that the one on the unregistration path (which matters more) is removed by the next fix. Fixes: ff92741270bf ("net: introduce name_node struct to be used in hashlist") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: check for altname conflicts when changing netdev's netnsJakub Kicinski
It's currently possible to create an altname conflicting with an altname or real name of another device by creating it in another netns and moving it over: [ ~]$ ip link add dev eth0 type dummy [ ~]$ ip netns add test [ ~]$ ip -netns test link add dev ethX netns test type dummy [ ~]$ ip -netns test link property add dev ethX altname eth0 [ ~]$ ip -netns test link set dev ethX netns 1 [ ~]$ ip link ... 3: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:40:88:62:ec:b8 brd ff:ff:ff:ff:ff:ff ... 5: ethX: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 26:b7:28:78:38:0f brd ff:ff:ff:ff:ff:ff altname eth0 Create a macro for walking the altnames, this hopefully makes it clearer that the list we walk contains only altnames. Which is otherwise not entirely intuitive. Fixes: 36fbf1e52bd3 ("net: rtnetlink: add linkprop commands to add and delete alternative ifnames") Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19net: fix ifname in netlink ntf during netns moveJakub Kicinski
dev_get_valid_name() overwrites the netdev's name on success. This makes it hard to use in prepare-commit-like fashion, where we do validation first, and "commit" to the change later. Factor out a helper which lets us save the new name to a buffer. Use it to fix the problem of notification on netns move having incorrect name: 5: eth0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff 6: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether 1e:4a:34:36:e3:cd brd ff:ff:ff:ff:ff:ff [ ~]# ip link set dev eth0 netns 1 name eth1 ip monitor inside netns: Deleted inet eth0 Deleted inet6 eth0 Deleted 5: eth1: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default link/ether be:4d:58:f9:d5:40 brd ff:ff:ff:ff:ff:ff new-netnsid 0 new-ifindex 7 Name is reported as eth1 in old netns for ifindex 5, already renamed. Fixes: d90310243fd7 ("net: device name allocation cleanups") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19drm/amdgpu: ignore duplicate BOs againChristian König
Looks like RADV is actually hitting this. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: ca6c1e210aa7 ("drm/amdgpu: use the new drm_exec object for CS v3") Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017121015.1336786-1-christian.koenig@amd.com
2023-10-19net: ethernet: ti: Fix mixed module-builtin objectMD Danish Anwar
With CONFIG_TI_K3_AM65_CPSW_NUSS=y and CONFIG_TI_ICSSG_PRUETH=m, k3-cppi-desc-pool.o is linked to a module and also to vmlinux even though the expected CFLAGS are different between builtins and modules. The build system is complaining about the following: k3-cppi-desc-pool.o is added to multiple modules: icssg-prueth ti-am65-cpsw-nuss Introduce the new module, k3-cppi-desc-pool, to provide the common functions to ti-am65-cpsw-nuss and icssg-prueth. Fixes: 128d5874c082 ("net: ti: icssg-prueth: Add ICSSG ethernet driver") Signed-off-by: MD Danish Anwar <danishanwar@ti.com> Link: https://lore.kernel.org/r/20231018064936.3146846-1-danishanwar@ti.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-10-19Revert "pinctrl: avoid unsafe code pattern in find_pinctrl()"Andy Shevchenko
The commit breaks MMC enumeration on the Intel Merrifield plaform. Before: [ 36.439057] mmc0: SDHCI controller on PCI [0000:00:01.0] using ADMA [ 36.450924] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA [ 36.459355] mmc1: SDHCI controller on PCI [0000:00:01.2] using ADMA [ 36.706399] mmc0: new DDR MMC card at address 0001 [ 37.058972] mmc2: new ultra high speed DDR50 SDIO card at address 0001 [ 37.278977] mmcblk0: mmc0:0001 H4G1d 3.64 GiB [ 37.297300] mmcblk0: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 After: [ 36.436704] mmc2: SDHCI controller on PCI [0000:00:01.3] using ADMA [ 36.436720] mmc1: SDHCI controller on PCI [0000:00:01.0] using ADMA [ 36.463685] mmc0: SDHCI controller on PCI [0000:00:01.2] using ADMA [ 36.720627] mmc1: new DDR MMC card at address 0001 [ 37.068181] mmc2: new ultra high speed DDR50 SDIO card at address 0001 [ 37.279998] mmcblk1: mmc1:0001 H4G1d 3.64 GiB [ 37.302670] mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 This reverts commit c153a4edff6ab01370fcac8e46f9c89cca1060c2. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20231017141806.535191-1-andriy.shevchenko@linux.intel.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2023-10-19perf: Disallow mis-matched inherited group readsPeter Zijlstra
Because group consistency is non-atomic between parent (filedesc) and children (inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum non-matching counter groups -- with non-sensical results. Add group_generation to distinguish the case where a parent group removes and adds an event and thus has the same number, but a different configuration of events as inherited groups. This became a problem when commit fa8c269353d5 ("perf/core: Invert perf_read_group() loops") flipped the order of child_list and sibling_list. Previously it would iterate the group (sibling_list) first, and for each sibling traverse the child_list. In this order, only the group composition of the parent is relevant. By flipping the order the group composition of the child (inherited) events becomes an issue and the mis-match in group composition becomes evident. That said; even prior to this commit, while reading of a group that is not equally inherited was not broken, it still made no sense. (Ab)use ECHILD as error return to indicate issues with child process group composition. Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops") Reported-by: Budimir Markovic <markovicbudimir@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20231018115654.GK33217@noisy.programming.kicks-ass.net
2023-10-19accel/ivpu: Extend address range for MMU mmapWludzik, Jozef
Allow to use whole address range in MMU context mmap which is up to 48 bits. Return invalid argument from MMU context mmap in case address is not aligned to MMU page size, address is below MMU page size or address is greater then 47 bits. This fixes problem disallowing to run large models on VPU4 Signed-off-by: Wludzik, Jozef <jozef.wludzik@intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231018110113.547208-1-stanislaw.gruszka@linux.intel.com
2023-10-19Revert "accel/ivpu: Use cached buffers for FW loading"Stanislaw Gruszka
This reverts commit 645d694559cab36fe6a57c717efcfa27d9321396. The commit cause issues with memory access from the device side. Switch back to write-combined memory mappings until the issues will be properly addressed. Add extra wmb() needed when boot_params->save_restore_ret_address() is modified. Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231017121353.532466-1-stanislaw.gruszka@linux.intel.com
2023-10-19accel/ivpu: Don't enter d0i3 during FLRJacek Lawrynowicz
Avoid HW bug on some platforms where we enter D0i3 state and CPU is in low power states (C8 or above). Fixes: 852be13f3bd3 ("accel/ivpu: Add PM support") Cc: stable@vger.kernel.org Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231003064213.1527327-1-stanislaw.gruszka@linux.intel.com
2023-10-18Merge tag 'nf-23-10-18' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net First patch, from Phil Sutter, reduces number of audit notifications when userspace requests to re-set stateful objects. This change also comes with a selftest update. Second patch, also from Phil, moves the nftables audit selftest to its own netns to avoid interference with the init netns. Third patch, from Pablo Neira, fixes an inconsistency with the "rbtree" set backend: When set element X has expired, a request to delete element X should fail (like with all other backends). Finally, patch four, also from Pablo, reverts a recent attempt to speed up abort of a large pending update with the "pipapo" set backend. It could cause stray references to remain in the set, which then results in a double-free. * tag 'nf-23-10-18' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: revert do not remove elements if set backend implements .abort netfilter: nft_set_rbtree: .deactivate fails if element has expired selftests: netfilter: Run nft_audit.sh in its own netns netfilter: nf_tables: audit log object reset once per table ==================== Link: https://lore.kernel.org/r/20231018125605.27299-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18Merge tag 'wireless-2023-10-18' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== A few more fixes: * prevent value bounce/glitch in rfkill GPIO probe * fix lockdep report in rfkill * fix error path leak in mac80211 key handling * use system_unbound_wq for wiphy work since it can take longer * tag 'wireless-2023-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: net: rfkill: reduce data->mtx scope in rfkill_fop_open net: rfkill: gpio: prevent value glitch during probe wifi: mac80211: fix error path key leak wifi: cfg80211: use system_unbound_wq for wiphy work ==================== Link: https://lore.kernel.org/r/20231018071041.8175-2-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net: phy: bcm7xxx: Add missing 16nm EPHY statisticsFlorian Fainelli
The .probe() function would allocate the necessary space and ensure that the library call sizes the number of statistics but the callbacks necessary to fetch the name and values were not wired up. Reported-by: Justin Chen <justin.chen@broadcom.com> Fixes: f68d08c437f9 ("net: phy: bcm7xxx: Add EPHY entry for 72165") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231017205119.416392-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddrEric Dumazet
syzbot reported a data-race while accessing nh->nh_saddr_genid [1] Add annotations, but leave the code lazy as intended. [1] BUG: KCSAN: data-race in fib_select_path / fib_select_path write to 0xffff8881387166f0 of 4 bytes by task 6778 on cpu 1: fib_info_update_nhc_saddr net/ipv4/fib_semantics.c:1334 [inline] fib_result_prefsrc net/ipv4/fib_semantics.c:1354 [inline] fib_select_path+0x292/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 read to 0xffff8881387166f0 of 4 bytes by task 6759 on cpu 0: fib_result_prefsrc net/ipv4/fib_semantics.c:1350 [inline] fib_select_path+0x1cb/0x330 net/ipv4/fib_semantics.c:2269 ip_route_output_key_hash_rcu+0x659/0x12c0 net/ipv4/route.c:2810 ip_route_output_key_hash net/ipv4/route.c:2644 [inline] __ip_route_output_key include/net/route.h:134 [inline] ip_route_output_flow+0xa6/0x150 net/ipv4/route.c:2872 send4+0x1f5/0x520 drivers/net/wireguard/socket.c:61 wg_socket_send_skb_to_peer+0x94/0x130 drivers/net/wireguard/socket.c:175 wg_socket_send_buffer_to_peer+0xd6/0x100 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation drivers/net/wireguard/send.c:40 [inline] wg_packet_handshake_send_worker+0x10c/0x150 drivers/net/wireguard/send.c:51 process_one_work kernel/workqueue.c:2630 [inline] process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2703 worker_thread+0x525/0x730 kernel/workqueue.c:2784 kthread+0x1d7/0x210 kernel/kthread.c:388 ret_from_fork+0x48/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304 value changed: 0x959d3217 -> 0x959d3218 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6759 Comm: kworker/u4:15 Not tainted 6.6.0-rc4-syzkaller-00029-gcbf3a2cb156a #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/06/2023 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker Fixes: 436c3b66ec98 ("ipv4: Invalidate nexthop cache nh_saddr more correctly.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231017192304.82626-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18tcp_bpf: properly release resources on error pathsPaolo Abeni
In the blamed commit below, I completely forgot to release the acquired resources before erroring out in the TCP BPF code, as reported by Dan. Address the issues by replacing the bogus return with a jump to the relevant cleanup code. Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/8f99194c698bcef12666f0a9a999c58f8b1cb52c.1697557782.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curvePedro Tammela
Christian Theune says: I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script, leaving me with a non-functional uplink on a remote router. A 'rt' curve cannot be used as a inner curve (parent class), but we were allowing such configurations since the qdisc was introduced. Such configurations would trigger a UAF as Budimir explains: The parent will have vttree_insert() called on it in init_vf(), but will not have vttree_remove() called on it in update_vf() because it does not have the HFSC_FSC flag set. The qdisc always assumes that inner classes have the HFSC_FSC flag set. This is by design as it doesn't make sense 'qdisc wise' for an 'rt' curve to be an inner curve. Budimir's original patch disallows users to add classes with a 'rt' parent, but this is too strict as it breaks users that have been using 'rt' as a inner class. Another approach, taken by this patch, is to upgrade the inner 'rt' into a 'sc', warning the user in the process. It avoids the UAF reported by Budimir while also being more permissive to bad scripts/users/code using 'rt' as a inner class. Users checking the `tc class ls [...]` or `tc class get [...]` dumps would observe the curve change and are potentially breaking with this change. v1->v2: https://lore.kernel.org/all/20231013151057.2611860-1-pctammela@mojatatu.com/ - Correct 'Fixes' tag and merge with revert (Jakub) Cc: Christian Theune <ct@flyingcircus.io> Cc: Budimir Markovic <markovicbudimir@gmail.com> Fixes: b3d26c5702c7 ("net/sched: sch_hfsc: Ensure inner classes have fsc curve") Signed-off-by: Pedro Tammela <pctammela@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20231017143602.3191556-1-pctammela@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18net: mdio-mux: fix C45 access returning -EIO after API changeVladimir Oltean
The mii_bus API conversion to read_c45() and write_c45() did not cover the mdio-mux driver before read() and write() were made C22-only. This broke arch/arm64/boot/dts/freescale/fsl-ls1028a-qds-13bb.dtso. The -EOPNOTSUPP from mdiobus_c45_read() is transformed by get_phy_c45_devs_in_pkg() into -EIO, is further propagated to of_mdiobus_register() and this makes the mdio-mux driver fail to probe the entire child buses, not just the PHYs that cause access errors. Fix the regression by introducing special c45 read and write accessors to mdio-mux which forward the operation to the parent MDIO bus. Fixes: db1a63aed89c ("net: phy: Remove fallback to old C45 method") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20231017143144.3212657-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skbEric Dumazet
In commit 75eefc6c59fd ("tcp: tsq: add a shortcut in tcp_small_queue_check()") we allowed to send an skb regardless of TSQ limits being hit if rtx queue was empty or had a single skb, in order to better fill the pipe when/if TX completions were slow. Then later, commit 75c119afe14f ("tcp: implement rb-tree based retransmit queue") accidentally removed the special case for one skb in rtx queue. Stefan Wahren reported a regression in single TCP flow throughput using a 100Mbit fec link, starting from commit 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt"). This last commit only made the regression more visible, because it locked the TCP flow on a particular behavior where TSQ prevented two skbs being pushed downstream, adding silences on the wire between each TSO packet. Many thanks to Stefan for his invaluable help ! Fixes: 75c119afe14f ("tcp: implement rb-tree based retransmit queue") Link: https://lore.kernel.org/netdev/7f31ddc8-9971-495e-a1f6-819df542e0af@gmx.net/ Reported-by: Stefan Wahren <wahrenst@gmx.net> Tested-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Link: https://lore.kernel.org/r/20231017124526.4060202-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18octeon_ep: update BQL sent bytes before ringing doorbellShinas Rasheed
Sometimes Tx is completed immediately after doorbell is updated, which causes Tx completion routing to update completion bytes before the same packet bytes are updated in sent bytes in transmit function, hence hitting BUG_ON() in dql_completed(). To avoid this, update BQL sent bytes before ringing doorbell. Fixes: 37d79d059606 ("octeon_ep: add Tx/Rx processing and interrupt support") Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Link: https://lore.kernel.org/r/20231017105030.2310966-1-srasheed@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-18perf/benchmark: fix seccomp_unotify benchmark for 32-bitJiri Slaby (SUSE)
Commit 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) added a reference to __NR_seccomp into perf. This is fine as it added also a definition of __NR_seccomp for 64-bit. But it failed to do so for 32-bit as instead of ifndef, ifdef was used. Fix this typo (so fix the build of perf on 32-bit). Fixes: 7d5cb68af638 (perf/benchmark: add a new benchmark for seccom_unotify) Cc: Andrei Vagin <avagin@google.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20231017083019.31733-1-jirislaby@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2023-10-18Merge tag 'nvme-6.6-2023-10-18' of git://git.infradead.org/nvme into block-6.6Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes for Linux 6.6 - nvme-rdma queue fix (Maurizio) - nvmet-auth double free fix (Maurizio) - nvme-tcp use-after-free fix (Sagi) - nvme-auth data direction fix (Martin) - nvme passthrough metadata sanitization (Keith) - nvme bogus identifiers for multi-controller ssd (Keith)" * tag 'nvme-6.6-2023-10-18' of git://git.infradead.org/nvme: nvme-pci: add BOGUS_NID for Intel 0a54 device nvmet-auth: complete a request only after freeing the dhchap pointers nvme: sanitize metadata bounce buffer for reads nvme-auth: use chap->s2 to indicate bidirectional authentication nvmet-tcp: Fix a possible UAF in queue intialization setup nvme-rdma: do not try to stop unallocated queues
2023-10-18nvme-pci: add BOGUS_NID for Intel 0a54 deviceKeith Busch
These ones claim cmic and nmic capable, so need special consideration to ignore their duplicate identifiers. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217981 Reported-by: welsh@cassens.com Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18nvmet-auth: complete a request only after freeing the dhchap pointersMaurizio Lombardi
It may happen that the work to destroy a queue (for example nvmet_tcp_release_queue_work()) is started while an auth-send or auth-receive command is still completing. nvmet_sq_destroy() will block, waiting for all the references to the sq to be dropped, the last reference is then dropped when nvmet_req_complete() is called. When this happens, both nvmet_sq_destroy() and nvmet_execute_auth_send()/_receive() will free the dhchap pointers by calling nvmet_auth_sq_free(). Since there isn't any lock, the two threads may race against each other, causing double frees and memory corruptions, as reported by KASAN. Reproduced by stress blktests nvme/041 nvme/042 nvme/043 nvme nvme2: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe4096 ================================================================== BUG: KASAN: double-free in kfree+0xec/0x4b0 Call Trace: <TASK> kfree+0xec/0x4b0 nvmet_auth_sq_free+0xe1/0x160 [nvmet] nvmet_execute_auth_send+0x482/0x16d0 [nvmet] process_one_work+0x8e5/0x1510 Allocated by task 191846: __kasan_kmalloc+0x81/0xa0 nvmet_auth_ctrl_sesskey+0xf6/0x380 [nvmet] nvmet_auth_reply+0x119/0x990 [nvmet] Freed by task 143270: kfree+0xec/0x4b0 nvmet_auth_sq_free+0xe1/0x160 [nvmet] process_one_work+0x8e5/0x1510 Fix this bug by calling nvmet_req_complete() only after freeing the pointers, so we will prevent the race by holding the sq reference. V2: remove redundant code Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication") Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18nvme: sanitize metadata bounce buffer for readsKeith Busch
User can request more metadata bytes than the device will write. Ensure kernel buffer is initialized so we're not leaking unsanitized memory on the copy-out. Fixes: 0b7f1f26f95a51a ("nvme: use the block layer for userspace passthrough metadata") Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18NFSv4.1: fixup use EXCHGID4_FLAG_USE_PNFS_DS for DS serverOlga Kornievskaia
This patches fixes commit 51d674a5e488 "NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server", purpose of that commit was to mark EXCHANGE_ID to the DS with the appropriate flag. However, connection to MDS can return both EXCHGID4_FLAG_USE_PNFS_DS and EXCHGID4_FLAG_USE_PNFS_MDS set but previous patch would only remember the USE_PNFS_DS and for the 2nd EXCHANGE_ID send that to the MDS. Instead, just mark the pnfs path exclusively. Fixes: 51d674a5e488 ("NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-10-18maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()Liam R. Howlett
Users complained about OOM errors during fork without triggering compaction. This can be fixed by modifying the flags used in mas_expected_entries() so that the compaction will be triggered in low memory situations. Since mas_expected_entries() is only used during fork, the extra argument does not need to be passed through. Additionally, the two test_maple_tree test cases and one benchmark test were altered to use the correct locking type so that allocations would not trigger sleeping and thus fail. Testing was completed with lockdep atomic sleep detection. The additional locking change requires rwsem support additions to the tools/ directory through the use of pthreads pthread_rwlock_t. With this change test_maple_tree works in userspace, as a module, and in-kernel. Users may notice that the system gave up early on attempting to start new processes instead of attempting to reclaim memory. Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4 Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com> Cc: <jason.sim@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18selftests/mm: include mman header to access MREMAP_DONTUNMAP identifierSamasth Norway Ananda
Definition for MREMAP_DONTUNMAP is not present in glibc older than 2.32 thus throwing an undeclared error when running make on mm. Including linux/mman.h solves the build error for people having older glibc. Link: https://lkml.kernel.org/r/20231012155257.891776-1-samasth.norway.ananda@oracle.com Fixes: 0183d777c29a ("selftests: mm: remove duplicate unneeded defines") Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/linux-mm/CA+G9fYvV-71XqpCr_jhdDfEtN701fBdG3q+=bafaZiGwUXy_aA@mail.gmail.com/ Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18mailmap: correct email aliasing for Oleksij RempelOleksij Rempel
Ensure the current work email addresses for Oleksij Rempel are preserved and not overridden by private address. Alias the alternate work email to the primary work email address. Link: https://lkml.kernel.org/r/20231011112519.1427077-1-o.rempel@pengutronix.de Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> # qcom Cc: Mark Brown <broonie@kernel.org> Cc: Qais Yousef <qyousef@layalina.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18mailmap: map Bartosz's old address to the current oneBartosz Golaszewski
I no longer work for BayLibre but many DT bindings have my BL address in the maintainers entries. Map it to the email address I use for kernel development. Link: https://lkml.kernel.org/r/20231011150104.73863-1-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Suggested-by: Conor Dooley <conor@kernel.org> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> # qcom Cc: Qais Yousef <qyousef@layalina.io> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18mm/damon/sysfs: check DAMOS regions update progress from before_terminate()SeongJae Park
DAMON_SYSFS can receive DAMOS tried regions update request while kdamond is already out of the main loop and before_terminate callback (damon_sysfs_before_terminate() in this case) is not yet called. And damon_sysfs_handle_cmd() can further be finished before the callback is invoked. Then, damon_sysfs_before_terminate() unlocks damon_sysfs_lock, which is not locked by anyone. This happens because the callback function assumes damon_sysfs_cmd_request_callback() should be called before it. Check if the assumption was true before doing the unlock, to avoid this problem. Link: https://lkml.kernel.org/r/20231007200432.3110-1-sj@kernel.org Fixes: f1d13cacabe1 ("mm/damon/sysfs: implement DAMOS tried regions update command") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.2.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18MAINTAINERS: Ondrej has movedOndrej Jirman
Update my email-address in MAINTAINERS to <megi@xff.cz>. Also add .mailmap entries to map my old, now blocked, email address. Link: https://lkml.kernel.org/r/20231008105812.1084226-1-megi@xff.cz Signed-off-by: Ondrej Jirman <megi@xff.cz> Cc: Bjorn Andersson <quic_bjorande@quicinc.com> Cc: Heiko Stuebner <heiko@sntech.de> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Konrad Dybcio <konrad.dybcio@linaro.org> # qcom Cc: Mark Brown <broonie@kernel.org> Cc: Qais Yousef <qyousef@layalina.io> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18kasan: disable kasan_non_canonical_hook() for HW tagsArnd Bergmann
On arm64, building with CONFIG_KASAN_HW_TAGS now causes a compile-time error: mm/kasan/report.c: In function 'kasan_non_canonical_hook': mm/kasan/report.c:637:20: error: 'KASAN_SHADOW_OFFSET' undeclared (first use in this function) 637 | if (addr < KASAN_SHADOW_OFFSET) | ^~~~~~~~~~~~~~~~~~~ mm/kasan/report.c:637:20: note: each undeclared identifier is reported only once for each function it appears in mm/kasan/report.c:640:77: error: expected expression before ';' token 640 | orig_addr = (addr - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT; This was caused by removing the dependency on CONFIG_KASAN_INLINE that used to prevent this from happening. Use the more specific dependency on KASAN_SW_TAGS || KASAN_GENERIC to only ignore the function for hwasan mode. Link: https://lkml.kernel.org/r/20231016200925.984439-1-arnd@kernel.org Fixes: 12ec6a919b0f ("kasan: print the original fault addr when access invalid shadow") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Kees Cook <keescook@chromium.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18kasan: print the original fault addr when access invalid shadowHaibo Li
when the checked address is illegal,the corresponding shadow address from kasan_mem_to_shadow may have no mapping in mmu table. Access such shadow address causes kernel oops. Here is a sample about oops on arm64(VA 39bit) with KASAN_SW_TAGS and KASAN_OUTLINE on: [ffffffb80aaaaaaa] pgd=000000005d3ce003, p4d=000000005d3ce003, pud=000000005d3ce003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Modules linked in: CPU: 3 PID: 100 Comm: sh Not tainted 6.6.0-rc1-dirty #43 Hardware name: linux,dummy-virt (DT) pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __hwasan_load8_noabort+0x5c/0x90 lr : do_ib_ob+0xf4/0x110 ffffffb80aaaaaaa is the shadow address for efffff80aaaaaaaa. The problem is reading invalid shadow in kasan_check_range. The generic kasan also has similar oops. It only reports the shadow address which causes oops but not the original address. Commit 2f004eea0fc8("x86/kasan: Print original address on #GP") introduce to kasan_non_canonical_hook but limit it to KASAN_INLINE. This patch extends it to KASAN_OUTLINE mode. Link: https://lkml.kernel.org/r/20231009073748.159228-1-haibo.li@mediatek.com Fixes: 2f004eea0fc8("x86/kasan: Print original address on #GP") Signed-off-by: Haibo Li <haibo.li@mediatek.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Haibo Li <haibo.li@mediatek.com> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18hugetlbfs: close race between MADV_DONTNEED and page faultRik van Riel
Malloc libraries, like jemalloc and tcalloc, take decisions on when to call madvise independently from the code in the main application. This sometimes results in the application page faulting on an address, right after the malloc library has shot down the backing memory with MADV_DONTNEED. Usually this is harmless, because we always have some 4kB pages sitting around to satisfy a page fault. However, with hugetlbfs systems often allocate only the exact number of huge pages that the application wants. Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of any lock taken on the page fault path, which can open up the following race condition: CPU 1 CPU 2 MADV_DONTNEED unmap page shoot down TLB entry page fault fail to allocate a huge page killed with SIGBUS free page Fix that race by pulling the locking from __unmap_hugepage_final_range into helper functions called from zap_page_range_single. This ensures page faults stay locked out of the MADV_DONTNEED VMA until the huge pages have actually been freed. Link: https://lkml.kernel.org/r/20231006040020.3677377-4-riel@surriel.com Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18hugetlbfs: extend hugetlb_vma_lock to private VMAsRik van Riel
Extend the locking scheme used to protect shared hugetlb mappings from truncate vs page fault races, in order to protect private hugetlb mappings (with resv_map) against MADV_DONTNEED. Add a read-write semaphore to the resv_map data structure, and use that from the hugetlb_vma_(un)lock_* functions, in preparation for closing the race between MADV_DONTNEED and page faults. Link: https://lkml.kernel.org/r/20231006040020.3677377-3-riel@surriel.com Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing") Signed-off-by: Rik van Riel <riel@surriel.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18hugetlbfs: clear resv_map pointer if mmap failsRik van Riel
Patch series "hugetlbfs: close race between MADV_DONTNEED and page fault", v7. Malloc libraries, like jemalloc and tcalloc, take decisions on when to call madvise independently from the code in the main application. This sometimes results in the application page faulting on an address, right after the malloc library has shot down the backing memory with MADV_DONTNEED. Usually this is harmless, because we always have some 4kB pages sitting around to satisfy a page fault. However, with hugetlbfs systems often allocate only the exact number of huge pages that the application wants. Due to TLB batching, hugetlbfs MADV_DONTNEED will free pages outside of any lock taken on the page fault path, which can open up the following race condition: CPU 1 CPU 2 MADV_DONTNEED unmap page shoot down TLB entry page fault fail to allocate a huge page killed with SIGBUS free page Fix that race by extending the hugetlb_vma_lock locking scheme to also cover private hugetlb mappings (with resv_map), and pulling the locking from __unmap_hugepage_final_range into helper functions called from zap_page_range_single. This ensures page faults stay locked out of the MADV_DONTNEED VMA until the huge pages have actually been freed. This patch (of 3): Hugetlbfs leaves a dangling pointer in the VMA if mmap fails. This has not been a problem so far, but other code in this patch series tries to follow that pointer. Link: https://lkml.kernel.org/r/20231006040020.3677377-1-riel@surriel.com Link: https://lkml.kernel.org/r/20231006040020.3677377-2-riel@surriel.com Fixes: 04ada095dcfc ("hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Rik van Riel <riel@surriel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18mm: zswap: fix pool refcount bug around shrink_worker()Johannes Weiner
When a zswap store fails due to the limit, it acquires a pool reference and queues the shrinker. When the shrinker runs, it drops the reference. However, there can be multiple store attempts before the shrinker wakes up and runs once. This results in reference leaks and eventual saturation warnings for the pool refcount. Fix this by dropping the reference again when the shrinker is already queued. This ensures one reference per shrinker run. Link: https://lkml.kernel.org/r/20231006160024.170748-1-hannes@cmpxchg.org Fixes: 45190f01dd40 ("mm/zswap.c: add allocation hysteresis if pool limit is hit") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Chris Mason <clm@fb.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Cc: <stable@vger.kernel.org> [5.6+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-18pNFS/flexfiles: Check the layout validity in ff_layout_mirror_prepare_statsTrond Myklebust
Ensure that we check the layout pointer and validity after dereferencing it in ff_layout_mirror_prepare_stats. Fixes: 08e2e5bc6c9a ("pNFS/flexfiles: Clean up layoutstats") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-10-18pNFS: Fix a hang in nfs4_evict_inode()Trond Myklebust
We are not allowed to call pnfs_mark_matching_lsegs_return() without also holding a reference to the layout header, since doing so could lead to the reference count going to zero when we call pnfs_layout_remove_lseg(). This again can lead to a hang when we get to nfs4_evict_inode() and are unable to clear the layout pointer. pnfs_layout_return_unused_byserver() is guilty of this behaviour, and has been seen to trigger the refcount warning prior to a hang. Fixes: b6d49ecd1081 ("NFSv4: Fix a pNFS layout related use-after-free race when freeing the inode") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-10-18Merge tag 'asoc-fix-v6.6-rc6' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.6 A fairly large set of fixes here but all driver specific, the biggest block is Johan's work shaking out issues with device setup and teardown for the wcd938x driver which is a relatively large but clearly broken down set of changes. There is one core helper function added as part of a fix for wsa-macro.
2023-10-18Merge tag 'spi-fix-v6-6-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A fix for the npcm-fiu driver in cases where there are no dummy bytes during reads" * tag 'spi-fix-v6-6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: npcm-fiu: Fix UMA reads when dummy.nbytes == 0
2023-10-18Merge tag 'regmap-fix-v6.6-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "A straightforward fix from Johan for a long standing bug in cases where we both have regmaps without devices and something is using dev_get_regmap()" * tag 'regmap-fix-v6.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix NULL deref on lookup
2023-10-18virtio_pci: fix the common cfg map sizeXuan Zhuo
The function vp_modern_map_capability() takes the size parameter, which corresponds to the size of virtio_pci_common_cfg. As a result, this indicates the size of memory area to map. Now the size is the size of virtio_pci_common_cfg, but some feature(such as the _F_RING_RESET) needs the virtio_pci_modern_common_cfg, so this commit changes the size to the size of virtio_pci_modern_common_cfg. Cc: stable@vger.kernel.org Fixes: 0b50cece0b78 ("virtio_pci: introduce helper to get/set queue reset") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20231010031120.81272-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-10-18virtio-crypto: handle config changed by work queuezhenwei pi
MST pointed out: config change callback is also handled incorrectly in this driver, it takes a mutex from interrupt context. Handle config changed by work queue instead. Cc: stable@vger.kernel.org Cc: Gonglei (Arei) <arei.gonglei@huawei.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20231007064309.844889-1-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATEEric Auger
Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Forbade vhost iotlb msg with null size to prevent entries with size = start = 0 and last = ULONG_MAX to end up in the iotlb. Then commit 95932ab2ea07 ("vhost: allow batching hint without size") only applied the check for VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE message types to fix a regression observed with batching hit. Still, the introduction of that check introduced a regression for some users attempting to invalidate the whole ULONG_MAX range by setting the size to 0. This is the case with qemu/smmuv3/vhost integration which does not work anymore. It Looks safe to partially revert the original commit and allow VHOST_IOTLB_INVALIDATE messages with null size. vhost_iotlb_del_range() will compute a correct end iova. Same for vhost_vdpa_iotlb_unmap(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Cc: stable@vger.kernel.org # v5.17+ Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230927140544.205088-1-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa/mlx5: Fix firmware error on creation of 1k VQsDragos Tatulea
A firmware error is triggered when configuring a 9k MTU on the PF after switching to switchdev mode and then using a vdpa device with larger (1k) rings: mlx5_cmd_out_err: CREATE_GENERAL_OBJECT(0xa00) op_mod(0xd) failed, status bad resource(0x5), syndrome (0xf6db90), err(-22) This is due to the fact that the hw VQ size parameters are computed based on the umem_1/2/3_buffer_param_a/b capabilities and all device capabilities are read only when the driver is moved to switchdev mode. The problematic configuration flow looks like this: 1) Create VF 2) Unbind VF 3) Switch PF to switchdev mode. 4) Bind VF 5) Set PF MTU to 9k 6) create vDPA device 7) Start VM with vDPA device and 1K queue size Note that setting the MTU before step 3) doesn't trigger this issue. This patch reads the forementioned umem parameters at the latest point possible before the VQs of the device are created. v2: - Allocate output with kmalloc to reduce stack frame size. - Removed stable from cc. Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230831155702.1080754-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>