summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-04-15net: ethernet: ti: Add accessors for struct k3_cppi_desc_pool membersJulien Panis
This patch adds accessors for desc_size and cpumem members. They may be used, for instance, to compute a descriptor index. Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Julien Panis <jpanis@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15udp: Avoid call to compute_score on multiple sitesGabriel Krisman Bertazi
We've observed a 7-12% performance regression in iperf3 UDP ipv4 and ipv6 tests with multiple sockets on Zen3 cpus, which we traced back to commit f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present"). The failing tests were those that would spawn UDP sockets per-cpu on systems that have a high number of cpus. Unsurprisingly, it is not caused by the extra re-scoring of the reused socket, but due to the compiler no longer inlining compute_score, once it has the extra call site in udp4_lib_lookup2. This is augmented by the "Safe RET" mitigation for SRSO, needed in our Zen3 cpus. We could just explicitly inline it, but compute_score() is quite a large function, around 300b. Inlining in two sites would almost double udp4_lib_lookup2, which is a silly thing to do just to workaround a mitigation. Instead, this patch shuffles the code a bit to avoid the multiple calls to compute_score. Since it is a static function used in one spot, the compiler can safely fold it in, as it did before, without increasing the text size. With this patch applied I ran my original iperf3 testcases. The failing cases all looked like this (ipv4): iperf3 -c 127.0.0.1 --udp -4 -f K -b $R -l 8920 -t 30 -i 5 -P 64 -O 2 where $R is either 1G/10G/0 (max, unlimited). I ran 3 times each. baseline is v6.9-rc3. harmean == harmonic mean; CV == coefficient of variation. ipv4: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 1743852.66(0.0208) 1725933.02(0.0167) 1705203.78(0.0386) patched 1968727.61(0.0035) 1962283.22(0.0195) 1923853.50(0.0256) ipv6: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 1729020.03(0.0028) 1691704.49(0.0243) 1692251.34(0.0083) patched 1900422.19(0.0067) 1900968.01(0.0067) 1568532.72(0.1519) This restores the performance we had before the change above with this benchmark. We obviously don't expect any real impact when mitigations are disabled, but just to be sure it also doesn't regresses: mitigations=off ipv4: 1G 10G MAX HARMEAN (CV) HARMEAN (CV) HARMEAN (CV) baseline 3230279.97(0.0066) 3229320.91(0.0060) 2605693.19(0.0697) patched 3242802.36(0.0073) 3239310.71(0.0035) 2502427.19(0.0882) Cc: Lorenz Bauer <lmb@isovalent.com> Fixes: f0ea27e7bfe1 ("udp: re-score reuseport groups when connected sockets are present") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: ip6_gre: Remove generic .ndo_get_stats64Breno Leitao
Commit 3e2f544dd8a33 ("net: get stats64 if device if driver is configured") moved the callback to dev_get_tstats64() to net core, so, unless the driver is doing some custom stats collection, it does not need to set .ndo_get_stats64. Since this driver is now relying in NETDEV_PCPU_STAT_TSTATS, then, it doesn't need to set the dev_get_tstats64() generic .ndo_get_stats64 function pointer. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: ipv6_gre: Do not use custom stat allocatorBreno Leitao
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of in this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the ip6_gre and leverage the network core allocation instead. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: dsa: convert dsa_user_phylink_fixed_state() to use dsa_phylink_to_port()Russell King (Oracle)
Convert dsa_user_phylink_fixed_state() to use the newly introduced dsa_phylink_to_port() helper. Suggested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: constify net_classHeiner Kallweit
AFAICS all users of net_class take a const struct class * argument. Therefore fully constify net_class. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15octeontx2-pf: fix FLOW_DIS_IS_FRAGMENT implementationAsbjørn Sloth Tønnesen
Upon reviewing the flower control flags handling in this driver, I notice that the key wasn't being used, only the mask. Ie. `tc flower ... ip_flags nofrag` was hardware offloaded as `... ip_flags frag`. Only compile tested, no access to HW. Fixes: c672e3727989 ("octeontx2-pf: Add support to filter packet based on IP fragment") Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15gve: Correctly report software timestamping capabilitiesJohn Fraker
gve has supported software timestamp generation since its inception, but has not advertised that support via ethtool. This patch correctly advertises that support. Signed-off-by: John Fraker <jfraker@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: save some cycles when doing skb_attempt_defer_free()Jason Xing
Normally, we don't face these two exceptions very often meanwhile we have some chance to meet the condition where the current cpu id is the same as skb->alloc_cpu. One simple test that can help us see the frequency of this statement 'cpu == raw_smp_processor_id()': 1. running iperf -s and iperf -c [ip] -P [MAX CPU] 2. using BPF to capture skb_attempt_defer_free() I can see around 4% chance that happens to satisfy the statement. So moving this statement at the beginning can save some cycles in most cases. Signed-off-by: Jason Xing <kernelxing@tencent.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15Merge branch 'flower-control-flags'David S. Miller
Asbjørn Sloth Tønnesen says: ==================== flower: validate control flags I have reviewed the flower control flags code. In all, but one (sfc), the flags field wasn't checked properly for unsupported flags. In this series I have only included a single example user for each helper function. Once the helpers are in, I will submit patches for all other drivers implementing flower. After which there will be: - 6 drivers using flow_rule_is_supp_control_flags() - 8 drivers using flow_rule_has_control_flags() - 11 drivers using flow_rule_match_has_control_flags() --- Changelog: v3: - Added Reviewed-by from Louis Peens (first two patches) - Properly fixed kernel-doc format v2: https://lore.kernel.org/netdev/20240410093235.5334-1-ast@fiberby.net/ - Squashed the 3 helper functions to one commmit (requested by Baowen Zheng) - Renamed helper functions to avoid double negatives (suggested by Louis Peens) - Reverse booleans in some functions and callsites to align with new names - Fix autodoc format v1: https://lore.kernel.org/netdev/20240408130927.78594-1-ast@fiberby.net/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: dsa: microchip: ksz9477: flower: validate control flagsAsbjørn Sloth Tønnesen
Add check for unsupported control flags. Only compile-tested, no access to HW. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: prestera: flower: validate control flagsAsbjørn Sloth Tønnesen
Add check for unsupported control flags. Only compile-tested, no access to HW. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15nfp: flower: fix check for unsupported control flagsAsbjørn Sloth Tønnesen
Use flow_rule_is_supp_control_flags() Check the mask, not the key, for unsupported control flags. Only compile-tested, no access to HW Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15flow_offload: add control flag checking helpersAsbjørn Sloth Tønnesen
These helpers aim to help drivers, with checking for the presence of unsupported control flags. For drivers supporting at least one control flag: flow_rule_is_supp_control_flags() For drivers using flow_rule_match_control(), but not using flags: flow_rule_has_control_flags() For drivers not using flow_rule_match_control(): flow_rule_match_has_control_flags() While primarily aimed at FLOW_DISSECTOR_KEY_CONTROL and flow_rule_match_control(), then the first two can also be used with FLOW_DISSECTOR_KEY_ENC_CONTROL and flow_rule_match_enc_control(). These helpers mirrors the existing check done in sfc: drivers/net/ethernet/sfc/tc.c +276 Only compile-tested. Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15net: dev_addr_lists: move locking out of init/exit in kunitJakub Kicinski
We lock and unlock rtnl in init/exit for convenience, but it started causing problems if the exit is handled by a different thread. To avoid having to futz with disabling locking assertions move the locking into the test cases. We don't use ASSERTs so it should be safe. ============= dev-addr-list-test (6 subtests) ============== [PASSED] dev_addr_test_basic [PASSED] dev_addr_test_sync_one [PASSED] dev_addr_test_add_del [PASSED] dev_addr_test_del_main [PASSED] dev_addr_test_add_set [PASSED] dev_addr_test_add_excl =============== [PASSED] dev-addr-list-test ================ Link: https://lore.kernel.org/all/20240403131936.787234-7-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15inet: bring NLM_DONE out to a separate recv() againJakub Kicinski
Commit under Fixes optimized the number of recv() calls needed during RTM_GETROUTE dumps, but we got multiple reports of applications hanging on recv() calls. Applications expect that a route dump will be terminated with a recv() reading an individual NLM_DONE message. Coalescing NLM_DONE is perfectly legal in netlink, but even tho reporters fixed the code in respective projects, chances are it will take time for those applications to get updated. So revert to old behavior (for now)? Old kernel (5.19): $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' Recv: read 692 bytes, 11 messages nl_len = 68 (52) nl_flags = 0x22 nl_type = 24 ... nl_len = 60 (44) nl_flags = 0x22 nl_type = 24 Recv: read 20 bytes, 1 messages nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 Before (6.9-rc2): $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' Recv: read 712 bytes, 12 messages nl_len = 68 (52) nl_flags = 0x22 nl_type = 24 ... nl_len = 60 (44) nl_flags = 0x22 nl_type = 24 nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 After: $ ./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}' Recv: read 692 bytes, 11 messages nl_len = 68 (52) nl_flags = 0x22 nl_type = 24 ... nl_len = 60 (44) nl_flags = 0x22 nl_type = 24 Recv: read 20 bytes, 1 messages nl_len = 20 (4) nl_flags = 0x2 nl_type = 3 Reported-by: Stefano Brivio <sbrivio@redhat.com> Link: https://lore.kernel.org/all/20240315124808.033ff58d@elisabeth Reported-by: Ilya Maximets <i.maximets@ovn.org> Link: https://lore.kernel.org/all/02b50aae-f0e9-47a4-8365-a977a85975d3@ovn.org Fixes: 4ce5dc9316de ("inet: switch inet_dump_fib() to RCU protection") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-15drop_monitor: replace spin_lock by raw_spin_lockWander Lairson Costa
trace_drop_common() is called with preemption disabled, and it acquires a spin_lock. This is problematic for RT kernels because spin_locks are sleeping locks in this configuration, which causes the following splat: BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47 preempt_count: 1, expected: 0 RCU nest depth: 2, expected: 2 5 locks held by rcuc/47/449: #0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210 #1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130 #2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210 #3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70 #4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290 irq event stamp: 139909 hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80 hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290 softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170 softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0 Preemption disabled at: [<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0 CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7 Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022 Call Trace: <TASK> dump_stack_lvl+0x8c/0xd0 dump_stack+0x14/0x20 __might_resched+0x21e/0x2f0 rt_spin_lock+0x5e/0x130 ? trace_drop_common.constprop.0+0xb5/0x290 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_drop_common.constprop.0+0xb5/0x290 ? preempt_count_sub+0x1c/0xd0 ? _raw_spin_unlock_irqrestore+0x4a/0x80 ? __pfx_trace_drop_common.constprop.0+0x10/0x10 ? rt_mutex_slowunlock+0x26a/0x2e0 ? skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_rt_mutex_slowunlock+0x10/0x10 ? skb_queue_purge_reason.part.0+0x1bf/0x230 trace_kfree_skb_hit+0x15/0x20 trace_kfree_skb+0xe9/0x150 kfree_skb_reason+0x7b/0x110 skb_queue_purge_reason.part.0+0x1bf/0x230 ? __pfx_skb_queue_purge_reason.part.0+0x10/0x10 ? mark_lock.part.0+0x8a/0x520 ... trace_drop_common() also disables interrupts, but this is a minor issue because we could easily replace it with a local_lock. Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic context. Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reported-by: Hu Chunyu <chuhu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-14bcachefs: Check for backpointer bucket_offset >= bucket sizeKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: bch_member.btree_allocated_bitmapKent Overstreet
This adds a small (64 bit) per-device bitmap that tracks ranges that have btree nodes, for accelerating btree node scan if it is ever needed. - New helpers, bch2_dev_btree_bitmap_marked() and bch2_dev_bitmap_mark(), for checking and updating the bitmap - Interior btree update path updates the bitmaps when required - The check_allocations pass has a new fsck_err check, btree_bitmap_not_marked - New on disk format version, mi_btree_mitmap, which indicates the new bitmap is present - Upgrade table lists the required recovery pass and expected fsck error - Btree node scan uses the bitmap to skip ranges if we're on the new version Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: sysfs internal/trigger_journal_flushKent Overstreet
Add a sysfs knob for immediately flushing the entire journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Fix bch2_btree_node_fill() for !pathKent Overstreet
We shouldn't be doing the unlock/relock dance when we're not using a path - this fixes an assertion pop when called from btree node scan. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: add safety checks in bch2_btree_node_fill()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: Interior known are required to have known key typesKent Overstreet
For forwards compatibilyt, we allow bkeys of unknown type in leaf nodes; we can simply ignore metadata we don't understand. Pointers to btree nodes must always be of known types, howwever. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14bcachefs: add missing bounds check in __bch2_bkey_val_invalid()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-14Linux 6.9-rc4v6.9-rc4Linus Torvalds
2024-04-14Merge tag 'pull-sysfs-annotation-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysfs fix from Al Viro: "Get rid of lockdep false positives around sysfs/overlayfs syzbot has uncovered a class of lockdep false positives for setups with sysfs being one of the backing layers in overlayfs. The root cause is that of->mutex allocated when opening a sysfs file read-only (which overlayfs might do) is confused with of->mutex of a file opened writable (held in write to sysfs file, which overlayfs won't do). Assigning them separate lockdep classes fixes that bunch and it's obviously safe" * tag 'pull-sysfs-annotation-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: kernfs: annotate different lockdep class for of->mutex of writable files
2024-04-14Merge tag 'x86-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Follow up fixes for the BHI mitigations code - Fix !SPECULATION_MITIGATIONS bug not turning off mitigations as expected - Work around an APIC emulation bug when the kernel is built with Clang and run as a SEV guest - Follow up x86 topology fixes * tag 'x86-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Move TOPOEXT enablement into the topology parser x86/cpu/amd: Make the NODEID_MSR union actually work x86/cpu/amd: Make the CPUID 0x80000008 parser correct x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto x86/bugs: Clarify that syscall hardening isn't a BHI mitigation x86/bugs: Fix BHI handling of RRSBA x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr' x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES x86/bugs: Fix BHI documentation x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n x86/topology: Don't update cpu_possible_map in topo_set_cpuids() x86/bugs: Fix return type of spectre_bhi_state() x86/apic: Force native_apic_mem_read() to use the MOV instruction
2024-04-14Merge tag 'timers-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: - Address a (valid) W=1 build warning - Fix timer self-tests - Annotate a KCSAN warning wrt. accesses to the tick_do_timer_cpu global variable - Address a !CONFIG_BUG build warning * tag 'timers-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: kselftest: Fix build failure with NOLIBC selftests: timers: Fix abs() warning in posix_timers test selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn selftests: timers: Fix posix_timers ksft_print_msg() warning selftests: timers: Fix valid-adjtimex signed left-shift undefined behavior bug: Fix no-return-statement warning with !CONFIG_BUG timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu selftests/timers/posix_timers: Reimplement check_timer_distribution() irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
2024-04-14Merge tag 'perf-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix the x86 PMU multi-counter code returning invalid data in certain circumstances" * tag 'perf-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix out of range data
2024-04-14Merge tag 'locking-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: "Fix a PREEMPT_RT build bug" * tag 'locking-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking: Make rwsem_assert_held_write_nolockdep() build with PREEMPT_RT=y
2024-04-14Merge tag 'irq-urgent-2024-04-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix a bug in the GIC irqchip driver" * tag 'irq-urgent-2024-04-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Fix VSYNC referencing an unmapped VPE on GIC v4.1
2024-04-14Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio bugfixes from Michael Tsirkin: "Some small, obvious (in hindsight) bugfixes: - new ioctl in vhost-vdpa has a wrong # - not too late to fix - vhost has apparently been lacking an smp_rmb() - due to code duplication :( The duplication will be fixed in the next merge cycle, this is a minimal fix - an error message in vhost talks about guest moving used index - which of course never happens, guest only ever moves the available index - i2c-virtio didn't set the driver owner so it did not get refcounted correctly" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: correct misleading printing information vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZE virtio: store owner from modules with register_virtio_driver() vhost: Add smp_rmb() in vhost_enable_notify() vhost: Add smp_rmb() in vhost_vq_avail_empty()
2024-04-14Merge tag 'dma-maping-6.9-2024-04-14' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fixes from Christoph Hellwig: - fix up swiotlb buffer padding even more (Petr Tesarik) - fix for partial dma_sync on swiotlb (Michael Kelley) - swiotlb debugfs fix (Dexuan Cui) * tag 'dma-maping-6.9-2024-04-14' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: do not set total_used to 0 in swiotlb_create_debugfs_files() swiotlb: fix swiotlb_bounce() to do partial sync's correctly swiotlb: extend buffer pre-padding to alloc_align_mask if necessary
2024-04-14net: change maximum number of UDP segments to 128Yuri Benditovich
The commit fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") adds check of potential number of UDP segments vs UDP_MAX_SEGMENTS in linux/virtio_net.h. After this change certification test of USO guest-to-guest transmit on Windows driver for virtio-net device fails, for example with packet size of ~64K and mss of 536 bytes. In general the USO should not be more restrictive than TSO. Indeed, in case of unreasonably small mss a lot of segments can cause queue overflow and packet loss on the destination. Limit of 128 segments is good for any practical purpose, with minimal meaningful mss of 536 the maximal UDP packet will be divided to ~120 segments. The number of segments for UDP packets is validated vs UDP_MAX_SEGMENTS also in udp.c (v4,v6), this does not affect quest-to-guest path but does affect packets sent to host, for example. It is important to mention that UDP_MAX_SEGMENTS is kernel-only define and not available to user mode socket applications. In order to request MSS smaller than MTU the applications just uses setsockopt with SOL_UDP and UDP_SEGMENT and there is no limitations on socket API level. Fixes: fc8b2a619469 ("net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation") Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-14kernfs: annotate different lockdep class for of->mutex of writable filesAmir Goldstein
The writable file /sys/power/resume may call vfs lookup helpers for arbitrary paths and readonly files can be read by overlayfs from vfs helpers when sysfs is a lower layer of overalyfs. To avoid a lockdep warning of circular dependency between overlayfs inode lock and kernfs of->mutex, use a different lockdep class for writable and readonly kernfs files. Reported-by: syzbot+9a5b0ced8b1bfb238b56@syzkaller.appspotmail.com Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries") Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-04-13bcachefs: Fix btree node merging on write buffer btreesKent Overstreet
The btree write buffer flush fastpath that avoids the main transaction commit path had the unfortunate side effect of not doing btree node merging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Disable merges from interior update pathKent Overstreet
There's been a bug in the btree write buffer where it wasn't triggering btree node merges - and leaving behind a bunch of nearly empty btree nodes. Then during journal replay, when updates to the backpointers btree aren't using the btree write buffer (because we require synchronization with journal replay), we end up doing those merges all at once. Then if it's the interior update path running them, we deadlock because those run with the highest watermark. There's no real need for the interior update path to be doing btree node merges; other code paths can handle that at lower watermarks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Run merges at BCH_WATERMARK_btreeKent Overstreet
This fixes a deadlock where the interior update path during journal replay ends up doing a ton of merges on the backpointers btree, and deadlocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Fix missing write refs in fs fio pathsKent Overstreet
bch2_journal_flush_seq requires us to have a write ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Fix deadlock in journal replayKent Overstreet
btree_key_can_insert_cached() should be checking the watermark - BCH_TRANS_COMMIT_journal_replay really means nonblocking mode when watermark < reclaim, it was being used incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Go rw if running any explicit recovery passesKent Overstreet
This fixes a bug where we fail to start when upgrading/downgrading because we forgot we needed to go rw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Standardize helpers for printing enum strs with bounds checksKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: don't queue btree nodes for rewrites during scanKent Overstreet
many nodes found during scan will be old nodes, overwritten by newer nodes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: fix race in bch2_btree_node_evict()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: fix unsafety in bch2_stripe_to_text()Kent Overstreet
.to_text() functions need to work on key values that didn't pass .valid Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: fix unsafety in bch2_extent_ptr_to_text()Kent Overstreet
Need to check if we have a valid bucket before checking if ptr is stale Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: btree node scan: handle encrypted nodesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Check for packed bkeys that are too bigKent Overstreet
add missing validation; fixes assertion pop in bkey unpack Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13bcachefs: Fix UAFs of btree_insert_entry arrayKent Overstreet
The btree paths array is now dynamically resizable - and as well the btree_insert_entries array, as it needs to be the same size. The merge path (and interior update path) allocates new btree paths, thus can trigger a resize; thus we need to not retain direct pointers after invoking merge; similarly when running btree node triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-13Merge tag 'ata-6.9-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Damien Le Moal: - Add the mask_port_map parameter to the ahci driver. This is a follow-up to the recent snafu with the ASMedia controller and its virtual port hidding port-multiplier devices. As ASMedia confirmed that there is no way to determine if these slow-to-probe virtual ports are actually representing the ports of a port-multiplier devices, this new parameter allow masking ports to significantly speed up probing during system boot, resulting in shorter boot times. - A fix for an incorrect handling of a port unlock in ata_scsi_dev_rescan(). - Allow command duration limits to be detected for ACS-4 devices are there are such devices out in the field. * tag 'ata-6.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: ata: libata-core: Allow command duration limits detection for ACS-4 drives ata: libata-scsi: Fix ata_scsi_dev_rescan() error path ata: ahci: Add mask_port_map module parameter