summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-07net: altera-tse: Initialize local structs before using itMaxime Chevallier
The regmap_config and mdio_regmap_config objects needs to be zeroed before using them. This will cause spurious errors at probe time as config->pad_bits is containing random uninitialized data. Fixes: db48abbaa18e ("net: ethernet: altera-tse: Convert to mdio-regmap and use PCS Lynx") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07Merge branch 'tools-ynl-generate-code-for-the-handshake-family'Jakub Kicinski
Jakub Kicinski says: ==================== tools: ynl: generate code for the handshake family Add necessary features and generate user space C code for serializing / deserializing messages of the handshake family. In addition to basics already present in netdev and fou, handshake has nested attrs and multi-attr u32. ==================== Link: https://lore.kernel.org/r/20230606194302.919343-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07tools: ynl: generate code for the handshake familyJakub Kicinski
Generate support for the handshake family. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07tools: ynl-gen: improve unwind on parsing errorsJakub Kicinski
When parsing multi-attr we count the objects and then allocate an array to hold the parsed objects. If an attr space has multiple multi-attr objects, however, if parsing the first array fails we'll leave the object count for the second even tho the second array was never allocated. This may cause crashes when freeing objects on error. Count attributes to a variable on the stack and only set the count in the object once the memory was allocated. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07tools: ynl-gen: fill in support for MultiAttr scalarsJakub Kicinski
The handshake family needs support for MultiAttr scalars. Right now we only support code gen for MultiAttr nested types. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07MAINTAINERS: Add entry for debug objectsThomas Gleixner
This is overdue and an oversight. Add myself to this file deespite the fact that I'm trying to reduce the number of entries in this file which have my name attached, but in the hope that patches wont get picked up elsewhere completely unreviewed and unnoticed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-07afs: Fix setting of mtime when creating a file/dir/symlinkDavid Howells
kafs incorrectly passes a zero mtime (ie. 1st Jan 1970) to the server when creating a file, dir or symlink because the mtime recorded in the afs_operation struct gets passed to the server by the marshalling routines, but the afs_mkdir(), afs_create() and afs_symlink() functions don't set it. This gets masked if a file or directory is subsequently modified. Fix this by filling in op->mtime before calling the create op. Fixes: e49c7b2f6de7 ("afs: Build an abstraction around an "operation" concept") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-06-07bpf: Add extra path pointer check to d_path helperJiri Olsa
Anastasios reported crash on stable 5.15 kernel with following BPF attached to lsm hook: SEC("lsm.s/bprm_creds_for_exec") int BPF_PROG(bprm_creds_for_exec, struct linux_binprm *bprm) { struct path *path = &bprm->executable->f_path; char p[128] = { 0 }; bpf_d_path(path, p, 128); return 0; } But bprm->executable can be NULL, so bpf_d_path call will crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI ... RIP: 0010:d_path+0x22/0x280 ... Call Trace: <TASK> bpf_d_path+0x21/0x60 bpf_prog_db9cf176e84498d9_bprm_creds_for_exec+0x94/0x99 bpf_trampoline_6442506293_0+0x55/0x1000 bpf_lsm_bprm_creds_for_exec+0x5/0x10 security_bprm_creds_for_exec+0x29/0x40 bprm_execve+0x1c1/0x900 do_execveat_common.isra.0+0x1af/0x260 __x64_sys_execve+0x32/0x40 It's problem for all stable trees with bpf_d_path helper, which was added in 5.9. This issue is fixed in current bpf code, where we identify and mark trusted pointers, so the above code would fail even to load. For the sake of the stable trees and to workaround potentially broken verifier in the future, adding the code that reads the path object from the passed pointer and verifies it's valid in kernel space. Fixes: 6e22ab9da793 ("bpf: Add d_path helper") Reported-by: Anastasios Papagiannis <tasos.papagiannnis@gmail.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230606181714.532998-1-jolsa@kernel.org
2023-06-07net: sched: fix possible refcount leak in tc_chain_tmplt_add()Hangyu Hua
try_module_get will be called in tcf_proto_lookup_ops. So module_put needs to be called to drop the refcount if ops don't implement the required function. Fixes: 9f407f1768d3 ("net: sched: introduce chain templates") Signed-off-by: Hangyu Hua <hbh25y@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: txgbe: Avoid passing uninitialised parameter to pci_wake_from_d3()Simon Horman
txgbe_shutdown() relies on txgbe_dev_shutdown() to initialise wake by passing it by reference. However, txgbe_dev_shutdown() doesn't use this parameter at all. wake is then passed uninitialised by txgbe_dev_shutdown() to pci_wake_from_d3(). Resolve this problem by: * Removing the unused parameter from txgbe_dev_shutdown() * Removing the uninitialised variable wake from txgbe_dev_shutdown() * Passing false to pci_wake_from_d3() - this assumes that although uninitialised wake was in practice false (0). I'm not sure that this counts as a bug, as I'm not sure that it manifests in any unwanted behaviour. But in any case, the issue was introduced by: 3ce7547e5b71 ("net: txgbe: Add build support for txgbe") Flagged by Smatch as: .../txgbe_main.c:486 txgbe_shutdown() error: uninitialized symbol 'wake'. No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: sched: act_police: fix sparse errors in tcf_police_dump()Eric Dumazet
Fixes following sparse errors: net/sched/act_police.c:360:28: warning: dereference of noderef expression net/sched/act_police.c:362:45: warning: dereference of noderef expression net/sched/act_police.c:362:45: warning: dereference of noderef expression net/sched/act_police.c:368:28: warning: dereference of noderef expression net/sched/act_police.c:370:45: warning: dereference of noderef expression net/sched/act_police.c:370:45: warning: dereference of noderef expression net/sched/act_police.c:376:45: warning: dereference of noderef expression net/sched/act_police.c:376:45: warning: dereference of noderef expression Fixes: d1967e495a8d ("net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: dsa: qca8k: remove unnecessary (void*) conversionsAtin Bainada
Pointer variables of (void*) type do not require type cast. Signed-off-by: Atin Bainada <hi@atinb.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: openvswitch: fix upcall counter access before allocationEelco Chaudron
Currently, the per cpu upcall counters are allocated after the vport is created and inserted into the system. This could lead to the datapath accessing the counters before they are allocated resulting in a kernel Oops. Here is an example: PID: 59693 TASK: ffff0005f4f51500 CPU: 0 COMMAND: "ovs-vswitchd" #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4 #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60 #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58 #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388 #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68 #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch] ... PID: 58682 TASK: ffff0005b2f0bf00 CPU: 0 COMMAND: "kworker/0:3" #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758 #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994 #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8 #3 [ffff80000a5d3120] die at ffffb70f0628234c #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8 #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4 #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4 #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710 #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74 #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24 #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch] #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch] #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch] #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch] #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch] #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90 We moved the per cpu upcall counter allocation to the existing vport alloc and free functions to solve this. Fixes: 95637d91fefd ("net: openvswitch: release vport resources on failure") Fixes: 1933ea365aa7 ("net: openvswitch: Add support to count upcall packets") Signed-off-by: Eelco Chaudron <echaudro@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: liquidio: fix mixed module-builtin objectMasahiro Yamada
With CONFIG_LIQUIDIO=m and CONFIG_LIQUIDIO_VF=y (or vice versa), $(common-objs) are linked to a module and also to vmlinux even though the expected CFLAGS are different between builtins and modules. This is the same situation as fixed by commit 637a642f5ca5 ("zstd: Fixing mixed module-builtin objects"). Introduce the new module, liquidio-core, to provide the common functions to liquidio and liquidio-vf. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: sched: move rtm_tca_policy declaration to include fileEric Dumazet
rtm_tca_policy is used from net/sched/sch_api.c and net/sched/cls_api.c, thus should be declared in an include file. This fixes the following sparse warning: net/sched/sch_api.c:1434:25: warning: symbol 'rtm_tca_policy' was not declared. Should it be static? Fixes: e331473fee3d ("net/sched: cls_api: add missing validation of netlink attributes") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07tcp: fix formatting in sysctl_net_ipv4.cDavid Morley
Fix incorrectly formatted tcp_syn_linear_timeouts sysctl in the ipv4_net_table. Fixes: ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear") Signed-off-by: David Morley <morleyd@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Tested-by: David Morley <morleyd@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07ice: make writes to /dev/gnssX synchronousMichal Schmidt
The current ice driver's GNSS write implementation buffers writes and works through them asynchronously in a kthread. That's bad because: - The GNSS write_raw operation is supposed to be synchronous[1][2]. - There is no upper bound on the number of pending writes. Userspace can submit writes much faster than the driver can process, consuming unlimited amounts of kernel memory. A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS buffers instead of first one") would add one more problem: - The possibility of waiting for a very long time to flush the write work when doing rmmod, softlockups. To fix these issues, simplify the implementation: Drop the buffering, the write_work, and make the writes synchronous. I tested this with gpsd and ubxtool. [1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf "User interface" slide. [2] A comment in drivers/gnss/core.c:gnss_write(): /* Ignoring O_NONBLOCK, write_raw() is synchronous. */ [3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/ Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: sched: add rcu annotations around qdisc->qdisc_sleepingEric Dumazet
syzbot reported a race around qdisc->qdisc_sleeping [1] It is time we add proper annotations to reads and writes to/from qdisc->qdisc_sleeping. [1] BUG: KCSAN: data-race in dev_graft_qdisc / qdisc_lookup_rcu read to 0xffff8881286fc618 of 8 bytes by task 6928 on cpu 1: qdisc_lookup_rcu+0x192/0x2c0 net/sched/sch_api.c:331 __tcf_qdisc_find+0x74/0x3c0 net/sched/cls_api.c:1174 tc_get_tfilter+0x18f/0x990 net/sched/cls_api.c:2547 rtnetlink_rcv_msg+0x7af/0x8c0 net/core/rtnetlink.c:6386 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd write to 0xffff8881286fc618 of 8 bytes by task 6912 on cpu 0: dev_graft_qdisc+0x4f/0x80 net/sched/sch_generic.c:1115 qdisc_graft+0x7d0/0xb60 net/sched/sch_api.c:1103 tc_modify_qdisc+0x712/0xf10 net/sched/sch_api.c:1693 rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6395 netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2546 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6413 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x56f/0x640 net/netlink/af_netlink.c:1365 netlink_sendmsg+0x665/0x770 net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x375/0x4c0 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1e3/0x270 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __x64_sys_sendmsg+0x46/0x50 net/socket.c:2593 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 6912 Comm: syz-executor.5 Not tainted 6.4.0-rc3-syzkaller-00190-g0d85b27b0cc6 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/16/2023 Fixes: 3a7d0d07a386 ("net: sched: extend Qdisc with rcu") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: dsa: ocelot: unlock on error in vsc9959_qos_port_tas_set()Dan Carpenter
This error path needs call mutex_unlock(&ocelot->tas_lock) before returning. Fixes: 2d800bc500fb ("net/sched: taprio: replace tc_taprio_qopt_offload :: enable with a "cmd" enum") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07Merge branch 'rfs-lockless-annotate'David S. Miller
Eric Dumazet says: ==================== rfs: annotate lockless accesses rfs runs without locks held, so we should annotate read and writes to shared variables. It should prevent compilers forcing writes in the following situation: if (var != val) var = val; A compiler could indeed simply avoid the conditional: var = val; This matters if var is shared between many cpus. v2: aligns one closing bracket (Simon) adds Fixes: tags (Jakub) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07rfs: annotate lockless accesses to RFS sock flow tableEric Dumazet
Add READ_ONCE()/WRITE_ONCE() on accesses to the sock flow table. This also prevents a (smart ?) compiler to remove the condition in: if (table->ents[index] != newval) table->ents[index] = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07rfs: annotate lockless accesses to sk->sk_rxhashEric Dumazet
Add READ_ONCE()/WRITE_ONCE() on accesses to sk->sk_rxhash. This also prevents a (smart ?) compiler to remove the condition in: if (sk->sk_rxhash != newval) sk->sk_rxhash = newval; We need the condition to avoid dirtying a shared cache line. Fixes: fec5e652e58f ("rfs: Receive Flow Steering") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07Merge branch 'realtek-external-phy-clock'David S. Miller
Detlev Casanova says: ==================== net: phy: realtek: Support external PHY clock Some PHYs can use an external clock that must be enabled before communicating with them. Changes since v3: * Do not call genphy_suspend if WoL is enabled. Changes since v2: * Reword documentation commit message Changes since v1: * Remove the clock name as it is not guaranteed to be identical across different PHYs * Disable/Enable the clock when suspending/resuming ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: phy: realtek: Disable clock on suspendDetlev Casanova
For PHYs that call rtl821x_probe() where an external clock can be configured, make sure that the clock is disabled when ->suspend() is called and enabled on resume. The PHY_ALWAYS_CALL_SUSPEND is added to ensure that the suspend function is actually always called. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07dt-bindings: net: phy: Document support for external PHY clkDetlev Casanova
Ethern PHYs can have external an clock that needs to be activated before communicating with the PHY. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07net: phy: realtek: Add optional external PHY clockDetlev Casanova
In some cases, the PHY can use an external clock source instead of a crystal. Add an optional clock in the phy node to make sure that the clock source is enabled, if specified, before probing. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-07hv_netvsc: Allocate rx indirection table size dynamicallyShradha Gupta
Allocate the size of rx indirection table dynamically in netvsc from the value of size provided by OID_GEN_RECEIVE_SCALE_CAPABILITIES query instead of using a constant value of ITAB_NUM. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Tested-on: Ubuntu22 (azure VM, SKU size: Standard_F72s_v2) Testcases: 1. ethtool -x eth0 output 2. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-Synthetic 3. LISA testcase:PERF-NETWORK-TCP-THROUGHPUT-MULTICONNECTION-NTTTCP-SRIOV Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-06tipc: replace open-code bearer rcu_dereference access in bearer.cXin Long
Replace these open-code bearer rcu_dereference access with bearer_get(), like other places in bearer.c. While at it, also use tipc_net() instead of net_generic(net, tipc_net_id) to get "tn" in bearer.c. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/1072588a8691f970bda950c7e2834d1f2983f58e.1685976044.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06Merge tag 'for-net-2023-06-05' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fixes to debugfs registration - Fix use-after-free in hci_remove_ltk/hci_remove_irk - Fixes to ISO channel support - Fix missing checks for invalid L2CAP DCID - Fix l2cap_disconnect_req deadlock - Add lock to protect HCI_UNREGISTER * tag 'for-net-2023-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: L2CAP: Add missing checks for invalid DCID Bluetooth: ISO: use correct CIS order in Set CIG Parameters event Bluetooth: ISO: don't try to remove CIG if there are bound CIS left Bluetooth: Fix l2cap_disconnect_req deadlock Bluetooth: hci_qca: fix debugfs registration Bluetooth: fix debugfs registration Bluetooth: hci_sync: add lock to protect HCI_UNREGISTER Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk Bluetooth: ISO: Fix CIG auto-allocation to select configurable CIG Bluetooth: ISO: consider right CIS when removing CIG at cleanup ==================== Link: https://lore.kernel.org/r/20230606003454.2392552-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06Merge tag 'nf-23-06-07' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Missing nul-check in basechain hook netlink dump path, from Gavrilov Ilia. 2) Fix bitwise register tracking, from Jeremy Sowden. 3) Null pointer dereference when accessing conntrack helper, from Tijs Van Buggenhout. 4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima. 5) Incorrect boundary check when building chain blob. * tag 'nf-23-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: out-of-bound check in chain blob netfilter: ipset: Add schedule point in call_ad(). netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper netfilter: nft_bitwise: fix register tracking netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook() ==================== Link: https://lore.kernel.org/r/20230606225851.67394-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06Merge tag 'wireless-2023-06-06' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.4 Both rtw88 and rtw89 have a 802.11 powersave fix for a regression introduced in v6.0. mt76 fixes a race and a null pointer dereference. iwlwifi fixes an issue where not enough memory was allocated for a firmware event. And finally the stack has several smaller fixes all over. * tag 'wireless-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: cfg80211: fix locking in regulatory disconnect wifi: cfg80211: fix locking in sched scan stop work wifi: iwlwifi: mvm: Fix -Warray-bounds bug in iwl_mvm_wait_d3_notif() wifi: mac80211: fix switch count in EMA beacons wifi: mac80211: don't translate beacon/presp addrs wifi: mac80211: mlme: fix non-inheritence element wifi: cfg80211: reject bad AP MLD address wifi: mac80211: use correct iftype HE cap wifi: mt76: mt7996: fix possible NULL pointer dereference in mt7996_mac_write_txwi() wifi: rtw89: remove redundant check of entering LPS wifi: rtw89: correct PS calculation for SUPPORTS_DYNAMIC_PS wifi: rtw88: correct PS calculation for SUPPORTS_DYNAMIC_PS wifi: mt76: mt7615: fix possible race in mt7615_mac_sta_poll ==================== Link: https://lore.kernel.org/r/20230606150817.EC133C433D2@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06Merge branch 'ipv4-remove-rt_conn_flags-calls-in-flowi4_init_output'Jakub Kicinski
Guillaume Nault says: ==================== ipv4: Remove RT_CONN_FLAGS() calls in flowi4_init_output(). Remove a few RT_CONN_FLAGS() calls used inside flowi4_init_output(). These users can be easily converted to set the scope properly, instead of overloading the tos parameter with scope information as done by RT_CONN_FLAGS(). The objective is to eventually remove RT_CONN_FLAGS() entirely, which will then allow to also remove RTO_ONLINK and to finally convert ->flowi4_tos to dscp_t. ==================== Link: https://lore.kernel.org/r/cover.1685999117.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06tcp: Set route scope properly in cookie_v4_check().Guillaume Nault
RT_CONN_FLAGS(sk) overloads flowi4_tos with the RTO_ONLINK bit when sk has the SOCK_LOCALROUTE flag set. This allows ip_route_output_key_hash() to eventually adjust flowi4_scope. Instead of relying on special handling of the RTO_ONLINK bit, we can just set the route scope correctly. This will eventually allow to avoid special interpretation of tos variables and to convert ->flowi4_tos to dscp_t. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06ipv4: Set correct scope in inet_csk_route_*().Guillaume Nault
RT_CONN_FLAGS(sk) overloads the tos parameter with the RTO_ONLINK bit when sk has the SOCK_LOCALROUTE flag set. This is only useful for ip_route_output_key_hash() to eventually adjust the route scope. Let's drop RTO_ONLINK and set the correct scope directly to avoid this special case in the future and to allow converting ->flowi4_tos to dscp_t. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06virtio_net: use control_buf for coalesce paramsBrett Creeley
Commit 699b045a8e43 ("net: virtio_net: notifications coalescing support") added coalescing command support for virtio_net. However, the coalesce commands are using buffers on the stack, which is causing the device to see DMA errors. There should also be a complaint from check_for_stack() in debug_dma_map_xyz(). Fix this by adding and using coalesce params from the control_buf struct, which aligns with other commands. Cc: stable@vger.kernel.org Fixes: 699b045a8e43 ("net: virtio_net: notifications coalescing support") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20230605195925.51625-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06pds_core: Fix FW recovery detectionBrett Creeley
Commit 523847df1b37 ("pds_core: add devcmd device interfaces") included initial support for FW recovery detection. Unfortunately, the ordering in pdsc_is_fw_good() was incorrect, which was causing FW recovery to be undetected by the driver. Fix this by making sure to update the cached fw_status by calling pdsc_is_fw_running() before setting the local FW gen. Fixes: 523847df1b37 ("pds_core: add devcmd device interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230605195116.49653-1-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06Merge branch 'move-ksz9477-errata-handling-to-phy-driver'Jakub Kicinski
Robert Hancock says: ==================== Move KSZ9477 errata handling to PHY driver Patches to move handling for KSZ9477 PHY errata register fixes from the DSA switch driver into the corresponding PHY driver, for more proper layering and ordering. ==================== Link: https://lore.kernel.org/r/20230605153943.1060444-1-robert.hancock@calian.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06net: dsa: microchip: remove KSZ9477 PHY errata handlingRobert Hancock
The KSZ9477 PHY errata handling code has now been moved into the Micrel PHY driver, so it is no longer needed inside the DSA switch driver. Remove it. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06net: phy: micrel: Move KSZ9477 errata fixes to PHY driverRobert Hancock
The ksz9477 DSA switch driver is currently updating some MMD registers on the internal port PHYs to address some chip errata. However, these errata are really a property of the PHY itself, not the switch they are part of, so this is kind of a layering violation. It makes more sense for these writes to be done inside the driver which binds to the PHY and not the driver for the containing device. This also addresses some issues where the ordering of when these writes are done may have been incorrect, causing the link to erratically fail to come up at the proper speed or at all. Doing this in the PHY driver during config_init ensures that they happen before anything else tries to change the state of the PHY on the port. The new code also ensures that autonegotiation is disabled during the register writes and re-enabled afterwards, as indicated by the latest version of the errata documentation from Microchip. Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06tcp: gso: really support BIG TCPEric Dumazet
We missed that tcp_gso_segment() was assuming skb->len was smaller than 65535 : oldlen = (u16)~skb->len; This part came with commit 0718bcc09b35 ("[NET]: Fix CHECKSUM_HW GSO problems.") This leads to wrong TCP checksum. Adapt the code to accept arbitrary packet length. v2: - use two csum_add() instead of csum_fold() (Alexander Duyck) - Change delta type to __wsum to reduce casts (Alexander Duyck) Fixes: 09f3d1a3a52c ("ipv6/gso: remove temporary HBH/jumbo header") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230605161647.3624428-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06ipv6: rpl: Fix Route of Death.Kuniyuki Iwashima
A remote DoS vulnerability of RPL Source Routing is assigned CVE-2023-2156. The Source Routing Header (SRH) has the following format: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Next Header | Hdr Ext Len | Routing Type | Segments Left | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CmprI | CmprE | Pad | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | . . . Addresses[1..n] . . . | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The originator of an SRH places the first hop's IPv6 address in the IPv6 header's IPv6 Destination Address and the second hop's IPv6 address as the first address in Addresses[1..n]. The CmprI and CmprE fields indicate the number of prefix octets that are shared with the IPv6 Destination Address. When CmprI or CmprE is not 0, Addresses[1..n] are compressed as follows: 1..n-1 : (16 - CmprI) bytes n : (16 - CmprE) bytes Segments Left indicates the number of route segments remaining. When the value is not zero, the SRH is forwarded to the next hop. Its address is extracted from Addresses[n - Segment Left + 1] and swapped with IPv6 Destination Address. When Segment Left is greater than or equal to 2, the size of SRH is not changed because Addresses[1..n-1] are decompressed and recompressed with CmprI. OTOH, when Segment Left changes from 1 to 0, the new SRH could have a different size because Addresses[1..n-1] are decompressed with CmprI and recompressed with CmprE. Let's say CmprI is 15 and CmprE is 0. When we receive SRH with Segment Left >= 2, Addresses[1..n-1] have 1 byte for each, and Addresses[n] has 16 bytes. When Segment Left is 1, Addresses[1..n-1] is decompressed to 16 bytes and not recompressed. Finally, the new SRH will need more room in the header, and the size is (16 - 1) * (n - 1) bytes. Here the max value of n is 255 as Segment Left is u8, so in the worst case, we have to allocate 3825 bytes in the skb headroom. However, now we only allocate a small fixed buffer that is IPV6_RPL_SRH_WORST_SWAP_SIZE (16 + 7 bytes). If the decompressed size overflows the room, skb_push() hits BUG() below [0]. Instead of allocating the fixed buffer for every packet, let's allocate enough headroom only when we receive SRH with Segment Left 1. [0]: skbuff: skb_under_panic: text:ffffffff81c9f6e2 len:576 put:576 head:ffff8880070b5180 data:ffff8880070b4fb0 tail:0x70 end:0x140 dev:lo kernel BUG at net/core/skbuff.c:200! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 154 Comm: python3 Not tainted 6.4.0-rc4-00190-gc308e9ec0047 #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:skb_panic (net/core/skbuff.c:200) Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 80 6e 77 82 e8 ad 8b 60 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffc90000003da0 EFLAGS: 00000246 RAX: 0000000000000085 RBX: ffff8880058a6600 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff88807dc1c540 RDI: ffff88807dc1c540 RBP: ffffc90000003e48 R08: ffffffff82b392c8 R09: 00000000ffffdfff R10: ffffffff82a592e0 R11: ffffffff82b092e0 R12: ffff888005b1c800 R13: ffff8880070b51b8 R14: ffff888005b1ca18 R15: ffff8880070b5190 FS: 00007f4539f0b740(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055670baf3000 CR3: 0000000005b0e000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <IRQ> skb_push (net/core/skbuff.c:210) ipv6_rthdr_rcv (./include/linux/skbuff.h:2880 net/ipv6/exthdrs.c:634 net/ipv6/exthdrs.c:718) ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:437 (discriminator 5)) ip6_input_finish (./include/linux/rcupdate.h:805 net/ipv6/ip6_input.c:483) __netif_receive_skb_one_core (net/core/dev.c:5494) process_backlog (./include/linux/rcupdate.h:805 net/core/dev.c:5934) __napi_poll (net/core/dev.c:6496) net_rx_action (net/core/dev.c:6565 net/core/dev.c:6696) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:572) do_softirq (kernel/softirq.c:472 kernel/softirq.c:459) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:396) __dev_queue_xmit (net/core/dev.c:4272) ip6_finish_output2 (./include/net/neighbour.h:544 net/ipv6/ip6_output.c:134) rawv6_sendmsg (./include/net/dst.h:458 ./include/linux/netfilter.h:303 net/ipv6/raw.c:656 net/ipv6/raw.c:914) sock_sendmsg (net/socket.c:724 net/socket.c:747) __sys_sendto (net/socket.c:2144) __x64_sys_sendto (net/socket.c:2156 net/socket.c:2152 net/socket.c:2152) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7f453a138aea Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 7e c3 0f 1f 44 00 00 41 54 48 83 ec 30 44 89 RSP: 002b:00007ffcc212a1c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00007ffcc212a288 RCX: 00007f453a138aea RDX: 0000000000000060 RSI: 00007f4539084c20 RDI: 0000000000000003 RBP: 00007f4538308e80 R08: 00007ffcc212a300 R09: 000000000000001c R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: ffffffffc4653600 R14: 0000000000000001 R15: 00007f4539712d1b </TASK> Modules linked in: Fixes: 8610c7c6e3bd ("net: ipv6: add support for rpl sr exthdr") Reported-by: Max VA Closes: https://www.interruptlabs.co.uk/articles/linux-ipv6-route-of-death Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230605180617.67284-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06netlink: specs: ethtool: fix random typosJakub Kicinski
Working on the code gen for C reveals typos in the ethtool spec as the compiler tries to find the names in the existing uAPI header. Fix the mistakes. Fixes: a353318ebf24 ("tools: ynl: populate most of the ethtool spec") Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230605233257.843977-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-07netfilter: nf_tables: out-of-bound check in chain blobPablo Neira Ayuso
Add current size of rule expressions to the boundary check. Fixes: 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-07netfilter: ipset: Add schedule point in call_ad().Kuniyuki Iwashima
syzkaller found a repro that causes Hung Task [0] with ipset. The repro first creates an ipset and then tries to delete a large number of IPs from the ipset concurrently: IPSET_ATTR_IPADDR_IPV4 : 172.20.20.187 IPSET_ATTR_CIDR : 2 The first deleting thread hogs a CPU with nfnl_lock(NFNL_SUBSYS_IPSET) held, and other threads wait for it to be released. Previously, the same issue existed in set->variant->uadt() that could run so long under ip_set_lock(set). Commit 5e29dc36bd5e ("netfilter: ipset: Rework long task execution when adding/deleting entries") tried to fix it, but the issue still exists in the caller with another mutex. While adding/deleting many IPs, we should release the CPU periodically to prevent someone from abusing ipset to hang the system. Note we need to increment the ipset's refcnt to prevent the ipset from being destroyed while rescheduling. [0]: INFO: task syz-executor174:268 blocked for more than 143 seconds. Not tainted 6.4.0-rc1-00145-gba79e9a73284 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:syz-executor174 state:D stack:0 pid:268 ppid:260 flags:0x0000000d Call trace: __switch_to+0x308/0x714 arch/arm64/kernel/process.c:556 context_switch kernel/sched/core.c:5343 [inline] __schedule+0xd84/0x1648 kernel/sched/core.c:6669 schedule+0xf0/0x214 kernel/sched/core.c:6745 schedule_preempt_disabled+0x58/0xf0 kernel/sched/core.c:6804 __mutex_lock_common kernel/locking/mutex.c:679 [inline] __mutex_lock+0x6fc/0xdb0 kernel/locking/mutex.c:747 __mutex_lock_slowpath+0x14/0x20 kernel/locking/mutex.c:1035 mutex_lock+0x98/0xf0 kernel/locking/mutex.c:286 nfnl_lock net/netfilter/nfnetlink.c:98 [inline] nfnetlink_rcv_msg+0x480/0x70c net/netfilter/nfnetlink.c:295 netlink_rcv_skb+0x1c0/0x350 net/netlink/af_netlink.c:2546 nfnetlink_rcv+0x18c/0x199c net/netfilter/nfnetlink.c:658 netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline] netlink_unicast+0x664/0x8cc net/netlink/af_netlink.c:1365 netlink_sendmsg+0x6d0/0xa4c net/netlink/af_netlink.c:1913 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg net/socket.c:747 [inline] ____sys_sendmsg+0x4b8/0x810 net/socket.c:2503 ___sys_sendmsg net/socket.c:2557 [inline] __sys_sendmsg+0x1f8/0x2a4 net/socket.c:2586 __do_sys_sendmsg net/socket.c:2595 [inline] __se_sys_sendmsg net/socket.c:2593 [inline] __arm64_sys_sendmsg+0x80/0x94 net/socket.c:2593 __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline] invoke_syscall+0x84/0x270 arch/arm64/kernel/syscall.c:52 el0_svc_common+0x134/0x24c arch/arm64/kernel/syscall.c:142 do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:193 el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:637 el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591 Reported-by: syzkaller <syzkaller@googlegroups.com> Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-07netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelperTijs Van Buggenhout
An nf_conntrack_helper from nf_conn_help may become NULL after DNAT. Observed when TCP port 1720 (Q931_PORT), associated with h323 conntrack helper, is DNAT'ed to another destination port (e.g. 1730), while nfqueue is being used for final acceptance (e.g. snort). This happenned after transition from kernel 4.14 to 5.10.161. Workarounds: * keep the same port (1720) in DNAT * disable nfqueue * disable/unload h323 NAT helper $ linux-5.10/scripts/decode_stacktrace.sh vmlinux < /tmp/kernel.log BUG: kernel NULL pointer dereference, address: 0000000000000084 [..] RIP: 0010:nf_conntrack_update (net/netfilter/nf_conntrack_core.c:2080 net/netfilter/nf_conntrack_core.c:2134) nf_conntrack [..] nfqnl_reinject (net/netfilter/nfnetlink_queue.c:237) nfnetlink_queue nfqnl_recv_verdict (net/netfilter/nfnetlink_queue.c:1230) nfnetlink_queue nfnetlink_rcv_msg (net/netfilter/nfnetlink.c:241) nfnetlink [..] Fixes: ee04805ff54a ("netfilter: conntrack: make conntrack userspace helpers work again") Signed-off-by: Tijs Van Buggenhout <tijs.van.buggenhout@axsguard.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-07netfilter: nft_bitwise: fix register trackingJeremy Sowden
At the end of `nft_bitwise_reduce`, there is a loop which is intended to update the bitwise expression associated with each tracked destination register. However, currently, it just updates the first register repeatedly. Fix it. Fixes: 34cc9e52884a ("netfilter: nf_tables: cancel tracking for clobbered destination registers") Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-07netfilter: nf_tables: Add null check for nla_nest_start_noflag() in ↵Gavrilov Ilia
nft_dump_basechain_hook() The nla_nest_start_noflag() function may fail and return NULL; the return value needs to be checked. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with SVACE. Fixes: d54725cd11a5 ("netfilter: nf_tables: support for multiple devices per netdev hook") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-06-06Merge branch 'tools-ynl-user-space-c'Jakub Kicinski
Jakub Kicinski says: ==================== tools: ynl: user space C Use the code gen which is already in tree to generate a user space library for a handful of simple families. I find YNL C quite useful in some WIP projects, and I think others may find it useful, too. I was hoping someone will pick this work up and finish it... but it seems that Python YNL has largely stolen the thunder. Python may not be great for selftest, tho, and actually this lib is more fully-featured. The Python script was meant as a quick demo, funny how those things go. v2: https://lore.kernel.org/all/20230604175843.662084-1-kuba@kernel.org/ v1: https://lore.kernel.org/all/20230603052547.631384-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230605190108.809439-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06tools: ynl: add sample for netdevJakub Kicinski
Add a sample application using the C library. My main goal is to make writing selftests easier but until I have some of those ready I think it's useful to show off the functionality and let people poke and tinker. Sample outputs - dump: $ ./netdev Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 0 lo[1] 0: enp1s0[2] 23: basic redirect rx-sg Notifications (watching veth pair getting added and deleted): $ ./netdev Select ifc ($ifindex; or 0 = dump; or -2 ntf check): -2 [53] 0: (ntf: dev-add-ntf) [54] 0: (ntf: dev-add-ntf) [54] 23: basic redirect rx-sg (ntf: dev-change-ntf) [53] 23: basic redirect rx-sg (ntf: dev-change-ntf) [53] 23: basic redirect rx-sg (ntf: dev-del-ntf) [54] 23: basic redirect rx-sg (ntf: dev-del-ntf) Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-06tools: ynl: support fou and netdev in CJakub Kicinski
Generate the code for netdev and fou families. They are simple and already supported by the code gen. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>