summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2024-12-16rxrpc: Fix ability to add more data to a call once MSG_MORE deassertedDavid Howells
When userspace is adding data to an RPC call for transmission, it must pass MSG_MORE to sendmsg() if it intends to add more data in future calls to sendmsg(). Calling sendmsg() without MSG_MORE being asserted closes the transmission phase of the call (assuming sendmsg() adds all the data presented) and further attempts to add more data should be rejected. However, this is no longer the case. The change of call state that was previously the guard got bumped over to the I/O thread, which leaves a window for a repeat sendmsg() to insert more data. This previously went unnoticed, but the more recent patch that changed the structures behind the Tx queue added a warning: WARNING: CPU: 3 PID: 6639 at net/rxrpc/sendmsg.c:296 rxrpc_send_data+0x3f2/0x860 and rejected the additional data, returning error EPROTO. Fix this by adding a guard flag to the call, setting the flag when we queue the final packet and then rejecting further attempts to add data with EPROTO. Fixes: 2d689424b618 ("rxrpc: Move call state changes from sendmsg to I/O thread") Reported-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com Closes: https://lore.kernel.org/r/6757fb68.050a0220.2477f.005f.GAE@google.com/ Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: syzbot+ff11be94dfcd7a5af8da@syzkaller.appspotmail.com cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870480.1734037462@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16rxrpc: Disable IRQ, not BH, to take the lock for ->attend_linkDavid Howells
Use spin_lock_irq(), not spin_lock_bh() to take the lock when accessing the ->attend_link() to stop a delay in the I/O thread due to an interrupt being taken in the app thread whilst that holds the lock and vice versa. Fixes: a2ea9a907260 ("rxrpc: Use irq-disabling spinlocks between app and I/O thread") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/2870146.1734037095@warthog.procyon.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue statsJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 45 != 44 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 45 != 44 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 103 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 103 != 100 missing queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 125, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), len(set(queues[qtype])), # Check failed 102 != 100 repeated queue keys # Check| At /root/ksft-net-drv/drivers/net/./stats.py, line 127, in qstat_by_ifindex: # Check| ksft_eq(len(queues[qtype]), max(queues[qtype]) + 1, # Check failed 102 != 100 missing queue keys not ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:4 fail:1 xfail:0 xpass:0 skip:0 error:0 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:stats.py timeout set to 45 selftests: drivers/net: stats.py KTAP version 1 1..5 ok 1 stats.check_pause ok 2 stats.check_fec ok 3 stats.pkt_byte_sum ok 4 stats.qstat_by_ifindex ok 5 stats.check_down # Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: ab63a2387cb9 ("netdev: add per-queue statistics") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16netdev: fix repeated netlink messages in queue dumpJakub Kicinski
The context is supposed to record the next queue to dump, not last dumped. If the dump doesn't fit we will restart from the already-dumped queue, duplicating the message. Before this fix and with the selftest improvements later in this series we see: # ./run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 102 != 100 # Check| At /root/ksft-net-drv/drivers/net/./queues.py, line 32, in get_queues: # Check| ksft_eq(queues, expected) # Check failed 101 != 100 not ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0 not ok 1 selftests: drivers/net: queues.py # exit=1 With the fix: # ./ksft-net-drv/run_kselftest.sh -t drivers/net:queues.py timeout set to 45 selftests: drivers/net: queues.py KTAP version 1 1..2 ok 1 queues.get_queues ok 2 queues.addremove_queues # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0 Fixes: 6b6171db7fc8 ("netdev-genl: Add netlink framework functions for queue") Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241213152244.3080955-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16ceph: allocate sparse_ext map only for sparse readsIlya Dryomov
If mounted with sparseread option, ceph_direct_read_write() ends up making an unnecessarily allocation for O_DIRECT writes. Fixes: 03bc06c7b0bd ("ceph: add new mount option to enable sparse reads") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Markuze <amarkuze@redhat.com>
2024-12-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfAlexei Starovoitov
Cross-merge bpf fixes after downstream PR. No conflicts. Adjacent changes in: Auto-merging include/linux/bpf.h Auto-merging include/linux/bpf_verifier.h Auto-merging kernel/bpf/btf.c Auto-merging kernel/bpf/verifier.c Auto-merging kernel/trace/bpf_trace.c Auto-merging tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-12-16net: ethtool: Add support for tsconfig command to get/set hwtstamp configKory Maincent
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket to read and configure hwtstamp configuration of a PHC provider. Note that simultaneous hwtstamp isn't supported; configuring a new one disables the previous setting. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topologyKory Maincent
Either the MAC or the PHY can provide hwtstamp, so we should be able to read the tsinfo for any hwtstamp provider. Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a network topology. Add support for a specific dump command to retrieve all hwtstamp providers within the network topology, with added functionality for filtered dump to target a single interface. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Add the possibility to support a selected hwtstamp in netdeviceKory Maincent
Introduce the description of a hwtstamp provider, mainly defined with a the hwtstamp source and the phydev pointer. Add a hwtstamp provider description within the netdev structure to allow saving the hwtstamp we want to use. This prepares for future support of an ethtool netlink command to select the desired hwtstamp provider. By default, the old API that does not support hwtstamp selectability is used, meaning the hwtstamp provider pointer is unset. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Make net_hwtstamp_validate accessibleKory Maincent
Make the net_hwtstamp_validate function accessible in prevision to use it from ethtool to validate the hwtstamp configuration before setting it. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Make dev_get_hwtstamp_phylib accessibleKory Maincent
Make the dev_get_hwtstamp_phylib function accessible in prevision to use it from ethtool to read the hwtstamp current configuration. Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: add counters for rekeySabrina Dubroca
This introduces 5 counters to keep track of key updates: Tls{Rx,Tx}Rekey{Ok,Error} and TlsRxRekeyReceived. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: implement rekey for TLS1.3Sabrina Dubroca
This adds the possibility to change the key and IV when using TLS1.3. Changing the cipher or TLS version is not supported. Once we have updated the RX key, we can unblock the receive side. If the rekey fails, the context is unmodified and userspace is free to retry the update or close the socket. This change only affects tls_sw, since 1.3 offload isn't supported. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: block decryption when a rekey is pendingSabrina Dubroca
When a TLS handshake record carrying a KeyUpdate message is received, all subsequent records will be encrypted with a new key. We need to stop decrypting incoming records with the old key, and wait until userspace provides a new key. Make a note of this in the RX context just after decrypting that record, and stop recvmsg/splice calls with EKEYEXPIRED until the new key is available. key_update_pending can't be combined with the existing bitfield, because we will read it locklessly in ->poll. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15mptcp: drop useless "err = 0" in subflow_destroyGeliang Tang
Upon successful return, mptcp_pm_parse_addr() returns 0. There is no need to set "err = 0" after this. So after mptcp_nl_find_ssk() returns, just need to set "err = -ESRCH", then release and free msk socket if it returns NULL. Also, no need to define the variable "subflow" in subflow_destroy(), use mptcp_subflow_ctx(ssk) directly. This patch doesn't change the behaviour of the code, just refactoring. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-7-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: change local addr type of subflow_destroyGeliang Tang
Generally, in the path manager interfaces, the local address is defined as an mptcp_pm_addr_entry type address, while the remote address is defined as an mptcp_addr_info type one: (struct mptcp_pm_addr_entry *local, struct mptcp_addr_info *remote) But subflow_destroy() interface uses two mptcp_addr_info type parameters. This patch changes the first one to mptcp_pm_addr_entry type and use helper mptcp_pm_parse_entry() to parse it instead of using mptcp_pm_parse_addr(). This patch doesn't change the behaviour of the code, just refactoring. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-6-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: drop free_list for deleting entriesGeliang Tang
mptcp_pm_remove_addrs() actually only deletes one address, which does not match its name. This patch renames it to mptcp_pm_remove_addr_entry() and changes the parameter "rm_list" to "entry". With the help of mptcp_pm_remove_addr_entry(), it's no longer necessary to move the entry to be deleted to free_list and then traverse the list to delete the entry, which is not allowed in BPF. The entry can be directly deleted through list_del_rcu() and sock_kfree_s() now. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-5-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: move mptcp_pm_remove_addrs into pm_userspaceGeliang Tang
Since mptcp_pm_remove_addrs() is only called from the userspace PM, this patch moves it into pm_userspace.c. For this, lookup_subflow_by_saddr() and remove_anno_list_by_saddr() helpers need to be exported in protocol.h. Also add "mptcp_" prefix for these helpers. Here, mptcp_pm_remove_addrs() is not changed to a static function because it will be used in BPF Path Manager. This patch doesn't change the behaviour of the code, just refactoring. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-4-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: add mptcp_userspace_pm_get_sock helperGeliang Tang
Each userspace pm netlink function uses nla_get_u32() to get the msk token value, then pass it to mptcp_token_get_sock() to get the msk. Finally check whether userspace PM is selected on this msk. It makes sense to wrap them into a helper, named mptcp_userspace_pm_get_sock(), to do this. This patch doesn't change the behaviour of the code, just refactoring. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-3-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: add mptcp_for_each_userspace_pm_addr macroGeliang Tang
Similar to mptcp_for_each_subflow() macro, this patch adds a new macro mptcp_for_each_userspace_pm_addr() for userspace PM to iterate over the address entries on the local address list userspace_pm_local_addr_list of the mptcp socket. This patch doesn't change the behaviour of the code, just refactoring. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-2-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15mptcp: add mptcp_userspace_pm_lookup_addr helperGeliang Tang
Like __lookup_addr() helper in pm_netlink.c, a new helper mptcp_userspace_pm_lookup_addr() is also defined in pm_userspace.c. It looks up the corresponding mptcp_pm_addr_entry address in userspace_pm_local_addr_list through the passed "addr" parameter and returns the found address entry. This helper can be used in mptcp_userspace_pm_delete_local_addr(), mptcp_userspace_pm_set_flags(), mptcp_userspace_pm_get_local_id() and mptcp_userspace_pm_is_backup() to simplify the code. Please note that with this change now list_for_each_entry() is used in mptcp_userspace_pm_append_new_local_addr(), not list_for_each_entry_safe(), but that's OK to do so because mptcp_userspace_pm_lookup_addr() only returns an entry from the list, the list hasn't been modified here. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241213-net-next-mptcp-pm-misc-cleanup-v1-1-ddb6d00109a8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15ipv4: output metric as unsigned intMaximilian Güntner
adding a route metric greater than 0x7fff_ffff leads to an unintended wrap when printing the underlying u32 as an unsigned int (`%d`) thus incorrectly rendering the metric as negative. Formatting using `%u` corrects the issue. Signed-off-by: Maximilian Güntner <code@mguentner.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241212161911.51598-1-code@mguentner.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-15net/smc: check return value of sock_recvmsg when draining clc dataGuangguan Wang
When receiving clc msg, the field length in smc_clc_msg_hdr indicates the length of msg should be received from network and the value should not be fully trusted as it is from the network. Once the value of length exceeds the value of buflen in function smc_clc_wait_msg it may run into deadloop when trying to drain the remaining data exceeding buflen. This patch checks the return value of sock_recvmsg when draining data in case of deadloop in draining. Fixes: fb4f79264c0f ("net/smc: tolerate future SMCD versions") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15net/smc: check smcd_v2_ext_offset when receiving proposal msgGuangguan Wang
When receiving proposal msg in server, the field smcd_v2_ext_offset in proposal msg is from the remote client and can not be fully trusted. Once the value of smcd_v2_ext_offset exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks the value of smcd_v2_ext_offset before using it. Fixes: 5c21c4ccafe8 ("net/smc: determine accepted ISM devices") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15net/smc: check v2_ext_offset/eid_cnt/ism_gid_cnt when receiving proposal msgGuangguan Wang
When receiving proposal msg in server, the fields v2_ext_offset/ eid_cnt/ism_gid_cnt in proposal msg are from the remote client and can not be fully trusted. Especially the field v2_ext_offset, once exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks the fields v2_ext_offset/eid_cnt/ism_gid_cnt before using them. Fixes: 8c3dca341aea ("net/smc: build and send V2 CLC proposal") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15net/smc: check iparea_offset and ipv6_prefixes_cnt when receiving proposal msgGuangguan Wang
When receiving proposal msg in server, the field iparea_offset and the field ipv6_prefixes_cnt in proposal msg are from the remote client and can not be fully trusted. Especially the field iparea_offset, once exceed the max value, there has the chance to access wrong address, and crash may happen. This patch checks iparea_offset and ipv6_prefixes_cnt before using them. Fixes: e7b7a64a8493 ("smc: support variable CLC proposal messages") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15net/smc: check sndbuf_space again after NOSPACE flag is set in smc_pollGuangguan Wang
When application sending data more than sndbuf_space, there have chances application will sleep in epoll_wait, and will never be wakeup again. This is caused by a race between smc_poll and smc_cdc_tx_handler. application tasklet smc_tx_sendmsg(len > sndbuf_space) | epoll_wait for EPOLL_OUT,timeout=0 | smc_poll | if (!smc->conn.sndbuf_space) | | smc_cdc_tx_handler | atomic_add sndbuf_space | smc_tx_sndbuf_nonfull | if (!test_bit SOCK_NOSPACE) | do not sk_write_space; set_bit SOCK_NOSPACE; | return mask=0; | Application will sleep in epoll_wait as smc_poll returns 0. And smc_cdc_tx_handler will not call sk_write_space because the SOCK_NOSPACE has not be set. If there is no inflight cdc msg, sk_write_space will not be called any more, and application will sleep in epoll_wait forever. So check sndbuf_space again after NOSPACE flag is set to break the race. Fixes: 8dce2786a290 ("net/smc: smc_poll improvements") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15net/smc: protect link down work from execute after lgr freedGuangguan Wang
link down work may be scheduled before lgr freed but execute after lgr freed, which may result in crash. So it is need to hold a reference before shedule link down work, and put the reference after work executed or canceled. The relevant crash call stack as follows: list_del corruption. prev->next should be ffffb638c9c0fe20, but was 0000000000000000 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:51! invalid opcode: 0000 [#1] SMP NOPTI CPU: 6 PID: 978112 Comm: kworker/6:119 Kdump: loaded Tainted: G #1 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 2221b89 04/01/2014 Workqueue: events smc_link_down_work [smc] RIP: 0010:__list_del_entry_valid.cold+0x31/0x47 RSP: 0018:ffffb638c9c0fdd8 EFLAGS: 00010086 RAX: 0000000000000054 RBX: ffff942fb75e5128 RCX: 0000000000000000 RDX: ffff943520930aa0 RSI: ffff94352091fc80 RDI: ffff94352091fc80 RBP: 0000000000000000 R08: 0000000000000000 R09: ffffb638c9c0fc38 R10: ffffb638c9c0fc30 R11: ffffffffa015eb28 R12: 0000000000000002 R13: ffffb638c9c0fe20 R14: 0000000000000001 R15: ffff942f9cd051c0 FS: 0000000000000000(0000) GS:ffff943520900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4f25214000 CR3: 000000025fbae004 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: rwsem_down_write_slowpath+0x17e/0x470 smc_link_down_work+0x3c/0x60 [smc] process_one_work+0x1ac/0x350 worker_thread+0x49/0x2f0 ? rescuer_thread+0x360/0x360 kthread+0x118/0x140 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x1f/0x30 Fixes: 541afa10c126 ("net/smc: add smcr_port_err() and smcr_link_down() processing") Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15netlink: add IGMP/MLD join/leave notificationsYuyang Huang
This change introduces netlink notifications for multicast address changes. The following features are included: * Addition and deletion of multicast addresses are reported using RTM_NEWMULTICAST and RTM_DELMULTICAST messages with AF_INET and AF_INET6. * Two new notification groups: RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR are introduced for receiving these events. This change allows user space applications (e.g., ip monitor) to efficiently track multicast group memberships by listening for netlink events. Previously, applications relied on inefficient polling of procfs, introducing delays. With netlink notifications, applications receive realtime updates on multicast group membership changes, enabling more precise metrics collection and system monitoring.  This change also unlocks the potential for implementing a wide range of sophisticated multicast related features in user space by allowing applications to combine kernel provided multicast address information with user space data and communicate decisions back to the kernel for more fine grained control. This mechanism can be used for various purposes, including multicast filtering, IGMP/MLD offload, and IGMP/MLD snooping. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Co-developed-by: Patrick Ruddy <pruddy@vyatta.att-mail.com> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com> Link: https://lore.kernel.org/r/20180906091056.21109-1-pruddy@vyatta.att-mail.com Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-14Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix a bug in the BPF verifier to track changes to packet data property for global functions (Eduard Zingerman) - Fix a theoretical BPF prog_array use-after-free in RCU handling of __uprobe_perf_func (Jann Horn) - Fix BPF tracing to have an explicit list of tracepoints and their arguments which need to be annotated as PTR_MAYBE_NULL (Kumar Kartikeya Dwivedi) - Fix a logic bug in the bpf_remove_insns code where a potential error would have been wrongly propagated (Anton Protopopov) - Avoid deadlock scenarios caused by nested kprobe and fentry BPF programs (Priya Bala Govindasamy) - Fix a bug in BPF verifier which was missing a size check for BTF-based context access (Kumar Kartikeya Dwivedi) - Fix a crash found by syzbot through an invalid BPF prog_array access in perf_event_detach_bpf_prog (Jiri Olsa) - Fix several BPF sockmap bugs including a race causing a refcount imbalance upon element replace (Michal Luczaj) - Fix a use-after-free from mismatching BPF program/attachment RCU flavors (Jann Horn) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits) bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs selftests/bpf: Add tests for raw_tp NULL args bpf: Augment raw_tp arguments with PTR_MAYBE_NULL bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL" selftests/bpf: Add test for narrow ctx load for pointer args bpf: Check size for BTF-based ctx access of pointer members selftests/bpf: extend changes_pkt_data with cases w/o subprograms bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs bpf: Fix theoretical prog_array UAF in __uprobe_perf_func() bpf: fix potential error return selftests/bpf: validate that tail call invalidates packet pointers bpf: consider that tail calls invalidate packet pointers selftests/bpf: freplace tests for tracking of changes_packet_data bpf: check changes_pkt_data property for extension programs selftests/bpf: test for changing packet data from global functions bpf: track changes_pkt_data property for global functions bpf: refactor bpf_helper_changes_pkt_data to use helper number bpf: add find_containing_subprog() utility function bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors ...
2024-12-12page_pool: disable sync for cpu for dmabuf memory providerMina Almasry
dmabuf dma-addresses should not be dma_sync'd for CPU/device. Typically its the driver responsibility to dma_sync for CPU, but the driver should not dma_sync for CPU if the netmem is actually coming from a dmabuf memory provider. The page_pool already exposes a helper for dma_sync_for_cpu: page_pool_dma_sync_for_cpu. Upgrade this existing helper to handle netmem, and have it skip dma_sync if the memory is from a dmabuf memory provider. Drivers should migrate to using this helper when adding support for netmem. Also minimize the impact on the dma syncing performance for pages. Special case the dma-sync path for pages to not go through the overhead checks for dma-syncing and conversion to netmem. Cc: Alexander Lobakin <aleksander.lobakin@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20241211212033.1684197-5-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12page_pool: Set `dma_sync` to false for devmem memory providerSamiullah Khawaja
Move the `dma_map` and `dma_sync` checks to `page_pool_init` to make them generic. Set dma_sync to false for devmem memory provider because the dma_sync APIs should not be used for dma_buf backed devmem memory provider. Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Samiullah Khawaja <skhawaja@google.com> Signed-off-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20241211212033.1684197-4-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12net: page_pool: rename page_pool_alloc_netmem to *_netmemsMina Almasry
page_pool_alloc_netmem (without an s) was the mirror of page_pool_alloc_pages (with an s), which was confusing. Rename to page_pool_alloc_netmems so it's the mirror of page_pool_alloc_pages. Signed-off-by: Mina Almasry <almasrymina@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20241211212033.1684197-2-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12xdp: make __xdp_return() MP-agnosticAlexander Lobakin
Currently, __xdp_return() takes pointer to the virtual memory to free a buffer. Apart from that this sometimes provokes redundant data <--> page conversions, taking data pointer effectively prevents lots of XDP code to support non-page-backed buffers, as there's no mapping for the non-host memory (data is always NULL). Just convert it to always take netmem reference. For xdp_return_{buff,frame*}(), this chops off one page_address() per each frag and adds one virt_to_netmem() (same as virt_to_page()) per header buffer. For __xdp_return() itself, it removes one virt_to_page() for MEM_TYPE_PAGE_POOL and another one for MEM_TYPE_PAGE_ORDER0, adding one page_address() for [not really common nowadays] MEM_TYPE_PAGE_SHARED, but the main effect is that the abovementioned functions won't die or memleak anymore if the frame has non-host memory attached and will correctly free those. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20241211172649.761483-4-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12xdp: get rid of xdp_frame::mem.idAlexander Lobakin
Initially, xdp_frame::mem.id was used to search for the corresponding &page_pool to return the page correctly. However, after that struct page was extended to have a direct pointer to its PP (netmem has it as well), further keeping of this field makes no sense. xdp_return_frame_bulk() still used it to do a lookup, and this leftover is now removed. Remove xdp_frame::mem and replace it with ::mem_type, as only memory type still matters and we need to know it to be able to free the frame correctly. As a cute side effect, we can now make every scalar field in &xdp_frame of 4 byte width, speeding up accesses to them. Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20241211172649.761483-3-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12page_pool: allow mixing PPs within one bulkAlexander Lobakin
The main reason for this change was to allow mixing pages from different &page_pools within one &xdp_buff/&xdp_frame. Why not? With stuff like devmem and io_uring zerocopy Rx, it's required to have separate PPs for header buffers and payload buffers. Adjust xdp_return_frame_bulk() and page_pool_put_netmem_bulk(), so that they won't be tied to a particular pool. Let the latter create a separate bulk of pages which's PP is different from the first netmem of the bulk and process it after the main loop. This greatly optimizes xdp_return_frame_bulk(): no more hashtable lookups and forced flushes on PP mismatch. Also make xdp_flush_frame_bulk() inline, as it's just one if + function call + one u32 read, not worth extending the call ladder. Co-developed-by: Toke Høiland-Jørgensen <toke@redhat.com> # iterative Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> # while (count) Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/20241211172649.761483-2-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12net_sched: sch_cake: Add drop reasonsToke Høiland-Jørgensen
Add three qdisc-specific drop reasons and use them in sch_cake: 1) SKB_DROP_REASON_QDISC_OVERLIMIT Whenever the total queue limit for a qdisc instance is exceeded and a packet is dropped to make room. 2) SKB_DROP_REASON_QDISC_CONGESTED Whenever a packet is dropped by the qdisc AQM algorithm because congestion is detected. 3) SKB_DROP_REASON_CAKE_FLOOD Whenever a packet is dropped by the flood protection part of the CAKE AQM algorithm (BLUE). Also use the existing SKB_DROP_REASON_QUEUE_PURGE in cake_clear_tin(). Reasons show up as: perf record -a -e skb:kfree_skb sleep 1; perf script iperf3 665 [005] 848.656964: skb:kfree_skb: skbaddr=0xffff98168a333500 rx_sk=(nil) protocol=34525 location=__dev_queue_xmit+0x10f0 reason: QDISC_OVERLIMIT swapper 0 [001] 909.166055: skb:kfree_skb: skbaddr=0xffff98168280cee0 rx_sk=(nil) protocol=34525 location=cake_dequeue+0x5ef reason: QDISC_CONGESTED Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Dave Taht <dave.taht@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20241211-cake-drop-reason-v2-1-920afadf4d1b@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12Merge tag 'net-6.13-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, netfilter and wireless. Current release - fix to a fix: - rtnetlink: fix error code in rtnl_newlink() - tipc: fix NULL deref in cleanup_bearer() Current release - regressions: - ip: fix warning about invalid return from in ip_route_input_rcu() Current release - new code bugs: - udp: fix L4 hash after reconnect - eth: lan969x: fix cyclic dependency between modules - eth: bnxt_en: fix potential crash when dumping FW log coredump Previous releases - regressions: - wifi: mac80211: - fix a queue stall in certain cases of channel switch - wake the queues in case of failure in resume - splice: do not checksum AF_UNIX sockets - virtio_net: fix BUG()s in BQL support due to incorrect accounting of purged packets during interface stop - eth: - stmmac: fix TSO DMA API mis-usage causing oops - bnxt_en: fixes for HW GRO: GSO type on 5750X chips and oops due to incorrect aggregation ID mask on 5760X chips Previous releases - always broken: - Bluetooth: improve setsockopt() handling of malformed user input - eth: ocelot: fix PTP timestamping in presence of packet loss - ptp: kvm: x86: avoid "fail to initialize ptp_kvm" when simply not supported" * tag 'net-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits) net: dsa: tag_ocelot_8021q: fix broken reception net: dsa: microchip: KSZ9896 register regmap alignment to 32 bit boundaries net: renesas: rswitch: fix initial MPIC register setting Bluetooth: btmtk: avoid UAF in btmtk_process_coredump Bluetooth: iso: Fix circular lock in iso_conn_big_sync Bluetooth: iso: Fix circular lock in iso_listen_bis Bluetooth: SCO: Add support for 16 bits transparent voice setting Bluetooth: iso: Fix recursive locking warning Bluetooth: iso: Always release hdev at the end of iso_listen_bis Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating Bluetooth: hci_core: Fix sleeping function called from invalid context team: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL team: Fix initial vlan_feature set in __team_compute_features bonding: Fix feature propagation of NETIF_F_GSO_ENCAP_ALL bonding: Fix initial {vlan,mpls}_feature set in bond_compute_features net, team, bonding: Add netdev_base_features helper net/sched: netem: account for backlog updates from child qdisc net: dsa: felix: fix stuck CPU-injected packets with short taprio windows splice: do not checksum AF_UNIX sockets net: usb: qmi_wwan: add Telit FE910C04 compositions ...
2024-12-12Merge tag 'for-net-2024-12-12' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - SCO: Fix transparent voice setting - ISO: Locking fixes - hci_core: Fix sleeping function called from invalid context - hci_event: Fix using rcu_read_(un)lock while iterating - btmtk: avoid UAF in btmtk_process_coredump * tag 'for-net-2024-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: btmtk: avoid UAF in btmtk_process_coredump Bluetooth: iso: Fix circular lock in iso_conn_big_sync Bluetooth: iso: Fix circular lock in iso_listen_bis Bluetooth: SCO: Add support for 16 bits transparent voice setting Bluetooth: iso: Fix recursive locking warning Bluetooth: iso: Always release hdev at the end of iso_listen_bis Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating Bluetooth: hci_core: Fix sleeping function called from invalid context Bluetooth: Improve setsockopt() handling of malformed user input ==================== Link: https://patch.msgid.link/20241212142806.2046274-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12net: dsa: tag_ocelot_8021q: fix broken receptionRobert Hodaszi
The blamed commit changed the dsa_8021q_rcv() calling convention to accept pre-populated source_port and switch_id arguments. If those are not available, as in the case of tag_ocelot_8021q, the arguments must be pre-initialized with -1. Due to the bug of passing uninitialized arguments in tag_ocelot_8021q, dsa_8021q_rcv() does not detect that it needs to populate the source_port and switch_id, and this makes dsa_conduit_find_user() fail, which leads to packet loss on reception. Fixes: dcfe7673787b ("net: dsa: tag_sja1105: absorb logic for not overwriting precise info into dsa_8021q_rcv()") Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20241211144741.1415758-1-robert.hodaszi@digi.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-12Bluetooth: iso: Fix circular lock in iso_conn_big_syncIulia Tanasescu
This fixes the circular locking dependency warning below, by reworking iso_sock_recvmsg, to ensure that the socket lock is always released before calling a function that locks hdev. [ 561.670344] ====================================================== [ 561.670346] WARNING: possible circular locking dependency detected [ 561.670349] 6.12.0-rc6+ #26 Not tainted [ 561.670351] ------------------------------------------------------ [ 561.670353] iso-tester/3289 is trying to acquire lock: [ 561.670355] ffff88811f600078 (&hdev->lock){+.+.}-{3:3}, at: iso_conn_big_sync+0x73/0x260 [bluetooth] [ 561.670405] but task is already holding lock: [ 561.670407] ffff88815af58258 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}, at: iso_sock_recvmsg+0xbf/0x500 [bluetooth] [ 561.670450] which lock already depends on the new lock. [ 561.670452] the existing dependency chain (in reverse order) is: [ 561.670453] -> #2 (sk_lock-AF_BLUETOOTH){+.+.}-{0:0}: [ 561.670458] lock_acquire+0x7c/0xc0 [ 561.670463] lock_sock_nested+0x3b/0xf0 [ 561.670467] bt_accept_dequeue+0x1a5/0x4d0 [bluetooth] [ 561.670510] iso_sock_accept+0x271/0x830 [bluetooth] [ 561.670547] do_accept+0x3dd/0x610 [ 561.670550] __sys_accept4+0xd8/0x170 [ 561.670553] __x64_sys_accept+0x74/0xc0 [ 561.670556] x64_sys_call+0x17d6/0x25f0 [ 561.670559] do_syscall_64+0x87/0x150 [ 561.670563] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 561.670567] -> #1 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: [ 561.670571] lock_acquire+0x7c/0xc0 [ 561.670574] lock_sock_nested+0x3b/0xf0 [ 561.670577] iso_sock_listen+0x2de/0xf30 [bluetooth] [ 561.670617] __sys_listen_socket+0xef/0x130 [ 561.670620] __x64_sys_listen+0xe1/0x190 [ 561.670623] x64_sys_call+0x2517/0x25f0 [ 561.670626] do_syscall_64+0x87/0x150 [ 561.670629] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 561.670632] -> #0 (&hdev->lock){+.+.}-{3:3}: [ 561.670636] __lock_acquire+0x32ad/0x6ab0 [ 561.670639] lock_acquire.part.0+0x118/0x360 [ 561.670642] lock_acquire+0x7c/0xc0 [ 561.670644] __mutex_lock+0x18d/0x12f0 [ 561.670647] mutex_lock_nested+0x1b/0x30 [ 561.670651] iso_conn_big_sync+0x73/0x260 [bluetooth] [ 561.670687] iso_sock_recvmsg+0x3e9/0x500 [bluetooth] [ 561.670722] sock_recvmsg+0x1d5/0x240 [ 561.670725] sock_read_iter+0x27d/0x470 [ 561.670727] vfs_read+0x9a0/0xd30 [ 561.670731] ksys_read+0x1a8/0x250 [ 561.670733] __x64_sys_read+0x72/0xc0 [ 561.670736] x64_sys_call+0x1b12/0x25f0 [ 561.670738] do_syscall_64+0x87/0x150 [ 561.670741] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 561.670744] other info that might help us debug this: [ 561.670745] Chain exists of: &hdev->lock --> sk_lock-AF_BLUETOOTH-BTPROTO_ISO --> sk_lock-AF_BLUETOOTH [ 561.670751] Possible unsafe locking scenario: [ 561.670753] CPU0 CPU1 [ 561.670754] ---- ---- [ 561.670756] lock(sk_lock-AF_BLUETOOTH); [ 561.670758] lock(sk_lock AF_BLUETOOTH-BTPROTO_ISO); [ 561.670761] lock(sk_lock-AF_BLUETOOTH); [ 561.670764] lock(&hdev->lock); [ 561.670767] *** DEADLOCK *** Fixes: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: iso: Fix circular lock in iso_listen_bisIulia Tanasescu
This fixes the circular locking dependency warning below, by releasing the socket lock before enterning iso_listen_bis, to avoid any potential deadlock with hdev lock. [ 75.307983] ====================================================== [ 75.307984] WARNING: possible circular locking dependency detected [ 75.307985] 6.12.0-rc6+ #22 Not tainted [ 75.307987] ------------------------------------------------------ [ 75.307987] kworker/u81:2/2623 is trying to acquire lock: [ 75.307988] ffff8fde1769da58 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO) at: iso_connect_cfm+0x253/0x840 [bluetooth] [ 75.308021] but task is already holding lock: [ 75.308022] ffff8fdd61a10078 (&hdev->lock) at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth] [ 75.308053] which lock already depends on the new lock. [ 75.308054] the existing dependency chain (in reverse order) is: [ 75.308055] -> #1 (&hdev->lock){+.+.}-{3:3}: [ 75.308057] __mutex_lock+0xad/0xc50 [ 75.308061] mutex_lock_nested+0x1b/0x30 [ 75.308063] iso_sock_listen+0x143/0x5c0 [bluetooth] [ 75.308085] __sys_listen_socket+0x49/0x60 [ 75.308088] __x64_sys_listen+0x4c/0x90 [ 75.308090] x64_sys_call+0x2517/0x25f0 [ 75.308092] do_syscall_64+0x87/0x150 [ 75.308095] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 75.308098] -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_ISO){+.+.}-{0:0}: [ 75.308100] __lock_acquire+0x155e/0x25f0 [ 75.308103] lock_acquire+0xc9/0x300 [ 75.308105] lock_sock_nested+0x32/0x90 [ 75.308107] iso_connect_cfm+0x253/0x840 [bluetooth] [ 75.308128] hci_connect_cfm+0x6c/0x190 [bluetooth] [ 75.308155] hci_le_per_adv_report_evt+0x27b/0x2f0 [bluetooth] [ 75.308180] hci_le_meta_evt+0xe7/0x200 [bluetooth] [ 75.308206] hci_event_packet+0x21f/0x5c0 [bluetooth] [ 75.308230] hci_rx_work+0x3ae/0xb10 [bluetooth] [ 75.308254] process_one_work+0x212/0x740 [ 75.308256] worker_thread+0x1bd/0x3a0 [ 75.308258] kthread+0xe4/0x120 [ 75.308259] ret_from_fork+0x44/0x70 [ 75.308261] ret_from_fork_asm+0x1a/0x30 [ 75.308263] other info that might help us debug this: [ 75.308264] Possible unsafe locking scenario: [ 75.308264] CPU0 CPU1 [ 75.308265] ---- ---- [ 75.308265] lock(&hdev->lock); [ 75.308267] lock(sk_lock- AF_BLUETOOTH-BTPROTO_ISO); [ 75.308268] lock(&hdev->lock); [ 75.308269] lock(sk_lock-AF_BLUETOOTH-BTPROTO_ISO); [ 75.308270] *** DEADLOCK *** [ 75.308271] 4 locks held by kworker/u81:2/2623: [ 75.308272] #0: ffff8fdd66e52148 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work+0x443/0x740 [ 75.308276] #1: ffffafb488b7fe48 ((work_completion)(&hdev->rx_work)), at: process_one_work+0x1ce/0x740 [ 75.308280] #2: ffff8fdd61a10078 (&hdev->lock){+.+.}-{3:3} at: hci_le_per_adv_report_evt+0x47/0x2f0 [bluetooth] [ 75.308304] #3: ffffffffb6ba4900 (rcu_read_lock){....}-{1:2}, at: hci_connect_cfm+0x29/0x190 [bluetooth] Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: SCO: Add support for 16 bits transparent voice settingFrédéric Danis
The voice setting is used by sco_connect() or sco_conn_defer_accept() after being set by sco_sock_setsockopt(). The PCM part of the voice setting is used for offload mode through PCM chipset port. This commits add support for mSBC 16 bits offloading, i.e. audio data not transported over HCI. The BCM4349B1 supports 16 bits transparent data on its I2S port. If BT_VOICE_TRANSPARENT is used when accepting a SCO connection, this gives only garbage audio while using BT_VOICE_TRANSPARENT_16BIT gives correct audio. This has been tested with connection to iPhone 14 and Samsung S24. Fixes: ad10b1a48754 ("Bluetooth: Add Bluetooth socket voice option") Signed-off-by: Frédéric Danis <frederic.danis@collabora.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: iso: Fix recursive locking warningIulia Tanasescu
This updates iso_sock_accept to use nested locking for the parent socket, to avoid lockdep warnings caused because the parent and child sockets are locked by the same thread: [ 41.585683] ============================================ [ 41.585688] WARNING: possible recursive locking detected [ 41.585694] 6.12.0-rc6+ #22 Not tainted [ 41.585701] -------------------------------------------- [ 41.585705] iso-tester/3139 is trying to acquire lock: [ 41.585711] ffff988b29530a58 (sk_lock-AF_BLUETOOTH) at: bt_accept_dequeue+0xe3/0x280 [bluetooth] [ 41.585905] but task is already holding lock: [ 41.585909] ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth] [ 41.586064] other info that might help us debug this: [ 41.586069] Possible unsafe locking scenario: [ 41.586072] CPU0 [ 41.586076] ---- [ 41.586079] lock(sk_lock-AF_BLUETOOTH); [ 41.586086] lock(sk_lock-AF_BLUETOOTH); [ 41.586093] *** DEADLOCK *** [ 41.586097] May be due to missing lock nesting notation [ 41.586101] 1 lock held by iso-tester/3139: [ 41.586107] #0: ffff988b29533a58 (sk_lock-AF_BLUETOOTH) at: iso_sock_accept+0x61/0x2d0 [bluetooth] Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: iso: Always release hdev at the end of iso_listen_bisIulia Tanasescu
Since hci_get_route holds the device before returning, the hdev should be released with hci_dev_put at the end of iso_listen_bis even if the function returns with an error. Fixes: 02171da6e86a ("Bluetooth: ISO: Add hcon for listening bis sk") Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: hci_event: Fix using rcu_read_(un)lock while iteratingLuiz Augusto von Dentz
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is not safe since for the most part entries fetched this way shall be treated as rcu_dereference: Note that the value returned by rcu_dereference() is valid only within the enclosing RCU read-side critical section [1]_. For example, the following is **not** legal:: rcu_read_lock(); p = rcu_dereference(head.next); rcu_read_unlock(); x = p->address; /* BUG!!! */ rcu_read_lock(); y = p->data; /* BUG!!! */ rcu_read_unlock(); Fixes: a0bfde167b50 ("Bluetooth: ISO: Add support for connecting multiple BISes") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12Bluetooth: hci_core: Fix sleeping function called from invalid contextLuiz Augusto von Dentz
This reworks hci_cb_list to not use mutex hci_cb_list_lock to avoid bugs like the bellow: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 5070, name: kworker/u9:2 preempt_count: 0, expected: 0 RCU nest depth: 1, expected: 0 4 locks held by kworker/u9:2/5070: #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline] #0: ffff888015be3948 ((wq_completion)hci0#2){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335 #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline] #1: ffffc90003b6fd00 ((work_completion)(&hdev->rx_work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335 #2: ffff8880665d0078 (&hdev->lock){+.+.}-{3:3}, at: hci_le_create_big_complete_evt+0xcf/0xae0 net/bluetooth/hci_event.c:6914 #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline] #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline] #3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hci_le_create_big_complete_evt+0xdb/0xae0 net/bluetooth/hci_event.c:6915 CPU: 0 PID: 5070 Comm: kworker/u9:2 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: hci0 hci_rx_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 __might_resched+0x5d4/0x780 kernel/sched/core.c:10187 __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0xc1/0xd70 kernel/locking/mutex.c:752 hci_connect_cfm include/net/bluetooth/hci_core.h:2004 [inline] hci_le_create_big_complete_evt+0x3d9/0xae0 net/bluetooth/hci_event.c:6939 hci_event_func net/bluetooth/hci_event.c:7514 [inline] hci_event_packet+0xa53/0x1540 net/bluetooth/hci_event.c:7569 hci_rx_work+0x3e8/0xca0 net/bluetooth/hci_core.c:4171 process_one_work kernel/workqueue.c:3254 [inline] process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335 worker_thread+0x86d/0xd70 kernel/workqueue.c:3416 kthread+0x2f0/0x390 kernel/kthread.c:388 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> Reported-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com Tested-by: syzbot+2fb0835e0c9cefc34614@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=2fb0835e0c9cefc34614 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-12-12net/smc: support ipv4 mapped ipv6 addr client for smc-r v2Guangguan Wang
AF_INET6 is not supported for smc-r v2 client before, even if the ipv6 addr is ipv4 mapped. Thus, when using AF_INET6, smc-r connection will fallback to tcp, especially for java applications running smc-r. This patch support ipv4 mapped ipv6 addr client for smc-r v2. Clients using real global ipv6 addr is still not supported yet. Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-12net/smc: support SMC-R V2 for rdma devices with max_recv_sge equals to 1Guangguan Wang
For SMC-R V2, llc msg can be larger than SMC_WR_BUF_SIZE, thus every recv wr has 2 sges, the first sge with length SMC_WR_BUF_SIZE is for V1/V2 compatible llc/cdc msg, and the second sge with length SMC_WR_BUF_V2_SIZE-SMC_WR_TX_SIZE is for V2 specific llc msg, like SMC_LLC_DELETE_RKEY and SMC_LLC_ADD_LINK for SMC-R V2. The memory buffer in the second sge is shared by all recv wr in one link and all link in one lgr for saving memory usage purpose. But not all RDMA devices with max_recv_sge greater than 1. Thus SMC-R V2 can not support on such RDMA devices and SMC_CLC_DECL_INTERR fallback happens because of the failure of create qp. This patch introduce the support for SMC-R V2 on RDMA devices with max_recv_sge equals to 1. Every recv wr has only one sge with individual buffer whose size is SMC_WR_BUF_V2_SIZE once the RDMA device's max_recv_sge equals to 1. It may use more memory, but it is better than SMC_CLC_DECL_INTERR fallback. Co-developed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>