summaryrefslogtreecommitdiff
path: root/net/smc/smc_pnet.c
AgeCommit message (Collapse)Author
2025-03-14net/smc: use the correct ndev to find pnetid by pnetid tableGuangguan Wang
When using smc_pnet in SMC, it will only search the pnetid in the base_ndev of the netdev hierarchy(both HW PNETID and User-defined sw pnetid). This may not work for some scenarios when using SMC in container on cloud environment. In container, there have choices of different container network, such as directly using host network, virtual network IPVLAN, veth, etc. Different choices of container network have different netdev hierarchy. Examples of netdev hierarchy show below. (eth0 and eth1 in host below is the netdev directly related to the physical device). _______________________________ | _________________ | | |POD | | | | | | | | eth0_________ | | | |____| |__| | | | | | | | | | | eth1|base_ndev| eth0_______ | | | | | RDMA || | host |_________| |_______|| --------------------------------- netdev hierarchy if directly using host network ________________________________ | _________________ | | |POD __________ | | | | |upper_ndev| | | | |eth0|__________| | | | |_______|_________| | | |lower netdev | | __|______ | | eth1| | eth0_______ | | |base_ndev| | RDMA || | host |_________| |_______|| --------------------------------- netdev hierarchy if using IPVLAN _______________________________ | _____________________ | | |POD _________ | | | | |base_ndev|| | | |eth0(veth)|_________|| | | |____________|________| | | |pairs | | _______|_ | | | | eth0_______ | | veth|base_ndev| | RDMA || | |_________| |_______|| | _________ | | eth1|base_ndev| | | host |_________| | --------------------------------- netdev hierarchy if using veth Due to some reasons, the eth1 in host is not RDMA attached netdevice, pnetid is needed to map the eth1(in host) with RDMA device so that POD can do SMC-R. Because the eth1(in host) is managed by CNI plugin(such as Terway, network management plugin in container environment), and in cloud environment the eth(in host) can dynamically be inserted by CNI when POD create and dynamically be removed by CNI when POD destroy and no POD related to the eth(in host) anymore. It is hard to config the pnetid to the eth1(in host). But it is easy to config the pnetid to the netdevice which can be seen in POD. When do SMC-R, both the container directly using host network and the container using veth network can successfully match the RDMA device, because the configured pnetid netdev is a base_ndev. But the container using IPVLAN can not successfully match the RDMA device and 0x03030000 fallback happens, because the configured pnetid netdev is not a base_ndev. Additionally, if config pnetid to the eth1(in host) also can not work for matching RDMA device when using veth network and doing SMC-R in POD. To resolve the problems list above, this patch extends to search user -defined sw pnetid in the clc handshake ndev when no pnetid can be found in the base_ndev, and the base_ndev take precedence over ndev for backward compatibility. This patch also can unify the pnetid setup of different network choices list above in container(Config user-defined sw pnetid in the netdevice can be seen in POD). Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-07net/smc: Fix lookup of netdev by using ib_device_get_netdev()Wenjia Zhang
The SMC-R variant of the SMC protocol used direct call to function ib_device_ops.get_netdev() to lookup netdev. As we used mlx5 device driver to run SMC-R, it failed to find a device, because in mlx5_ib the internal net device management for retrieving net devices was replaced by a common interface ib_device_get_netdev() in commit 8d159eb2117b ("RDMA/mlx5: Use IB set_netdev and get_netdev functions"). Since such direct accesses to the internal net device management is not recommended at all, update the SMC-R code to use proper API ib_device_get_netdev(). Fixes: 54903572c23c ("net/smc: allow pnetid-less configuration") Reported-by: Aswin K <aswin@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20241106082612.57803-1-wenjia@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-15net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetidLi RongQing
pnetid of pi (not newly allocated pe) should be compared Fixes: e888a2e8337c ("net/smc: introduce list of pnetids for Ethernet devices") Reviewed-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Link: https://patch.msgid.link/20241014115321.33234-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-10net/smc: add sysctl for smc_limit_hsD. Wythe
In commit 48b6190a0042 ("net/smc: Limit SMC visits when handshake workqueue congested"), we introduce a mechanism to put constraint on SMC connections visit according to the pressure of SMC handshake process. At that time, we believed that controlling the feature through netlink was sufficient. However, most people have realized now that netlink is not convenient in container scenarios, and sysctl is a more suitable approach. In addition, since commit 462791bbfa35 ("net/smc: add sysctl interface for SMC") had introcuded smc_sysctl_net_init(), it is reasonable for us to initialize limit_smc_hs in it instead of initializing it in smc_pnet_net_int(). Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Jan Karcher <jaka@linux.ibm.com> Link: https://patch.msgid.link/1725590135-5631-1-git-send-email-alibuda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-05net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list()Eric Dumazet
Many syzbot reports show extreme rtnl pressure, and many of them hint that smc acquires rtnl in netns creation for no good reason [1] This patch returns early from smc_pnet_net_init() if there is no netdevice yet. I am not even sure why smc_pnet_create_pnetids_list() even exists, because smc_pnet_netdev_event() is also calling smc_pnet_add_base_pnetid() when handling NETDEV_UP event. [1] extract of typical syzbot reports 2 locks held by syz-executor.3/12252: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12253: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12257: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12261: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.0/12265: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.3/12268: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.4/12271: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.1/12274: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 2 locks held by syz-executor.2/12280: #0: ffffffff8f369610 (pernet_ops_rwsem){++++}-{3:3}, at: copy_net_ns+0x4c7/0x7b0 net/core/net_namespace.c:491 #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_create_pnetids_list net/smc/smc_pnet.c:809 [inline] #1: ffffffff8f375b88 (rtnl_mutex){+.+.}-{3:3}, at: smc_pnet_net_init+0x10a/0x1e0 net/smc/smc_pnet.c:878 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wenjia Zhang <wenjia@linux.ibm.com> Cc: Jan Karcher <jaka@linux.ibm.com> Cc: "D. Wythe" <alibuda@linux.alibaba.com> Cc: Tony Lu <tonylu@linux.alibaba.com> Cc: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Link: https://lore.kernel.org/r/20240302100744.3868021-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-26net/smc: compatible with 128-bits extended GID of virtual ISM deviceWen Gu
According to virtual ISM support feature defined by SMCv2.1, GIDs of virtual ISM device are UUIDs defined by RFC4122, which are 128-bits long. So some adaptation work is required. And note that the GIDs of existing platform firmware ISM devices still remain 64-bits long. Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Reviewed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25net/smc: De-tangle ism and smc device initializationStefan Raspl
The struct device for ISM devices was part of struct smcd_dev. Move to struct ism_dev, provide a new API call in struct smcd_ops, and convert existing SMCD code accordingly. Furthermore, remove struct smcd_dev from struct ism_dev. This is the final part of a bigger overhaul of the interfaces between SMC and ISM. Signed-off-by: Stefan Raspl <raspl@linux.ibm.com> Signed-off-by: Jan Karcher <jaka@linux.ibm.com> Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-29genetlink: start to validate reserved header bytesJakub Kicinski
We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-09net: rename reference+tracking helpersJakub Kicinski
Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-11net/smc: Fix NULL pointer dereference in smc_pnet_find_ib()Karsten Graul
dev_name() was called with dev.parent as argument but without to NULL-check it before. Solve this by checking the pointer before the call to dev_name(). Fixes: af5f60c7e3d5 ("net/smc: allow PCI IDs as ib device names in the pnet table") Reported-by: syzbot+03e3e228510223dabd34@syzkaller.appspotmail.com Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
tools/testing/selftests/net/mptcp/mptcp_join.sh 34aa6e3bccd8 ("selftests: mptcp: add ip mptcp wrappers") 857898eb4b28 ("selftests: mptcp: add missing join check") 6ef84b1517e0 ("selftests: mptcp: more robust signal race test") https://lore.kernel.org/all/20220221131842.468893-1-broonie@kernel.org/ drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.h drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/ct.c fb7e76ea3f3b6 ("net/mlx5e: TC, Skip redundant ct clear actions") c63741b426e11 ("net/mlx5e: Fix MPLSoUDP encap to use MPLS action information") 09bf97923224f ("net/mlx5e: TC, Move pedit_headers_action to parse_attr") 84ba8062e383 ("net/mlx5e: Test CT and SAMPLE on flow attr") efe6f961cd2e ("net/mlx5e: CT, Don't set flow flag CT for ct clear flow") 3b49a7edec1d ("net/mlx5e: TC, Reject rules with multiple CT actions") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net/smc: Use a mutex for locking "struct smc_pnettable"Fabio M. De Francesco
smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib() which, in turn, calls mutex_lock(&smc_ib_devices.mutex). read_lock() disables preemption. Therefore, the code acquires a mutex while in atomic context and it leads to a SAC bug. Fix this bug by replacing the rwlock with a mutex. Reported-and-tested-by: syzbot+4f322a6d84e991c38775@syzkaller.appspotmail.com Fixes: 64e28b52c7a6 ("net/smc: add pnet table namespace support") Confirmed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20220223100252.22562-1-fmdefrancesco@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-11net/smc: Add global configure for handshake limitation by netlinkD. Wythe
Although we can control SMC handshake limitation through socket options, which means that applications who need it must modify their code. It's quite troublesome for many existing applications. This patch modifies the global default value of SMC handshake limitation through netlink, providing a way to put constraint on handshake without modifies any code for applications. Suggested-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-07net/smc: use GFP_ATOMIC allocation in smc_pnet_add_eth()Eric Dumazet
My last patch moved the netdev_tracker_alloc() call to a section protected by a write_lock(). I should have replaced GFP_KERNEL with GFP_ATOMIC to avoid the infamous: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:256 Fixes: 28f922213886 ("net/smc: fix ref_tracker issue in smc_pnet_add()") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-06net/smc: fix ref_tracker issue in smc_pnet_add()Eric Dumazet
I added the netdev_tracker_alloc() right after ndev was stored into the newly allocated object: new_pe->ndev = ndev; if (ndev) netdev_tracker_alloc(ndev, &new_pe->dev_tracker, GFP_KERNEL); But I missed that later, we could end up freeing new_pe, then calling dev_put(ndev) to release the reference on ndev. The new_pe->dev_tracker would not be freed. To solve this issue, move the netdev_tracker_alloc() call to the point we know for sure new_pe will be kept. syzbot report (on net-next tree, but the bug is present in net tree) WARNING: CPU: 0 PID: 6019 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 6019 Comm: syz-executor.3 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d f4 70 a0 09 31 ff 89 de e8 4d bc 99 fd 84 db 75 e0 e8 64 b8 99 fd 48 c7 c7 20 0c 06 8a c6 05 d4 70 a0 09 01 e8 9e 4e 28 05 <0f> 0b eb c4 e8 48 b8 99 fd 0f b6 1d c3 70 a0 09 31 ff 89 de e8 18 RSP: 0018:ffffc900043b7400 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815fb318 RDI: fffff52000876e72 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815f507e R11: 0000000000000000 R12: 1ffff92000876e85 R13: 0000000000000000 R14: ffff88805c1c6600 R15: 0000000000000000 FS: 00007f1ef6feb700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2d02b000 CR3: 00000000223f4000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] ref_tracker_free+0x53f/0x6c0 lib/ref_tracker.c:119 netdev_tracker_free include/linux/netdevice.h:3867 [inline] dev_put_track include/linux/netdevice.h:3884 [inline] dev_put_track include/linux/netdevice.h:3880 [inline] dev_put include/linux/netdevice.h:3910 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:399 [inline] smc_pnet_enter net/smc/smc_pnet.c:493 [inline] smc_pnet_add+0x5fc/0x15f0 net/smc/smc_pnet.c:556 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: b60645248af3 ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-12net/smc: fix possible NULL deref in smc_pnet_add_eth()Eric Dumazet
I missed that @ndev value can be NULL. I prefer not factorizing this NULL check, and instead clearly document where a NULL might be expected. general protection fault, probably for non-canonical address 0xdffffc00000000ba: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x00000000000005d0-0x00000000000005d7] CPU: 0 PID: 19875 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__lock_acquire+0xd7a/0x5470 kernel/locking/lockdep.c:4897 Code: 14 0e 41 bf 01 00 00 00 0f 86 c8 00 00 00 89 05 5c 20 14 0e e9 bd 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 9f 2e 00 00 49 81 3e 20 c5 1a 8f 0f 84 52 f3 ff RSP: 0018:ffffc900057071d0 EFLAGS: 00010002 RAX: dffffc0000000000 RBX: 1ffff92000ae0e65 RCX: 1ffff92000ae0e4c RDX: 00000000000000ba RSI: 0000000000000000 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001 R10: fffffbfff1b24ae2 R11: 000000000008808a R12: 0000000000000000 R13: ffff888040ca4000 R14: 00000000000005d0 R15: 0000000000000000 FS: 00007fbd683e0700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2be22000 CR3: 0000000013fea000 CR4: 00000000003526f0 Call Trace: <TASK> lock_acquire kernel/locking/lockdep.c:5637 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5602 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162 ref_tracker_alloc+0x182/0x440 lib/ref_tracker.c:84 netdev_tracker_alloc include/linux/netdevice.h:3859 [inline] smc_pnet_add_eth net/smc/smc_pnet.c:372 [inline] smc_pnet_enter net/smc/smc_pnet.c:492 [inline] smc_pnet_add+0x49a/0x14d0 net/smc/smc_pnet.c:555 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 genl_rcv+0x24/0x40 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:725 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413 ___sys_sendmsg+0xf3/0x170 net/socket.c:2467 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: b60645248af3 ("net/smc: add net device tracker to struct smc_pnetentry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-02net/smc: Introduce net namespace support for linkgroupTony Lu
Currently, rdma device supports exclusive net namespace isolation, however linkgroup doesn't know and support ibdev net namespace. Applications in the containers don't want to share the nics if we enabled rdma exclusive mode. Every net namespaces should have their own linkgroups. This patch introduce a new field net for linkgroup, which is standing for the ibdev net namespace in the linkgroup. The net in linkgroup is initialized with the net namespace of link's ibdev. It compares the net of linkgroup and sock or ibdev before choose it, if no matched, create new one in current net namespace. If rdma net namespace exclusive mode is not enabled, it behaves as before. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-07net/smc: add net device tracker to struct smc_pnetentryEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-16net/smc: retrieve v2 gid from IB deviceKarsten Graul
In smc_ib.c, scan for RoCE devices that support UDP encapsulation. Find an eligible device and check that there is a route to the remote peer. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: Remove redundant if statementsYajun Deng
The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-01net/smc: Add diagnostic information to smc ib-deviceGuvenc Gulce
During smc ib-device creation, add network device ifindex to smc ib-device structure. Register for netdevice changes and update ib-device accordingly. This is needed for diagnostic purposes. Signed-off-by: Guvenc Gulce <guvenc@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-09-28net/smc: determine proposed ISM devicesUrsula Braun
SMCD Version 2 allows to propose up to 8 additional ISM devices offered to the peer as candidates for SMCD communication. This patch covers determination of the ISM devices to be proposed. ISM devices without PNETID are preferred, since ISM devices with PNETID are a V1 leftover and will disappear over the time. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net/smc: introduce list of pnetids for Ethernet devicesUrsula Braun
SMCD version 2 allows usage of ISM devices with hardware PNETID only, if an Ethernet net_device exists with the same hardware PNETID. This requires to maintain a list of pnetids belonging to Ethernet net_devices, which is covered by this patch. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net/smc: prepare for more proposed ISM devicesUrsula Braun
SMCD Version 2 allows proposing of up to 8 ISM devices in addition to the native ISM device of SMCD Version 1. This patch prepares the struct smc_init_info to deal with these additional 8 ISM devices. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28net/smc: remove constant and introduce helper to check for a pnet idKarsten Graul
Use the existing symbol _S instead of SMC_ASCII_BLANK, and introduce a helper to check if a pnetid is set. No functional change. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-10net/smc: improve server ISM device determinationUrsula Braun
Move check whether peer can be reached into smc_pnet_find_ism_by_pnetid(). Thus searching continues for another ism device, if check fails. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08net/smc: switch smcd_dev_list spinlock to mutexUrsula Braun
The similar smc_ib_devices spinlock has been converted to a mutex. Protecting the smcd_dev_list by a mutex is possible as well. This patch converts the smcd_dev_list spinlock to a mutex. Fixes: c6ba7c9ba43d ("net/smc: add base infrastructure for SMC-D and ISM") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08net/smc: fix sleep bug in smc_pnet_find_roce_resource()Ursula Braun
Tests showed this BUG: [572555.252867] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935 [572555.252876] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 131031, name: smcapp [572555.252879] INFO: lockdep is turned off. [572555.252883] CPU: 1 PID: 131031 Comm: smcapp Tainted: G O 5.7.0-rc3uschi+ #356 [572555.252885] Hardware name: IBM 3906 M03 703 (LPAR) [572555.252887] Call Trace: [572555.252896] [<00000000ac364554>] show_stack+0x94/0xe8 [572555.252901] [<00000000aca1f400>] dump_stack+0xa0/0xe0 [572555.252906] [<00000000ac3c8c10>] ___might_sleep+0x260/0x280 [572555.252910] [<00000000acdc0c98>] __mutex_lock+0x48/0x940 [572555.252912] [<00000000acdc15c2>] mutex_lock_nested+0x32/0x40 [572555.252975] [<000003ff801762d0>] mlx5_lag_get_roce_netdev+0x30/0xc0 [mlx5_core] [572555.252996] [<000003ff801fb3aa>] mlx5_ib_get_netdev+0x3a/0xe0 [mlx5_ib] [572555.253007] [<000003ff80063848>] smc_pnet_find_roce_resource+0x1d8/0x310 [smc] [572555.253011] [<000003ff800602f0>] __smc_connect+0x1f0/0x3e0 [smc] [572555.253015] [<000003ff80060634>] smc_connect+0x154/0x190 [smc] [572555.253022] [<00000000acbed8d4>] __sys_connect+0x94/0xd0 [572555.253025] [<00000000acbef620>] __s390x_sys_socketcall+0x170/0x360 [572555.253028] [<00000000acdc6800>] system_call+0x298/0x2b8 [572555.253030] INFO: lockdep is turned off. Function smc_pnet_find_rdma_dev() might be called from smc_pnet_find_roce_resource(). It holds the smc_ib_devices list spinlock while calling infiniband op get_netdev(). At least for mlx5 the get_netdev operation wants mutex serialization, which conflicts with the smc_ib_devices spinlock. This patch switches the smc_ib_devices spinlock into a mutex to allow sleeping when calling get_netdev(). Fixes: a4cf0443c414 ("smc: introduce SMC as an IB-client") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-26net/smc: mark smc_pnet_policy as constDmitry Vyukov
Netlink policies are generally declared as const. This is safer and prevents potential bugs. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-05net/smc: log important pnetid and state change eventsKarsten Graul
Print to system log when SMC links are available or go down, link group state changes or pnetids are applied to and removed from devices. The log entries are triggered by either user configuration actions or adapter activation/deactivation events and are not expected to happen often. The entries help SMC users to keep track of the SMC link group status and to detect when actions are needed (like to add replacements for failed adapters). Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01net/smc: introduce smc_pnet_find_alt_roce()Karsten Graul
Introduce a new function in smc_pnet.c that searches for an alternate IB device, using an existing link group and a primary IB device. The alternate IB device needs to be active and must have the same PNETID as the link group. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-29net/smc: rework pnet table to support SMC-R failoverKarsten Graul
The pnet table stored pnet ids in the smc device structures. When a device is going down its smc device structure is freed, and when the device is brought online again it no longer has a pnet id set. Rework the pnet table implementation to store the device name with their assigned pnet id and apply the pnet id to devices when they are registered. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-21net/smc: allow unprivileged users to read pnet tableHans Wippel
The current flags of the SMC_PNET_GET command only allow privileged users to retrieve entries from the pnet table via netlink. The content of the pnet table may be useful for all users though, e.g., for debugging smc connection problems. This patch removes the GENL_ADMIN_PERM flag so that unprivileged users can read the pnet table. Signed-off-by: Hans Wippel <ndev@hwipl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
One conflict in the BPF samples Makefile, some fixes in 'net' whilst we were converting over to Makefile.target rules in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06net/smc: fix ethernet interface refcountingUrsula Braun
If a pnet table entry is to be added mentioning a valid ethernet interface, but an invalid infiniband or ISM device, the dev_put() operation for the ethernet interface is called twice, resulting in a negative refcount for the ethernet interface, which disables removal of such a network interface. This patch removes one of the dev_put() calls. Fixes: 890a2cb4a966 ("net/smc: rework pnet table") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: remove unnecessary variables and callbackTaehee Yoo
This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-09net/smc: no new connections on disappearing devicesUrsula Braun
Add a "going_away" indication to ISM devices and IB ports and avoid creation of new connections on such disappearing devices. And do not handle ISM events if ISM device is disappearing. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-04-27genetlink: optionally validate strictly/dumpsJohannes Berg
Add options to strictly validate messages and dump messages, sometimes perhaps validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everwhere using the following spatch: @@ identifier ops; expression X; @@ struct genl_ops ops[] = { ..., { .cmd = X, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, ... }, ... }; For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflict resolution of af_smc.c from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12net/smc: consolidate function parametersKarsten Graul
During initialization of an SMC socket a lot of function parameters need to get passed down the function call path. Consolidate the parameters in a helper struct so there are less enough parameters to get all passed by register. Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-11net/smc: fix return code from FLUSH commandKarsten Graul
The FLUSH command is used to empty the pnet table. No return code is expected from the command. Commit a9d8b0b1e3d6 added namespace support for the pnet table and changed the FLUSH command processing to call smc_pnet_remove_by_pnetid() to remove the pnet entries. This function returns -ENOENT when no entry was deleted, which is now the return code of the FLUSH command. As a result the FLUSH command will return an error when the pnet table is already empty. Restore the expected behavior and let FLUSH always return 0. Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support") Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22genetlink: make policy common to familyJohannes Berg
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net/smc: allow pnetid-less configurationUrsula Braun
Without hardware pnetid support there must currently be a pnet table configured to determine the IB device port to be used for SMC RDMA traffic. This patch enables a setup without pnet table, if the used handshake interface belongs already to a RoCE port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net/smc: allow PCI IDs as ib device names in the pnet tableHans Wippel
SMC-D devices are identified by their PCI IDs in the pnet table. In order to make usage of the pnet table more consistent for users, this patch adds this form of identification for ib devices as well. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net/smc: add pnet table namespace supportHans Wippel
This patch adds namespace support to the pnet table code. Each network namespace gets its own pnet table. Infiniband and smcd device pnetids can only be modified in the initial namespace. In other namespaces they can still be used as if they were set by the underlying hardware. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net/smc: add smcd support to the pnet tableHans Wippel
Currently, users can only set pnetids for netdevs and ib devices in the pnet table. This patch adds support for smcd devices to the pnet table. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net/smc: rework pnet tableHans Wippel
If a device does not have a pnetid, users can set a temporary pnetid for said device in the pnet table. This patch reworks the pnet table to make it more flexible. Multiple entries with the same pnetid but differing devices are now allowed. Additionally, the netlink interface now sends each mapping from pnetid to device separately to the user while maintaining the message format existing applications might expect. Also, the SMC data structure for ib devices already has a pnetid attribute. So, it is used to store the user defined pnetids. As a result, the pnet table entries are only used for netdevs. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-01net/smc: allow 16 byte pnetids in netlink policyHans Wippel
Currently, users can only send pnetids with a maximum length of 15 bytes over the SMC netlink interface although the maximum pnetid length is 16 bytes. This patch changes the SMC netlink policy to accept 16 byte pnetids. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-20smc: generic netlink family should be __ro_after_initJohannes Berg
The generic netlink family is only initialized during module init, so it should be __ro_after_init like all other generic netlink families. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>