linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2023-10-13	nfp: replace deprecated strncpy with strscpy	Justin Stitt
	strncpy() is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. We expect res->name to be NUL-terminated based on its usage with format strings: \| dev_err(cpp->dev.parent, "Dangling area: %d:%d:%d:0x%0llx-0x%0llx%s%s\n", \| NFP_CPP_ID_TARGET_of(res->cpp_id), \| NFP_CPP_ID_ACTION_of(res->cpp_id), \| NFP_CPP_ID_TOKEN_of(res->cpp_id), \| res->start, res->end, \| res->name ? " " : "", \| res->name ? res->name : ""); ... and with strcmp() \| if (!strcmp(res->name, NFP_RESOURCE_TBL_NAME)) { Moreover, NUL-padding is not required as `res` is already zero-allocated: \| res = kzalloc(sizeof(*res), GFP_KERNEL); Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Let's also opt to use the more idiomatic strscpy() usage of (dest, src, sizeof(dest)) rather than (dest, src, SOME_LEN). Typically the pattern of 1) allocate memory for string, 2) copy string into freshly-allocated memory is a candidate for kmemdup_nul() but in this case we are allocating the entirety of the `res` struct and that should stay as is. As mentioned above, simple 1:1 replacement of strncpy -> strscpy :) Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20231011-strncpy-drivers-net-ethernet-netronome-nfp-nfpcore-nfp_resource-c-v1-1-7d1c984f0eba@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	mlxsw: pci: Allocate skbs using GFP_KERNEL during initialization	Ido Schimmel
	The driver allocates skbs during initialization and during Rx processing. Take advantage of the fact that the former happens in process context and allocate the skbs using GFP_KERNEL to decrease the probability of allocation failure. Tested with CONFIG_DEBUG_ATOMIC_SLEEP=y. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/dfa6ed0926e045fe7c14f0894cc0c37fee81bf9d.1697034729.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	octeontx2-af: Enable hardware timestamping for VFs	Subbaraya Sundeep
	Currently for VFs, mailbox returns ENODEV error when hardware timestamping enable is requested. This patch fixes this issue. Modified this patch to return EPERM error for the PF/VFs which are not attached to CGX/RPM. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com> Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231011121551.1205211-1-saikrishnag@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	Merge branch 'wangxun-ethtool-stats'	Jakub Kicinski
	Jiawen Wu says: ==================== Wangxun ethtool stats Support to show ethtool stats for txgbe/ngbe. ==================== Link: https://lore.kernel.org/r/20231011091906.70486-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	net: ngbe: add ethtool stats support	Jiawen Wu
	Support to show ethtool statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-4-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	net: txgbe: add ethtool stats support	Jiawen Wu
	Support to show ethtool statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-3-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	net: libwx: support hardware statistics	Jiawen Wu
	Implement update and clear Rx/Tx statistics. Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://lore.kernel.org/r/20231011091906.70486-2-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	net: dsa: vsc73xx: replace deprecated strncpy with ethtool_sprintf	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. ethtool_sprintf() is designed specifically for get_strings() usage. Let's replace strncpy in favor of this more robust and easier to understand interface. This change could result in misaligned strings when if(cnt) fails. To combat this, use ternary to place empty string in buffer and properly increment pointer to next string slot. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20231010-strncpy-drivers-net-dsa-vitesse-vsc73xx-core-c-v2-1-ba4416a9ff23@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	Merge branch 'Open-coded task_vma iter'	Andrii Nakryiko
	Dave Marchevsky says: ==================== At Meta we have a profiling daemon which periodically collects information on many hosts. This collection usually involves grabbing stacks (user and kernel) using perf_event BPF progs and later symbolicating them. For user stacks we try to use BPF_F_USER_BUILD_ID and rely on remote symbolication, but BPF_F_USER_BUILD_ID doesn't always succeed. In those cases we must fall back to digging around in /proc/PID/maps to map virtual address to (binary, offset). The /proc/PID/maps digging does not occur synchronously with stack collection, so the process might already be gone, in which case it won't have /proc/PID/maps and we will fail to symbolicate. This 'exited process problem' doesn't occur very often as most of the prod services we care to profile are long-lived daemons, but there are enough usecases to warrant a workaround: a BPF program which can be optionally loaded at data collection time and essentially walks /proc/PID/maps. Currently this is done by walking the vma list: struct vm_area_struct* mmap = BPF_CORE_READ(mm, mmap); mmap_next = BPF_CORE_READ(rmap, vm_next); /* in a loop / Since commit 763ecb035029 ("mm: remove the vma linked list") there's no longer a vma linked list to walk. Walking the vma maple tree is not as simple as hopping struct vm_area_struct->vm_next. Luckily, commit f39af05949a4 ("mm: add VMA iterator"), another commit in that series, added struct vma_iterator and for_each_vma macro for easy vma iteration. If similar functionality was exposed to BPF programs, it would be perfect for our usecase. This series adds such functionality, specifically a BPF equivalent of for_each_vma using the open-coded iterator style. Notes: This approach was chosen after discussion on a previous series [0] which attempted to solve the same problem by adding a BPF_F_VMA_NEXT flag to bpf_find_vma. * Unlike the task_vma bpf_iter, the open-coded iterator kfuncs here do not drop the vma read lock between iterations. See Alexei's response in [0]. * The [vsyscall] page isn't really part of task->mm's vmas, but /proc/PID/maps returns information about it anyways. The vma iter added here does not do the same. See comment on selftest in patch 3. * bpf_iter_task_vma allocates a _data struct which contains - among other things - struct vma_iterator, using BPF allocator and keeps a pointer to the bpf_iter_task_vma_data. This is done in order to prevent changes to struct ma_state - which is wrapped by struct vma_iterator - from necessitating changes to uapi struct bpf_iter_task_vma. Changelog: v6 -> v7: https://lore.kernel.org/bpf/20231010185944.3888849-1-davemarchevsky@fb.com/ Patch numbers correspond to their position in v6 Patch 2 ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c") * Add Andrii ack Patch 3 ("bpf: Introduce task_vma open-coded iterator kfuncs") * Add Andrii ack * Add missing __diag_ignore_all for -Wmissing-prototypes (Song) Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter") * Remove two unnecessary header includes (Andrii) * Remove extraneous !vmas_seen check (Andrii) New Patch ("bpf: Add BPF_KFUNC_{START,END}_defs macros") * After talking to Andrii, this is an attempt to clean up __diag_ignore_all spam everywhere kfuncs are defined. If nontrivial changes are needed, let's apply the other 4 and I'll respin as a standalone patch. v5 -> v6: https://lore.kernel.org/bpf/20231010175637.3405682-1-davemarchevsky@fb.com/ Patch 4 ("selftests/bpf: Add tests for open-coded task_vma iter") * Remove extraneous blank line. I did this manually to the .patch file for v5, which caused BPF CI to complain about failing to apply the series v4 -> v5: https://lore.kernel.org/bpf/20231002195341.2940874-1-davemarchevsky@fb.com/ Patch numbers correspond to their position in v4 New Patch ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c") * Patch 2's renaming of this selftest, and associated changes in the userspace runner, are split out into this separate commit (Andrii) Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs") * Remove bpf_iter_task_vma kfuncs from libbpf's bpf_helpers.h, they'll be added to selftests' bpf_experimental.h in selftests patch below (Andrii) * Split bpf_iter_task_vma.c renaming into separate commit (Andrii) Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter") * Add bpf_iter_task_vma kfuncs to bpf_experimental.h (Andrii) * Remove '?' from prog SEC, open_and_load the skel in one operation (Andrii) * Ensure that fclose() always happens in test runner (Andrii) * Use global var w/ 1000 (vm_start, vm_end) structs instead of two MAP_TYPE_ARRAY's w/ 1k u64s each (Andrii) v3 -> v4: https://lore.kernel.org/bpf/20230822050558.2937659-1-davemarchevsky@fb.com/ Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num") * Add Andrii ack Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs") * Mark bpf_iter_task_vma_new args KF_RCU and remove now-unnecessary !task check (Yonghong) * Although KF_RCU is a function-level flag, in reality it only applies to the task_struct task parameter, as the other two params are a scalar int and a specially-handled KF_ARG_PTR_TO_ITER Remove struct bpf_iter_task_vma definition from uapi headers, define in kernel/bpf/task_iter.c instead (Andrii) Patch 3 ("selftests/bpf: Add tests for open-coded task_vma iter") * Use a local var when looping over vmas to track map idx. Update vmas_seen global after done iterating. Don't start iterating or update vmas_seen if vmas_seen global is nonzero. (Andrii) * Move getpgid() call to correct spot - above skel detach. (Andrii) v2 -> v3: https://lore.kernel.org/bpf/20230821173415.1970776-1-davemarchevsky@fb.com/ Patch 1 ("bpf: Don't explicitly emit BTF for struct btf_iter_num") * Add Yonghong ack Patch 2 ("bpf: Introduce task_vma open-coded iterator kfuncs") * UAPI bpf header and tools/ version should match * Add bpf_iter_task_vma_kern_data which bpf_iter_task_vma_kern points to, bpf_mem_alloc/free it instead of just vma_iterator. (Alexei) * Inner data ptr == NULL implies initialization failed v1 -> v2: https://lore.kernel.org/bpf/20230810183513.684836-1-davemarchevsky@fb.com/ * Patch 1 * Now removes the unnecessary BTF_TYPE_EMIT instead of changing the type (Yonghong) * Patch 2 * Don't do unnecessary BTF_TYPE_EMIT (Yonghong) * Bump task refcount to prevent ->mm reuse (Yonghong) * Keep a pointer to vma_iterator in bpf_iter_task_vma, alloc/free via BPF mem allocator (Yonghong, Stanislav) * Patch 3 [0]: https://lore.kernel.org/bpf/20230801145414.418145-1-davemarchevsky@fb.com/ ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-10-13	selftests/bpf: Add tests for open-coded task_vma iter	Dave Marchevsky
	The open-coded task_vma iter added earlier in this series allows for natural iteration over a task's vmas using existing open-coded iter infrastructure, specifically bpf_for_each. This patch adds a test demonstrating this pattern and validating correctness. The vma->vm_start and vma->vm_end addresses of the first 1000 vmas are recorded and compared to /proc/PID/maps output. As expected, both see the same vmas and addresses - with the exception of the [vsyscall] vma - which is explained in a comment in the prog_tests program. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com
2023-10-13	bpf: Introduce task_vma open-coded iterator kfuncs	Dave Marchevsky
	This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_task_vma in open-coded iterator style. BPF programs can use these kfuncs directly or through bpf_for_each macro for natural-looking iteration of all task vmas. The implementation borrows heavily from bpf_find_vma helper's locking - differing only in that it holds the mmap_read lock for all iterations while the helper only executes its provided callback on a maximum of 1 vma. Aside from locking, struct vma_iterator and vma_next do all the heavy lifting. A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the only field in struct bpf_iter_task_vma. This is because the inner data struct contains a struct vma_iterator (not ptr), whose size is likely to change under us. If bpf_iter_task_vma_kern contained vma_iterator directly such a change would require change in opaque bpf_iter_task_vma struct's size. So better to allocate vma_iterator using BPF allocator, and since that alloc must already succeed, might as well allocate all iter fields, thereby freezing struct bpf_iter_task_vma size. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com
2023-10-13	selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c	Dave Marchevsky
	Further patches in this series will add a struct bpf_iter_task_vma, which will result in a name collision with the selftest prog renamed in this patch. Rename the selftest to avoid the collision. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com
2023-10-13	bpf: Don't explicitly emit BTF for struct btf_iter_num	Dave Marchevsky
	Commit 6018e1f407cc ("bpf: implement numbers iterator") added the BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num doesn't exist, so only a forward declaration is emitted in BTF: FWD 'btf_iter_num' fwd_kind=struct That commit was probably hoping to ensure that struct bpf_iter_num is emitted in vmlinux BTF. A previous version of this patch changed the line to emit the correct type, but Yonghong confirmed that it would definitely be emitted regardless in [0], so this patch simply removes the line. This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't causing any issues that I noticed, aside from mild confusion when I looked through the code. [0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com
2023-10-13	bpf: Change syscall_nr type to int in struct syscall_tp_t	Artem Savkov
	linux-rt-devel tree contains a patch (b1773eac3f29c ("sched: Add support for lazy preemption")) that adds an extra member to struct trace_entry. This causes the offset of args field in struct trace_event_raw_sys_enter be different from the one in struct syscall_trace_enter: struct trace_event_raw_sys_enter { struct trace_entry ent; /* 0 12 / / XXX last struct has 3 bytes of padding / / XXX 4 bytes hole, try to pack / long int id; / 16 8 / long unsigned int args[6]; / 24 48 / / --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- / char __data[]; / 72 0 / / size: 72, cachelines: 2, members: 4 / / sum members: 68, holes: 1, sum holes: 4 / / paddings: 1, sum paddings: 3 / / last cacheline: 8 bytes / }; struct syscall_trace_enter { struct trace_entry ent; / 0 12 / / XXX last struct has 3 bytes of padding / int nr; / 12 4 / long unsigned int args[]; / 16 0 / / size: 16, cachelines: 1, members: 3 / / paddings: 1, sum paddings: 3 / / last cacheline: 16 bytes */ }; This, in turn, causes perf_event_set_bpf_prog() fail while running bpf test_profiler testcase because max_ctx_offset is calculated based on the former struct, while off on the latter: 10488 if (is_tracepoint \|\| is_syscall_tp) { 10489 int off = trace_event_get_offsets(event->tp_event); 10490 10491 if (prog->aux->max_ctx_offset > off) 10492 return -EACCES; 10493 } What bpf program is actually getting is a pointer to struct syscall_tp_t, defined in kernel/trace/trace_syscalls.c. This patch fixes the problem by aligning struct syscall_tp_t with struct syscall_trace_(enter\|exit) and changing the tests to use these structs to dereference context. Signed-off-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20231013054219.172920-1-asavkov@redhat.com
2023-10-13	net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set	Martin KaFai Lau
	It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch is to address it similar to the ipv6 counterpart in inet6_getname(). It is to "return sin_addr_len;" instead of "return sizeof(*sin);". Fixes: fefba7d1ae19 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/
2023-10-13	bpf: Avoid unnecessary audit log for CPU security mitigations	Yafang Shao
	Check cpu_mitigations_off() first to avoid calling capable() if it is off. This can avoid unnecessary audit log. Fixes: bc5bc309db45 ("bpf: Inherit system settings for CPU security mitigations") Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bza6UVUWqcWQ-66weZ-nMDr+TFU3Mtq=dumZFD-pSqU7Ow@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20231013083916.4199-1-laoar.shao@gmail.com
2023-10-13	net: fix IPSTATS_MIB_OUTFORWDATAGRAMS increment after fragment check	Heng Guo
	Reproduce environment: network with 3 VM linuxs is connected as below: VM1<---->VM2(latest kernel 6.5.0-rc7)<---->VM3 VM1: eth0 ip: 192.168.122.207 MTU 1800 VM2: eth0 ip: 192.168.122.208, eth1 ip: 192.168.123.224 MTU 1500 VM3: eth0 ip: 192.168.123.240 MTU 1800 Reproduce: VM1 send 1600 bytes UDP data to VM3 using tools scapy with flags='DF'. scapy command: send(IP(dst="192.168.123.240",flags='DF')/UDP()/str('0'*1600),count=1, inter=1.000000) Result: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 2 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is always keeping 2 without increment. Issue description and patch: ip_exceeds_mtu() in ip_forward() drops this IP datagram because skb len (1600 sending by scapy) is over MTU(1500 in VM2) if "DF" is set. According to RFC 4293 "3.2.3. IP Statistics Tables", +-------+------>------+----->-----+----->-----+ \| InForwDatagrams (6) \| OutForwDatagrams (6) \| \| V +->-+ OutFragReqds \| InNoRoutes \| \| (packets) / (local packet (3) \| \| \| IF is that of the address \| +--> OutFragFails \| and may not be the receiving IF) \| \| (packets) the IPSTATS_MIB_OUTFORWDATAGRAMS should be counted before fragment check. The existing implementation, instead, would incease the counter after fragment check: ip_exceeds_mtu() in ipv4 and ip6_pkt_too_big() in ipv6. So do patch to move IPSTATS_MIB_OUTFORWDATAGRAMS counter to ip_forward() for ipv4 and ip6_forward() for ipv6. Test result with patch: Before IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 6 0 2 2 0 0 2 4 0 0 0 0 0 0 0 0 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- After IP data is sent. ---------------------------------------------------------------------- root@qemux86-64:~# cat /proc/net/snmp Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqdss Ip: 1 64 7 0 2 3 0 0 2 5 0 0 0 0 0 0 0 1 0 ...... root@qemux86-64:~# ---------------------------------------------------------------------- ForwDatagrams is updated from 2 to 3. Reviewed-by: Filip Pudak <filip.pudak@windriver.com> Signed-off-by: Heng Guo <heng.guo@windriver.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231011015137.27262-1-heng.guo@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	Merge branch 'mlx5-next' of ↵	Jakub Kicinski
	https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== This PR is collected from https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they performed on netdevs. [1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode ==================== Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-13	Merge branch 'tls-cleanups'	David S. Miller
	Sabrina Dubroca says: ==================== net: tls: various code cleanups and improvements This series contains multiple cleanups and simplifications for the config code of both TLS_SW and TLS_HW. It also modifies the chcr_ktls driver to use driver_state like all other drivers, so that we can then make driver_state fixed size instead of a flex array always allocated to that same fixed size. As reported by Gustavo A. R. Silva, the way chcr_ktls misuses driver_state irritates GCC [1]. Patches 1 and 2 are follow-ups to my previous cipher_desc series. [1] https://lore.kernel.org/netdev/ZRvzdlvlbX4+eIln@work/ ==================== Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: use fixed size for tls_offload_context_{tx,rx}.driver_state	Sabrina Dubroca
	driver_state is a flex array, but is always allocated by the tls core to a fixed size (TLS_DRIVER_STATE_SIZE_{TX,RX}). Simplify the code by making that size explicit so that sizeof(struct tls_offload_context_{tx,rx}) works. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	chcr_ktls: use tls_offload_context_tx and driver_state like other drivers	Sabrina Dubroca
	chcr_ktls uses the space reserved in driver_state by tls_set_device_offload, but makes up into own wrapper around tls_offload_context_tx instead of accessing driver_state via the __tls_driver_ctx helper. In this driver, driver_state is only used to store a pointer to a larger context struct allocated by the driver. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: validate crypto_info in a separate helper	Sabrina Dubroca
	Simplify do_tls_setsockopt_conf a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: remove tls_context argument from tls_set_device_offload	Sabrina Dubroca
	It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. While at it, fix up the reverse xmas tree ordering. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: remove tls_context argument from tls_set_sw_offload	Sabrina Dubroca
	It's not really needed since we end up refetching it as tls_ctx. We can also remove the NULL check, since we have already dereferenced ctx in do_tls_setsockopt_conf. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: add a helper to allocate/initialize offload_ctx_tx	Sabrina Dubroca
	Simplify tls_set_device_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: also use init_prot_info in tls_set_device_offload	Sabrina Dubroca
	Most values are shared. Nonce size turns out to be equal to IV size for all offloadable ciphers. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: move tls_prot_info initialization out of tls_set_sw_offload	Sabrina Dubroca
	Simplify tls_set_sw_offload, and allow reuse for the tls_device code. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: extract context alloc/initialization out of tls_set_sw_offload	Sabrina Dubroca
	Simplify tls_set_sw_offload a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: store iv directly within cipher_context	Sabrina Dubroca
	TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE is 20B, we don't get much benefit in cipher_context's size and can simplify the init code a bit. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: rename MAX_IV_SIZE to TLS_MAX_IV_SIZE	Sabrina Dubroca
	It's defined in include/net/tls.h, avoid using an overly generic name. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: store rec_seq directly within cipher_context	Sabrina Dubroca
	TLS_MAX_REC_SEQ_SIZE is 8B, we don't get anything by using kmalloc. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: drop unnecessary cipher_type checks in tls offload	Sabrina Dubroca
	We should never reach tls_device_reencrypt, tls_enc_record, or tls_enc_skb with a cipher_type that can't be offloaded. Replace those checks with a DEBUG_NET_WARN_ON_ONCE, and use cipher_desc instead of hard-coding offloadable cipher types. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	tls: get salt using crypto_info_salt in tls_enc_skb	Sabrina Dubroca
	I skipped this conversion in my previous series. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: stmmac: fix typo in comment	Johannes Zink
	This is just a trivial fix for a typo in a comment, no functional changes. Signed-off-by: Johannes Zink <j.zink@pengutronix.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	selftests: netdevsim: use suitable existing dummy file for flash test	Jiri Pirko
	The file name used in flash test was "dummy" because at the time test was written, drivers were responsible for file request and as netdevsim didn't do that, name was unused. However, the file load request is now done in devlink code and therefore the file has to exist. Use first random file from /lib/firmware for this purpose. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	xen-netback: add software timestamp capabilities	Luca Fancellu
	Add software timestamp capabilities to the xen-netback driver by advertising it on the struct ethtool_ops and calling skb_tx_timestamp before passing the buffer to the queue. Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	ibmvnic: replace deprecated strncpy with strscpy	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. NUL-padding is not required as the buffer is already memset to 0: \| memset(adapter->fw_version, 0, 32); Note that another usage of strscpy exists on the same buffer: \| strscpy((char *)adapter->fw_version, "N/A", sizeof(adapter->fw_version)); Considering the above, a suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer without unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: fec: replace deprecated strncpy with ethtool_sprintf	Justin Stitt
	`strncpy` is deprecated for use on NUL-terminated destination strings [1] and as such we should prefer more robust and less ambiguous string interfaces. ethtool_sprintf() is designed specifically for get_strings() usage. Let's replace strncpy in favor of this more robust and easier to understand interface. Also, while we're here, let's change memcpy() over to ethtool_sprintf() for consistency. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: mdio: xgene: Use device_get_match_data()	Rob Herring
	Use preferred device_get_match_data() instead of of_match_device() and acpi_match_device() to get the driver match data. With this, adjust the includes to explicitly include the correct headers. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: wwan: t7xx: Add __counted_by for struct t7xx_fsm_event and use ↵	Gustavo A. R. Silva
	struct_size() Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: ethernet: wiznet: Use spi_get_device_match_data()	Rob Herring
	Use preferred spi_get_device_match_data() instead of of_match_device() and spi_get_device_id() to get the driver match data. With this, adjust the includes to explicitly include the correct headers. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: ethernet: Use device_get_match_data()	Rob Herring
	Use preferred device_get_match_data() instead of of_match_device() to get the driver match data. With this, adjust the includes to explicitly include the correct headers. Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	net: stmmac: dwmac-stm32: refactor clock config	Ben Wolsieffer
	Currently, clock configuration is spread throughout the driver and partially duplicated for the STM32MP1 and STM32 MCU variants. This makes it difficult to keep track of which clocks need to be enabled or disabled in various scenarios. This patch adds symmetric stm32_dwmac_clk_enable/disable() functions that handle all clock configuration, including quirks required while suspending or resuming. syscfg_clk and clk_eth_ck are not present on STM32 MCUs, but it is fine to try to configure them anyway since NULL clocks are ignored. Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	Merge branch 'vxlan-fdb-flushing'	David S. Miller
	Amit Cohen says: ==================== Extend VXLAN driver to support FDB flushing The merge commit 92716869375b ("Merge branch 'br-flush-filtering'") added support for FDB flushing in bridge driver. Extend VXLAN driver to support FDB flushing also. Add support for filtering by fields which are relevant for VXLAN FDBs: * Source VNI * Nexthop ID * 'router' flag * Destination VNI * Destination Port * Destination IP Without this set, flush for VXLAN device fails: $ bridge fdb flush dev vx10 RTNETLINK answers: Operation not supported With this set, such flush works with the relevant arguments, for example: $ bridge fdb flush dev vx10 vni 5000 dst 193.2.2.1 < flush all vx10 entries with VNI 5000 and destination IP 193.2.2.1> Some preparations are required, handle them before adding flushing support in VXLAN driver. See more details in commit messages. Patch set overview: Patch #1 prepares flush policy to be used by VXLAN driver Patches #2-#3 are preparations in VXLAN driver Patch #4 adds an initial support for flushing in VXLAN driver Patches #5-#9 add support for filtering by several attributes Patch #10 adds a test for FDB flush with VXLAN Patch #11 extends the test to check FDB flush with bridge ==================== Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	selftests: fdb_flush: Add test cases for FDB flush with bridge device	Amit Cohen
	Extend the test to check flushing with bridge device, test flush by device and by VID. Add test case for flushing with "self" and "master" and attributes that are supported only in one driver, this is unrecommended configuration, check it to verify that user gets an error. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	selftests: Add test cases for FDB flush with VXLAN device	Amit Cohen
	Test all the supported arguments for FDB flush. The test checks configuration, not traffic. Note that the flag 'offloaded' is not checked as it is not relevant when there is no hardware. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	vxlan: vxlan_core: Support FDB flushing by destination IP	Amit Cohen
	Add support for flush VXLAN FDB entries by destination IP. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination IP is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination IP', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.3 self permanent 00:00:00:00:00:00 dst 192.1.1.1 self permanent 00:00:00:00:00:00 dst 192.1.1.2 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 1000 self permanent When user flush by destination IP x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 dst 192.1.1.1 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.3 self permanent 00:00:00:00:00:00 dst 192.1.1.2 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	vxlan: vxlan_core: Support FDB flushing by destination port	Amit Cohen
	Add support for flush VXLAN FDB entries by destination port. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination port is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination port', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 port 1111 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent When user flush by port x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 port 1111 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 port 2222 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	vxlan: vxlan_core: Support FDB flushing by destination VNI	Amit Cohen
	Add support for flush VXLAN FDB entries by destination VNI. FDB entry is stored as {MAC, SRC_VNI} + remote. The destination VNI is an attribute of the remote. For multicast entries, the VXLAN driver stores a linked list of remotes for a given key. In user space, each remote is represented as a separate entry, so when flush is sent with filter of 'destination VNI', flush only the match remotes. In case that there are no additional remotes, destroy the entry. For example, the following are stored as one entry with several remotes: $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 2000 self permanent 00:00:00:00:00:00 dst 192.1.1.2 vni 2000 self permanent When user flush by VNI x, only the relevant remotes will be flushed: $ bridge fdb flush dev vx10 vni 2000 $ bridge fdb show dev vx10 00:00:00:00:00:00 dst 192.1.1.1 vni 3000 self permanent 00:00:00:00:00:00 dst 192.1.1.1 vni 4000 self permanent Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-13	vxlan: vxlan_core: Support FDB flushing by nexthop ID	Amit Cohen
	Add support for flush VXLAN FDB entries by nexthop ID. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>