Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf and netfilter.
A lot of networking people were at a conference last week, busy
catching COVID, so relatively short PR.
Current release - regressions:
- tcp: process the 3rd ACK with sk_socket for TFO and MPTCP
Current release - new code bugs:
- l2tp: protect session IDR and tunnel session list with one lock,
make sure the state is coherent to avoid a warning
- eth: bnxt_en: update xdp_rxq_info in queue restart logic
- eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field
Previous releases - regressions:
- xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
the field reuses previously un-validated pad
Previous releases - always broken:
- tap/tun: drop short frames to prevent crashes later in the stack
- eth: ice: add a per-VF limit on number of FDIR filters
- af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"
* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
tun: add missing verification for short frame
tap: add missing verification for short frame
mISDN: Fix a use after free in hfcmulti_tx()
gve: Fix an edge case for TSO skb validity check
bnxt_en: update xdp_rxq_info in queue restart logic
tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
bpf: Fix a segment issue when downgrading gso_size
net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
MAINTAINERS: make Breno the netconsole maintainer
MAINTAINERS: Update bonding entry
net: nexthop: Initialize all fields in dumped nexthops
net: stmmac: Correct byte order of perfect_match
selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
tipc: Return non-zero value from tipc_udp_addr2str() on error
netfilter: nft_set_pipapo_avx2: disable softinterrupts
ice: Fix recipe read procedure
ice: Add a per-VF limit on number of FDIR filters
net: bonding: correctly annotate RCU in bond_should_notify_peers()
...
|
|
learning limit
If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like
./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected
Fixes: 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Johannes Nixdorf <jnixdorf-oss@avm.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- In the series "mm: Avoid possible overflows in dirty throttling" Jan
Kara addresses a couple of issues in the writeback throttling code.
These fixes are also targetted at -stable kernels.
- Ryusuke Konishi's series "nilfs2: fix potential issues related to
reserved inodes" does that. This should actually be in the
mm-nonmm-stable tree, along with the many other nilfs2 patches. My
bad.
- More folio conversions from Kefeng Wang in the series "mm: convert to
folio_alloc_mpol()"
- Kemeng Shi has sent some cleanups to the writeback code in the series
"Add helper functions to remove repeated code and improve readability
of cgroup writeback"
- Kairui Song has made the swap code a little smaller and a little
faster in the series "mm/swap: clean up and optimize swap cache
index".
- In the series "mm/memory: cleanly support zeropage in
vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
Hildenbrand has reworked the rather sketchy handling of the use of
the zeropage in MAP_SHARED mappings. I don't see any runtime effects
here - more a cleanup/understandability/maintainablity thing.
- Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
of higher addresses, for aarch64. The (poorly named) series is
"Restructure va_high_addr_switch".
- The core TLB handling code gets some cleanups and possible slight
optimizations in Bang Li's series "Add update_mmu_tlb_range() to
simplify code".
- Jane Chu has improved the handling of our
fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
the series "Enhance soft hwpoison handling and injection".
- Jeff Johnson has sent a billion patches everywhere to add
MODULE_DESCRIPTION() to everything. Some landed in this pull.
- In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
has simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm:
zswap: trivial folio conversions".
- In the series "large folios swap-in: handle refault cases first",
Chuanhua Han inches us forward in the handling of large pages in the
swap code. This is a cleanup and optimization, working toward the end
objective of full support of large folio swapin/out.
- In the series "mm,swap: cleanup VMA based swap readahead window
calculation", Huang Ying has contributed some cleanups and a possible
fixlet to his VMA based swap readahead code.
- In the series "add mTHP support for anonymous shmem" Baolin Wang has
taught anonymous shmem mappings to use multisize THP. By default this
is a no-op - users must opt in vis sysfs controls. Dramatic
improvements in pagefault latency are realized.
- David Hildenbrand has some cleanups to our remaining use of
page_mapcount() in the series "fs/proc: move page_mapcount() to
fs/proc/internal.h".
- David also has some highmem accounting cleanups in the series
"mm/highmem: don't track highmem pages manually".
- Build-time fixes and cleanups from John Hubbard in the series
"cleanups, fixes, and progress towards avoiding "make headers"".
- Cleanups and consolidation of the core pagemap handling from Barry
Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
and utilize them".
- Lance Yang's series "Reclaim lazyfree THP without splitting" has
reduced the latency of the reclaim of pmd-mapped THPs under fairly
common circumstances. A 10x speedup is seen in a microbenchmark.
It does this by punting to aother CPU but I guess that's a win unless
all CPUs are pegged.
- hugetlb_cgroup cleanups from Xiu Jianfeng in the series
"mm/hugetlb_cgroup: rework on cftypes".
- Miaohe Lin's series "Some cleanups for memory-failure" does just that
thing.
- Someone other than SeongJae has developed a DAMON feature in Honggyu
Kim's series "DAMON based tiered memory management for CXL memory".
This adds DAMON features which may be used to help determine the
efficiency of our placement of CXL/PCIe attached DRAM.
- DAMON user API centralization and simplificatio work in SeongJae
Park's series "mm/damon: introduce DAMON parameters online commit
function".
- In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
David Hildenbrand does some maintenance work on zsmalloc - partially
modernizing its use of pageframe fields.
- Kefeng Wang provides more folio conversions in the series "mm: remove
page_maybe_dma_pinned() and page_mkclean()".
- More cleanup from David Hildenbrand, this time in the series
"mm/memory_hotplug: use PageOffline() instead of PageReserved() for
!ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
pages" and permits the removal of some virtio-mem hacks.
- Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
__folio_add_anon_rmap()" is a cleanup to the anon folio handling in
preparation for mTHP (multisize THP) swapin.
- Kefeng Wang's series "mm: improve clear and copy user folio"
implements more folio conversions, this time in the area of large
folio userspace copying.
- The series "Docs/mm/damon/maintaier-profile: document a mailing tool
and community meetup series" tells people how to get better involved
with other DAMON developers. From SeongJae Park.
- A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
that.
- David Hildenbrand sends along more cleanups, this time against the
migration code. The series is "mm/migrate: move NUMA hinting fault
folio isolation + checks under PTL".
- Jan Kara has found quite a lot of strangenesses and minor errors in
the readahead code. He addresses this in the series "mm: Fix various
readahead quirks".
- SeongJae Park's series "selftests/damon: test DAMOS tried regions and
{min,max}_nr_regions" adds features and addresses errors in DAMON's
self testing code.
- Gavin Shan has found a userspace-triggerable WARN in the pagecache
code. The series "mm/filemap: Limit page cache size to that supported
by xarray" addresses this. The series is marked cc:stable.
- Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
and cleanup" cleans up and slightly optimizes KSM.
- Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
code motion. The series (which also makes the memcg-v1 code
Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
under config option" and "mm: memcg: put cgroup v1-specific memcg
data under CONFIG_MEMCG_V1"
- Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
adds an additional feature to this cgroup-v2 control file.
- The series "Userspace controls soft-offline pages" from Jiaqi Yan
permits userspace to stop the kernel's automatic treatment of
excessive correctable memory errors. In order to permit userspace to
monitor and handle this situation.
- Kefeng Wang's series "mm: migrate: support poison recover from
migrate folio" teaches the kernel to appropriately handle migration
from poisoned source folios rather than simply panicing.
- SeongJae Park's series "Docs/damon: minor fixups and improvements"
does those things.
- In the series "mm/zsmalloc: change back to per-size_class lock"
Chengming Zhou improves zsmalloc's scalability and memory
utilization.
- Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
pinning memfd folios" makes the GUP code use FOLL_PIN rather than
bare refcount increments. So these paes can first be moved aside if
they reside in the movable zone or a CMA block.
- Andrii Nakryiko has added a binary ioctl()-based API to
/proc/pid/maps for much faster reading of vma information. The series
is "query VMAs from /proc/<pid>/maps".
- In the series "mm: introduce per-order mTHP split counters" Lance
Yang improves the kernel's presentation of developer information
related to multisize THP splitting.
- Michael Ellerman has developed the series "Reimplement huge pages
without hugepd on powerpc (8xx, e500, book3s/64)". This permits
userspace to use all available huge page sizes.
- In the series "revert unconditional slab and page allocator fault
injection calls" Vlastimil Babka removes a performance-affecting and
not very useful feature from slab fault injection.
* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
mm/mglru: fix ineffective protection calculation
mm/zswap: fix a white space issue
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix possible recursive locking detected warning
mm/gup: clear the LRU flag of a page before adding to LRU batch
mm/numa_balancing: teach mpol_to_str about the balancing mode
mm: memcg1: convert charge move flags to unsigned long long
alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
lib: reuse page_ext_data() to obtain codetag_ref
lib: add missing newline character in the warning message
mm/mglru: fix overshooting shrinker memory
mm/mglru: fix div-by-zero in vmpressure_calc_level()
mm/kmemleak: replace strncpy() with strscpy()
mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
mm: ignore data-race in __swap_writepage
hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
mm: shmem: rename mTHP shmem counters
mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
mm/migrate: putback split folios when numa hint migration fails
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Call nf_expect_get_id() to delete expectation by ID. By trial and
error it is possible to leak the LSB of the expectation address on
x86_64. This bug is a leftover when converting the existing code
to use nf_expect_get_id().
2) Incorrect initialization in pipapo set backend leads to packet
mismatches. From Florian Westphal.
3) Extend netfilter's selftests to cover for the pipapo set backend,
also from Florian.
4) Fix sparse warning in IPVS when adding service, from Chen Hanxiao.
netfilter pull request 24-07-17
* tag 'nf-24-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
ipvs: properly dereference pe in ip_vs_add_service
selftests: netfilter: add test case for recent mismatch bug
netfilter: nf_set_pipapo: fix initial map fill
netfilter: ctnetlink: use helper function to calculate expect ID
====================
Link: https://patch.msgid.link/20240717215214.225394-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1
Fix by instead returning the DSCP field from the FIB result structure
which was populated during the route lookup.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1
Extend the existing selftests to not only verify that the correct route
is returned, but that it is also returned with correct "tos" value (or
without it).
Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Without 'netfilter: nf_set_pipapo: fix initial map fill' this fails:
TEST: reported issues
Add two elements, flush, re-add 1s [ OK ]
net,mac with reload 1s [ OK ]
net,port,proto 1s [FAIL]
post-add: should have returned 10.5.8.0/24 . 51-60 . 6-17 but got table inet filter {
set test {
type ipv4_addr . inet_service . inet_proto
flags interval,timeout
elements = { 10.5.7.0/24 . 51-60 . 6-17 }
}
}
The other sets defined in the selftest do not trigger this bug, it only
occurs if the first field group bitsize is smaller than the largest
group bitsize.
For each added element, check 'get' works and actually returns the
requested range.
After map has been filled, check all added ranges can still be
retrieved.
For each deleted element, check that 'get' fails.
Based on a reproducer script from Yi Chen.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Merge in late fixes to prepare for the 6.11 net-next PR.
Conflicts:
93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref")
30d7b6727724 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The goal is to check that the source address selected by the kernel is
routable when a leaking route is used. ICMP, TCP and UDP connections are
tested.
The symmetric topology is enough for this test.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240710081521.3809742-5-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Lately, an additional locking was added by commit c0a40097f0bc
("drivers: core: synchronize really_probe() and dev_uevent()"). The
locking protects dev_uevent() calling. This function is used to send
messages from the kernel to user space. Uevent messages notify user space
about changes in device states, such as when a device is added, removed,
or changed. These messages are used by udev (or other similar user-space
tools) to apply device-specific rules.
After reloading devlink instance, udev events should be processed. This
locking causes a short delay of udev events handling.
One example for useful udev rule is renaming ports. 'forwading.config'
can be configured to use names after udev rules are applied. Some tests run
devlink_reload() and immediately use the updated names. This worked before
the above mentioned commit was pushed, but now the delay of uevent messages
causes that devlink_reload() returns before udev events are handled and
tests fail.
Adjust devlink_reload() to not assume that udev events are already
processed when devlink reload is done, instead, wait for udev events to
ensure they are processed before returning from the function.
Without this patch:
TESTS='rif_mac_profile' ./resource_scale.sh
TEST: 'rif_mac_profile' 4 [ OK ]
sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory
sysctl: cannot stat /proc/sys/net/ipv6/conf/swp1/disable_ipv6: No such file or directory
sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory
sysctl: cannot stat /proc/sys/net/ipv6/conf/swp2/disable_ipv6: No such file or directory
Cannot find device "swp1"
Cannot find device "swp2"
TEST: setup_wait_dev (: Interface swp1 does not come up.) [FAIL]
With this patch:
$ TESTS='rif_mac_profile' ./resource_scale.sh
TEST: 'rif_mac_profile' 4 [ OK ]
TEST: 'rif_mac_profile' overflow 5 [ OK ]
This is relevant not only for this test.
Fixes: bc7cbb1e9f4c ("selftests: forwarding: Add devlink_lib.sh")
Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/89367666e04b38a8993027f1526801ca327ab96a.1720709333.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It looks like we missed these two errors recently:
- SC2068: Double quote array expansions to avoid re-splitting elements.
- SC2145: Argument mixes string and array. Use * or separate argument.
Two simple fixes, it is not supposed to change the behaviour as the
variable names should not have any spaces in their names. Still, better
to fix them to easily spot new issues.
Fixes: f265d3119a29 ("selftests: mptcp: lib: use setup/cleanup_ns helpers")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240712-upstream-net-next-20240712-selftests-mptcp-fix-shellcheck-v1-1-1cb7180db40a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcp_ao/self-connect.c checked the following SNMP stats before/after
connect() to confirm that the test exercises the simultaneous connect()
path.
* TCPChallengeACK
* TCPSYNChallenge
But the stats should not be counted for self-connect in the first place,
and the assumption is no longer true.
Let's remove the check.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Link: https://patch.msgid.link/20240710171246.87533-3-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are a couple of places where the test script "sleep"s to wait for
some external condition to be met.
This is error prone, specially in slow systems (identified in CI by
"KSFT_MACHINE_SLOW=yes").
To fix this, add a "ovs_wait" function that tries to execute a command
a few times until it succeeds. The timeout used is set to 5s for
"normal" systems and doubled if a slow CI machine is detected.
This should make the following work:
$ vng --build \
--config tools/testing/selftests/net/config \
--config kernel/configs/debug.config
$ vng --run . --user root -- "make -C tools/testing/selftests/ \
KSFT_MACHINE_SLOW=yes TARGETS=net/openvswitch run_tests"
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Reviewed-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://patch.msgid.link/20240710090500.1655212-1-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Centralize the _GNU_SOURCE definition to CFLAGS in lib.mk. Remove
redundant defines from Makefiles that import lib.mk. Convert any usage of
"#define _GNU_SOURCE 1" to "#define _GNU_SOURCE".
This uses the form "-D_GNU_SOURCE=", which is equivalent to
"#define _GNU_SOURCE".
Otherwise using "-D_GNU_SOURCE" is equivalent to "-D_GNU_SOURCE=1" and
"#define _GNU_SOURCE 1", which is less commonly seen in source code and
would require many changes in selftests to avoid redefinition warnings.
Link: https://lkml.kernel.org/r/20240625223454.1586259-2-edliaw@google.com
Signed-off-by: Edward Liaw <edliaw@google.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Kees Cook <kees@kernel.org>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The ageing time used by the test is too short for debug kernels and
results in entries being aged out prematurely [1].
Fix by increasing the ageing time.
The same change was done for the VLAN-aware version of the test in
commit dfbab74044be ("selftests: forwarding: Make vxlan-bridge-1q pass
on debug kernels").
[1]
# ./vxlan_bridge_1d.sh
[...]
# TEST: VXLAN: flood before learning [ OK ]
# TEST: VXLAN: show learned FDB entry [ OK ]
# TEST: VXLAN: learned FDB entry [FAIL]
# veth3: Expected to capture 0 packets, got 4.
# RTNETLINK answers: No such file or directory
# TEST: VXLAN: deletion of learned FDB entry [ OK ]
# TEST: VXLAN: Ageing of learned FDB entry [FAIL]
# veth3: Expected to capture 0 packets, got 2.
[...]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240707095458.2870260-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It's very useful to be able to interrupt the tests during development.
Detect KeyboardInterrupt, run the cleanups and exit.
Link: https://patch.msgid.link/20240705015222.675840-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add a test to verify sampling packets via psample works.
In order to do that, create a subcommand in ovs-dpctl.py to listen to
on the psample multicast group and print samples.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Tested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-11-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The trunc action was supported decode-able but not parse-able. Add
support for parsing the action string.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-10-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The userspace action lacks parsing support plus it contains a bug in the
name of one of its attributes.
This patch makes userspace action work.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-9-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add sample and psample action support to ovs-dpctl.py.
Refactor common attribute parsing logic into an external function.
Reviewed-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
Link: https://patch.msgid.link/20240704085710.353845-8-amorenoz@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
drivers/net/phy/aquantia/aquantia.h
219343755eae ("net: phy: aquantia: add missing include guards")
61578f679378 ("net: phy: aquantia: add support for PHY LEDs")
drivers/net/ethernet/wangxun/libwx/wx_hw.c
bd07a9817846 ("net: txgbe: remove separate irq request for MSI and INTx")
b501d261a5b3 ("net: txgbe: add FDIR ATR support")
https://lore.kernel.org/all/20240703112936.483c1975@canb.auug.org.au/
include/linux/mlx5/mlx5_ifc.h
048a403648fc ("net/mlx5: IFC updates for changing max EQs")
99be56171fa9 ("net/mlx5e: SHAMPO, Re-enable HW-GRO")
https://lore.kernel.org/all/20240701133951.6926b2e3@canb.auug.org.au/
Adjacent changes:
drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c
4130c67cd123 ("wifi: iwlwifi: mvm: check vif for NULL/ERR_PTR before dereference")
3f3126515fbe ("wifi: iwlwifi: mvm: add mvm-specific guard")
include/net/mac80211.h
816c6bec09ed ("wifi: mac80211: fix BSS_CHANGED_UNSOL_BCAST_PROBE_RESP")
5a009b42e041 ("wifi: mac80211: track changes in AP's TPE")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We find that when lock debugging is on, notifications may not come in
order. Thus, we have order checking outputs managed by cfg_verbose, to
avoid too many outputs in this case.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-3-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, as the introduction of commit
dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale"), the sender is
always writable and does not get any chance to run recv notifications.
The selftest always exits with OUT_OF_MEMORY because the memory used by
opt_skb exceeds the net.core.optmem_max. Meanwhile, it could be set to a
different value to trigger OOM on older kernels too.
Thus, we introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs.
Fixes: 07b65c5b31ce ("test: add msg_zerocopy test")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240701225349.3395580-2-zijianzhang@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzkaller reported a KMSAN splat in __unix_walk_scc() while backtracking
edge_stack after finalising SCC.
Let's add a test case exercising the path.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Link: https://patch.msgid.link/20240702160428.10153-2-syoshida@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The openvswitch selftest is difficult to debug for anyone that isn't
directly familiar with the openvswitch module and the specifics of the
test cases. Many times when something fails, the debug log will be
sparsely populated and it takes some time to understand where a failure
occured.
Increase the amount of details logged to the debug log by trapping all
'info' logs, and all 'ovs_sbx' commands.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Previously, the openvswitch.sh test suites would not attempt to autoload
the openvswitch module. The idea was that a user who is manually running
tests might not even have the OVS module loaded or configured for their
own development. However, if the kernel module is configured, and the
module can be autoloaded then we should just attempt to load it and run
the tests. This is especially true in the CI environments, where the CI
tests should be able to rely on auto loading to get the test suite running.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We found that since some tests rely on the TCP SYN timeouts to cause flow
misses, the default test suite timeout of 45 seconds is quick to be
exceeded. Bump the timeout to 15 minutes.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240702132830.213384-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Support building the C YNL userspace library into one big static file.
We can then link selftests against it for easy to use C netlink
interface.
Signed-off-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20240628003253.1694510-14-almasrymina@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next into main
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
Patch #1 to #11 to shrink memory consumption for transaction objects:
struct nft_trans_chain { /* size: 120 (-32), cachelines: 2, members: 10 */
struct nft_trans_elem { /* size: 72 (-40), cachelines: 2, members: 4 */
struct nft_trans_flowtable { /* size: 80 (-48), cachelines: 2, members: 5 */
struct nft_trans_obj { /* size: 72 (-40), cachelines: 2, members: 4 */
struct nft_trans_rule { /* size: 80 (-32), cachelines: 2, members: 6 */
struct nft_trans_set { /* size: 96 (-24), cachelines: 2, members: 8 */
struct nft_trans_table { /* size: 56 (-40), cachelines: 1, members: 2 */
struct nft_trans_elem can now be allocated from kmalloc-96 instead of
kmalloc-128 slab.
Series from Florian Westphal. For the record, I have mangled patch #1
to add nft_trans_container_*() and use if for every transaction object.
I have also added BUILD_BUG_ON to ensure struct nft_trans always comes
at the beginning of the container transaction object. And few minor
cleanups, any new bugs are of my own.
Patch #12 simplify check for SCTP GSO in IPVS, from Ismael Luceno.
Patch #13 nf_conncount key length remains in the u32 bound, from Yunjian Wang.
Patch #14 removes unnecessary check for CTA_TIMEOUT_L3PROTO when setting
default conntrack timeouts via nfnetlink_cttimeout API, from
Lin Ma.
Patch #15 updates NFT_SECMARK_CTX_MAXLEN to 4096, SELinux could use
larger secctx names than the existing 256 bytes length.
Patch #16 adds a selftest to exercise nfnetlink_queue listeners leaving
nfnetlink_queue, from Florian Westphal.
Patch #17 increases hitcount from 255 to 65535 in xt_recent, from Phil Sutter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This implements what I was describing in [1]. When writing a test
author can schedule cleanup / undo actions right after the creation
completes, eg:
cmd("touch /tmp/file")
defer(cmd, "rm /tmp/file")
defer() takes the function name as first argument, and the rest are
arguments for that function. defer()red functions are called in
inverse order after test exits. It's also possible to capture them
and execute earlier (in which case they get automatically de-queued).
undo = defer(cmd, "rm /tmp/file")
# ... some unsafe code ...
undo.exec()
As a nice safety all exceptions from defer()ed calls are captured,
printed, and ignored (they do make the test fail, however).
This addresses the common problem of exceptions in cleanup paths
often being unhandled, leading to potential leaks.
There is a global action queue, flushed by ksft_run(). We could support
function level defers too, I guess, but there's no immediate need..
Link: https://lore.kernel.org/all/877cedb2ki.fsf@nvidia.com/ # [1]
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Exception handlers print the result and use continue
to skip the non-exception result printing. This makes
inserting common post-test code hard. Refactor to
avoid the continues and have only one ktap_result() call.
Reviewed-by: Petr Machata <petrm@nvidia.com>
Link: https://patch.msgid.link/20240627185502.3069139-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.
Test against a dummy device because it is simpler to set up then TUN/TAP.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20240626-linux-udpgso-v2-2-422dfcbd6b48@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If userspace program exits while the queue its subscribed to has packets
those need to be discarded.
commit dc21c6cc3d69 ("netfilter: nfnetlink_queue: acquire rcu_read_lock()
in instance_destroy_rcu()") fixed a (harmless) rcu splat that could be
triggered in this case.
Add a test case to cover this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Nothing calls these.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These functions are not used anymore.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The selftest does not use functions from mirror_gre_lib, ditch the import.
It does not use arping either, so drop the require_command as well.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mirroring tests are currently run in a skip_hw and optionally a skip_sw
mode. The former tests the SW datapath, the latter the HW datapath, if
available. In order to be able to test SW datapath on HW loopbacks, traps
are installed on ingress to get traffic from the HW datapath to the SW one.
This adds an unnecessary complexity when it would be much simpler to just
use a veth-based topology to test the SW datapath. Thus drop all the code
that supports this dual testing.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.
mirror_test() accommodated this noisiness by giving the counters an
allowance of several packets. But in the previous patch, where possible,
counter taps were changed to match only on an exact ICMP message. At least
in those cases, we can demand an exact number of packets to match.
Where the tap is installed on a connective netdevice, the exact matching is
not practical (though with u32, anything is possible). In those places,
there should still be some leeway -- and probably bigger than before,
because experience shows that these tests are very noisy.
To that end, change mirror_test() so that it can be either called with an
exact number to expect, or with an expression. Where leeway is needed,
adjust callers to pass a ">= 10" instead of mere 10.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The mirroring selftests work by sending ICMP traffic between two hosts.
Along the way, this traffic is mirrored to a gretap netdevice, and counter
taps are then installed strategically along the path of the mirrored
traffic to verify the mirroring took place.
The problem with this is that besides mirroring the primary traffic, any
other service traffic is mirrored as well. At the same time, because the
tests need to work in HW-offloaded scenarios, the ability of the device to
do arbitrary packet inspection should not be taken for granted. Most tests
therefore simply use matchall, one uses flower to match on IP address.
As a result, the selftests are noisy, because besides the primary ICMP
traffic, any amount of other service traffic is mirrored as well.
However, often the counter tap is installed at the remote end of the gretap
tunnel. Since this is a SW-datapath scenario anyway, we can make the filter
arbitrarily accurate.
Thus in this patch, add parameters forward_type and backward_type to
several mirroring test helpers, as some other helpers already have. Then
change do_test_span_dir_ips() to instead of installing one generic tap and
using it for test in both directions, install the tap for each direction
separately, matching on the ICMP type given by these parameters.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The test works by sending packets through a tunnel, whence they are
forwarded to a LAG. One of the LAG children is removed from the LAG prior
to the exercise, and the test then counts how many packets pass through the
other one. The issue with this is that it counts all packets, not just the
encapsulated ones.
So instead add a second gretap endpoint to receive the sent packets, and
check reception counters there.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The argument $dir has a fallback value of "ingress". Move the fallback from
the usage site to the argument definition block to make the fact clearer.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The argument is not used by these functions except to propagate it for
ultimately no purpose.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In some functions, argument-forwarding through "$@" without listing the
individual arguments explicitly is fundamental to the operation of a
function. E.g. xfail_on_veth() should be able to run various tests in the
fail-to-xfail regime, and usage of "$@" is appropriate as an abstraction
mechanism. For functions such as simple_if_init(), $@ is a handy way to
pass an array.
In other functions, it's merely a mechanism to save some typing, which
however ends up obscuring the real arguments and makes life hard for those
that end up reading the code.
This patch adds some of the implicit function arguments and correspondingly
expands $@'s. In several cases this will come in handy as following patches
adjust the parameter lists.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The pmtu testing will require that the OVS module is installed,
so do that.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-8-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current pmtu test infrastucture requires an installed copy of the
ovs-vswitchd userspace. This means that any automated or constrained
environments may not have the requisite tools to run the tests. However,
the pmtu tests don't require any special classifier processing. Indeed
they are only using the vswitchd in the most basic mode - as a NORMAL
switch.
However, the ovs-dpctl kernel utility can now program all the needed basic
flows to allow traffic to traverse the tunnels and provide support for at
least testing some basic pmtu scenarios. More complicated flow pipelines
can be added to the internal ovs test infrastructure, but that is work for
the future. For now, enable the most common cases - wide mega flows with
no other prerequisites.
Enhance the pmtu testing to try testing using the internal utility, first.
As a fallback, if the internal utility isn't running, then try with the
ovs-vswitchd userspace tools.
Additionally, make sure that when the pyroute2 package is not available
the ovs-dpctl utility will error out to properly signal an error has
occurred and skip using the internal utility.
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240625172245.233874-7-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The current iteration of IPv6 support requires explicit fields to be set
in addition to not properly support the actual IPv6 addresses properly.
With this change, make it so that the ipv6() bare option is usable to
create wildcarded flows to match broad swaths of ipv6 traffic.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-6-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This will be used when setting details about the tunnel to use as
transport. There is a difference between the ODP format between tunnel():
the 'key' flag is not actually a flag field, so we don't support it in the
same way that the vswitchd userspace supports displaying it.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240625172245.233874-5-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These will be used in upcoming commits to set specific attributes for
interacting with tunnels. Since set() will use the key parsing routine, we
also make sure to prepend it with an open paren, for the action parsing to
properly understand it.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-4-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Until recently, the ovs-dpctl utility was used with a limited actions set
and didn't need to have support for multiple similar actions. However,
when adding support for tunnels, it will be important to support multiple
set() actions in a single flow. When printing these actions, the existing
code will be unable to print all of the sets - it will only print the
first.
Refactor this code to be easier to read and support multiple actions of the
same type in an action list.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-3-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The OVS module can operate in conjunction with various types of
tunnel ports. These are created as either explicit tunnel vport
types, OR by creating a tunnel interface which acts as an anchor
for the lightweight tunnel support.
This patch adds the ability to add tunnel ports to an OVS
datapath for testing various scenarios with tunnel ports. With
this addition, the vswitch "plumbing" will at least be able to
push packets around using the tunnel vports. Future patches
will add support for setting required tunnel metadata for lwts
in the datapath. The end goal will be to push packets via these
tunnels, and will be used in an upcoming commit for testing the
path MTU.
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Link: https://patch.msgid.link/20240625172245.233874-2-aconole@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Cross-merge networking fixes after downstream PR.
No conflicts.
Adjacent changes:
e3f02f32a050 ("ionic: fix kernel panic due to multi-buffer handling")
d9c04209990b ("ionic: Mark error paths in the data path as unlikely")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|