Age | Commit message (Collapse) | Author |
|
Properties with variable number of items per each device are expected to
have widest constraints in top-level "properties:" block and further
customized (narrowed) in "if:then:". Add missing top-level constraints
for reg, clocks, clock-names, interrupts and interrupt-names.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240818172905.121829-3-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Properties with variable number of items per each device are expected to
have widest constraints in top-level "properties:" block and further
customized (narrowed) in "if:then:". Add missing top-level constraints
for clocks and clock-names.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240818172905.121829-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Each variable-length property like interrupts must have fixed
constraints on number of items for given variant in binding. The
clauses in "if:then:" block should define both limits: upper and lower.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20240818172905.121829-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When metadata_dst struct is allocated (using metadata_dst_alloc()), it
reserves room for options at the end of the struct.
Change the memcpy() to unsafe_memcpy() as it is guaranteed that enough
room (md_size bytes) was allocated and the field-spanning write is
intentional.
This resolves the following warning:
------------[ cut here ]------------
memcpy: detected field-spanning write (size 104) of single field "&new_md->u.tun_info" at include/net/dst_metadata.h:166 (size 96)
WARNING: CPU: 2 PID: 391470 at include/net/dst_metadata.h:166 tun_dst_unclone+0x114/0x138 [geneve]
Modules linked in: act_tunnel_key geneve ip6_udp_tunnel udp_tunnel act_vlan act_mirred act_skbedit cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress sbsa_gwdt ipmi_devintf ipmi_msghandler xfrm_interface xfrm6_tunnel tunnel6 tunnel4 xfrm_user xfrm_algo nvme_fabrics overlay optee openvswitch nsh nf_conncount ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm libiscsi ib_ipoib scsi_transport_iscsi ib_cm uio_pdrv_genirq uio mlxbf_pmc pwr_mlxbf mlxbf_bootctl bluefield_edac nft_chain_nat binfmt_misc xt_MASQUERADE nf_nat xt_tcpmss xt_NFLOG nfnetlink_log xt_recent xt_hashlimit xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_mark xt_comment ipt_REJECT nf_reject_ipv4 nft_compat nf_tables nfnetlink sch_fq_codel dm_multipath fuse efi_pstore ip_tables btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 nvme nvme_core mlx5_ib ib_uverbs ib_core ipv6 crc_ccitt mlx5_core crct10dif_ce mlxfw
psample i2c_mlxbf gpio_mlxbf2 mlxbf_gige mlxbf_tmfifo
CPU: 2 PID: 391470 Comm: handler6 Not tainted 6.10.0-rc1 #1
Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.5.0.12993 Dec 6 2023
pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : tun_dst_unclone+0x114/0x138 [geneve]
lr : tun_dst_unclone+0x114/0x138 [geneve]
sp : ffffffc0804533f0
x29: ffffffc0804533f0 x28: 000000000000024e x27: 0000000000000000
x26: ffffffdcfc0e8e40 x25: ffffff8086fa6600 x24: ffffff8096a0c000
x23: 0000000000000068 x22: 0000000000000008 x21: ffffff8092ad7000
x20: ffffff8081e17900 x19: ffffff8092ad7900 x18: 00000000fffffffd
x17: 0000000000000000 x16: ffffffdcfa018488 x15: 695f6e75742e753e
x14: 2d646d5f77656e26 x13: 6d5f77656e262220 x12: 646c65696620656c
x11: ffffffdcfbe33ae8 x10: ffffffdcfbe1baa8 x9 : ffffffdcfa0a4c10
x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 0000000000000001
x5 : ffffff83fdeeb010 x4 : 0000000000000000 x3 : 0000000000000027
x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff80913f6780
Call trace:
tun_dst_unclone+0x114/0x138 [geneve]
geneve_xmit+0x214/0x10e0 [geneve]
dev_hard_start_xmit+0xc0/0x220
__dev_queue_xmit+0xa14/0xd38
dev_queue_xmit+0x14/0x28 [openvswitch]
ovs_vport_send+0x98/0x1c8 [openvswitch]
do_output+0x80/0x1a0 [openvswitch]
do_execute_actions+0x172c/0x1958 [openvswitch]
ovs_execute_actions+0x64/0x1a8 [openvswitch]
ovs_packet_cmd_execute+0x258/0x2d8 [openvswitch]
genl_family_rcv_msg_doit+0xc8/0x138
genl_rcv_msg+0x1ec/0x280
netlink_rcv_skb+0x64/0x150
genl_rcv+0x40/0x60
netlink_unicast+0x2e4/0x348
netlink_sendmsg+0x1b0/0x400
__sock_sendmsg+0x64/0xc0
____sys_sendmsg+0x284/0x308
___sys_sendmsg+0x88/0xf0
__sys_sendmsg+0x70/0xd8
__arm64_sys_sendmsg+0x2c/0x40
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x38/0x100
el0t_64_sync_handler+0xc0/0xc8
el0t_64_sync+0x1a4/0x1a8
---[ end trace 0000000000000000 ]---
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
Link: https://patch.msgid.link/20240818114351.3612692-1-gal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is a helper function ARRAY_SIZE() to help calculating the
u32 array size, and we don't need to do it mannually. So, let's
use ARRAY_SIZE() to calculate the array size, and improve the code
readability.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Jijie Shao<shaojijie@huawei.com>
Link: https://patch.msgid.link/20240818052518.45489-1-zhangzekun11@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Looking at timestamped output of netdev CI reveals that
most of the time in forwarding tests for custom route
hashing is spent on a single case, namely the test which
uses ping (mausezahn does not support flow labels).
On a non-debug kernel we spend 714 of 730 total test
runtime (97%) on this test case. While having flow label
support in a traffic gen tool / mausezahn would be best,
we can significantly speed up the loop by putting ip vrf exec
outside of the iteration.
In a test of 1000 pings using a normal loop takes 50 seconds
to finish. While using:
ip vrf exec $vrf sh -c "$loop-body"
takes 12 seconds (1/4 of the time).
Some of the slowness is likely due to our inefficient virtualization
setup, but even on my laptop running "ip link help" 16k times takes
25-30 seconds, so I think it's worth optimizing even for fastest
setups.
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20240817203659.712085-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The dpaa2_switch_add_bufs() function returns the number of bufs that it
was able to add. It returns BUFS_PER_CMD (7) for complete success or a
smaller number if there are not enough pages available. However, the
error checking is looking at the total number of bufs instead of the
number which were added on this iteration. Thus the error checking
only works correctly for the first iteration through the loop and
subsequent iterations are always counted as a success.
Fix this by checking only the bufs added in the current iteration.
Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use always the same pf id in devlink port number. When doing
pass-through the PF to VM bus info func number can be any value.
Fixes: 2ae0aa4758b0 ("ice: Move devlink port to PF/VF struct")
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192,
truesize is calculated and stored in xdp_buff::frame_sz per each
processed Rx buffer. This means that frame_sz will contain the truesize
based on last received buffer, but commit 1dc1a7e7f410 ("ice:
Centrallize Rx buffer recycling") assumed this value will be constant
for each buffer, which breaks the page recycling scheme and mess up the
way we update the page::page_offset.
To fix this, let us work on constant truesize when PAGE_SIZE >= 8192
instead of basing this on size of a packet read from Rx descriptor. This
way we can simplify the code and avoid calculating truesize per each
received frame and on top of that when using
xdp_update_skb_shared_info(), current formula for truesize update will
be valid.
This means ice_rx_frame_truesize() can be removed altogether.
Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE
was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff()
in ice_vsi_cfg_rxq(). This should have been removed at the point where
xdp_buff struct started to be a member of ice_rx_ring and it was no
longer a stack based variable.
There are two fixes tags as my understanding is that the first one
exposed us to broken truesize and page_offset handling and then second
introduced broken skb_shared_info update in ice_{construct,build}_skb().
Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com>
Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/
Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling")
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers.
Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not
ICE_RXBUF_2048.
Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag")
Suggested-by: Luiz Capitulino <luizcap@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the
same as x86 currently, meaning reuse of a page should only take place
when no one else is busy with it.
Do two things independently of underlying PAGE_SIZE:
- store the page count under ice_rx_buf::pgcnt
- then act upon its value vs ice_rx_buf::pagecnt_bias when making the
decision regarding page reuse
Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dave Jiang:
"Check for RCH dport before accessing pci_host_bridge and a fix to
address a KASAN warning for the cxl regression test suite cxl-test"
* tag 'cxl-fixes-6.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Skip cxl_setup_parent_dport() for emulated dports
cxl/pci: Get AER capability address from RCRB only for RCH dport
|
|
Nikolay Aleksandrov says:
====================
bonding: fix xfrm offload bugs
I noticed these problems while reviewing a bond xfrm patch recently.
The fixes are straight-forward, please review carefully the last one
because it has side-effects. This set has passed bond's selftests
and my custom bond stress tests which crash without these fixes.
Note the first patch is not critical, but it simplifies the next fix.
====================
Link: https://patch.msgid.link/20240816114813.326645-1-razor@blackwall.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
If the active slave is cleared manually the xfrm state is not flushed.
This leads to xfrm add/del imbalance and adding the same state multiple
times. For example when the device cannot handle anymore states we get:
[ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
because it's filled with the same state after multiple active slave
clearings. This change also has a few nice side effects: user-space
gets a notification for the change, the old device gets its mac address
and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We shouldn't set real_dev to NULL because packets can be in transit and
xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume
real_dev is set.
Example trace:
kernel: BUG: unable to handle page fault for address: 0000000000001030
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: #PF: supervisor write access in kernel mode
kernel: #PF: error_code(0x0002) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0002 [#1] PREEMPT SMP
kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12
kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel:
kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60
kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00
kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014
kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000
kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000
kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: Call Trace:
kernel: <TASK>
kernel: ? __die+0x1f/0x60
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? page_fault_oops+0x142/0x4c0
kernel: ? do_user_addr_fault+0x65/0x670
kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: ? exc_page_fault+0x7b/0x180
kernel: ? asm_exc_page_fault+0x22/0x30
kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim]
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim]
kernel: bond0: (slave eni0np1): making interface the new active one
kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding]
kernel: xfrm_output+0x61/0x3b0
kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA
kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Fix the return type which should be bool.
Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
for_each_child_of_node can help to iterate through the device_node,
and we don't need to use while loop. No functional change with this
conversion.
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816015837.109627-1-zhangzekun11@huawei.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Ido Schimmel says:
====================
Preparations for FIB rule DSCP selector
This patchset moves the masking of the upper DSCP bits in 'flowi4_tos'
to the core instead of relying on callers of the FIB lookup API to do
it.
This will allow us to start changing users of the API to initialize the
'flowi4_tos' field with all six bits of the DSCP field. In turn, this
will allow us to extend FIB rules with a new DSCP selector.
By masking the upper DSCP bits in the core we are able to maintain the
behavior of the TOS selector in FIB rules and routes to only match on
the lower DSCP bits.
While working on this I found two users of the API that do not mask the
upper DSCP bits before performing the lookup. The first is an ancient
netlink family that is unlikely to be used. It is adjusted in patch #1
to mask both the upper DSCP bits and the ECN bits before calling the
API.
The second user is a nftables module that differs in this regard from
its equivalent iptables module. It is adjusted in patch #2 to invoke the
API with the upper DSCP bits masked, like all other callers. The
relevant selftest passed, but in the unlikely case that regressions are
reported because of this change, we can restore the existing behavior
using a new flow information flag as discussed here [1].
The last patch moves the masking of the upper DSCP bits to the core,
making the first two patches redundant, but I wanted to post them
separately to call attention to the behavior change for these two users
of the FIB lookup API.
Future patchsets (around 3) will start unmasking the upper DSCP bits
throughout the networking stack before adding support for the new FIB
rule DSCP selector.
Changes from v1 [2]:
Patch #3: Include <linux/ip.h> in <linux/in_route.h> instead of
including it in net/ip_fib.h
[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
[2] https://lore.kernel.org/netdev/20240725131729.1729103-1-idosch@nvidia.com/
====================
Link: https://patch.msgid.link/20240814125224.972815-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.
What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).
Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:
# ip -4 rule add tos 0x1c table 100
# ip -4 rule add tos 0x3c table 100
Error: Invalid tos.
While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.
Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.
A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.
However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.
This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.
Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.
Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.
Include <linux/ip.h> in <linux/in_route.h> since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
<linux/in_route.h>.
No regressions in FIB tests:
# ./fib_tests.sh
[...]
Tests passed: 218
Tests failed: 0
And FIB rule tests:
# ./fib_rule_tests.sh
[...]
Tests passed: 116
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
As part of its functionality, the nftables FIB expression module
performs a FIB lookup, but unlike other users of the FIB lookup API, it
does so without masking the upper DSCP bits. In particular, this differs
from the equivalent iptables match ("rpfilter") that does mask the upper
DSCP bits before the FIB lookup.
Align the module to other users of the FIB lookup API and mask the upper
DSCP bits using IPTOS_RT_MASK before the lookup.
No regressions in nft_fib.sh:
# ./nft_fib.sh
PASS: fib expression did not cause unwanted packet drops
PASS: fib expression did drop packets for 1.1.1.1
PASS: fib expression did drop packets for 1c3::c01d
PASS: fib expression forward check with policy based routing
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The NETLINK_FIB_LOOKUP netlink family can be used to perform a FIB
lookup according to user provided parameters and communicate the result
back to user space.
However, unlike other users of the FIB lookup API, the upper DSCP bits
and the ECN bits of the DS field are not masked, which can result in the
wrong result being returned.
Solve this by masking the upper DSCP bits and the ECN bits using
IPTOS_RT_MASK.
The structure that communicates the request and the response is not
exported to user space, so it is unlikely that this netlink family is
actually in use [1].
[1] https://lore.kernel.org/netdev/ZpqpB8vJU%2FQ6LSqa@debian/
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
GRO code checks for matching layer 2 headers to see, if packet belongs
to the same flow and because ip6 tunnel set dev->hard_header_len
this check fails in cases, where it shouldn't. To fix this don't
set hard_header_len, but use needed_headroom like ipv4/ip_tunnel.c
does.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Link: https://patch.msgid.link/20240815151419.109864-1-tbogendoerfer@suse.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When only the last resource is invalid, tpmi_sst_dev_add() is returing
error even if there are other valid resources before. This function
should return error when there are no valid resources.
Here tpmi_sst_dev_add() is returning "ret" variable. But this "ret"
variable contains the failure status of last call to sst_main(), which
failed for the invalid resource. But there may be other valid resources
before the last entry.
To address this, do not update "ret" variable for sst_main() return
status.
If there are no valid resources, it is already checked for by !inst
below the loop and -ENODEV is returned.
Fixes: 9d1d36268f3d ("platform/x86: ISST: Support partitioned systems")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: stable@vger.kernel.org # 6.10+
Link: https://lore.kernel.org/r/20240816163626.415762-1-srinivas.pandruvada@linux.intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
revert commit 4c905f6740a3 ("netfilter: nf_tables: initialize registers in
nft_do_chain()").
Previous patch makes sure that loads from uninitialized registers are
detected from the control plane. in this case rule blob auto-zeroes
registers. Thus the explicit zeroing is not needed anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Reject rules where a load occurs from a register that has not seen a store
early in the same rule.
commit 4c905f6740a3 ("netfilter: nf_tables: initialize registers in
nft_do_chain()")
had to add a unconditional memset to the nftables register space to avoid
leaking stack information to userspace.
This memset shows up in benchmarks. After this change, this commit can
be reverted again.
Note that this breaks userspace compatibility, because theoretically
you can do
rule 1: reg2 := meta load iif, reg2 == 1 jump ...
rule 2: reg2 == 2 jump ... // read access with no store in this rule
... after this change this is rejected.
Neither nftables nor iptables-nft generate such rules, each rule is
always standalone.
This resuts in a small increase of nft_ctx structure by sizeof(long).
To cope with hypothetical rulesets like the example above one could emit
on-demand "reg[x] = 0" store when generating the datapath blob in
nf_tables_commit_chain_prepare().
A patch that does this is linked to below.
For now, lets disable this. In nf_tables, a rule is the smallest
unit that can be replaced from userspace, i.e. a hypothetical ruleset
that relies on earlier initialisations of registers can't be changed
at will as register usage would need to be coordinated.
Link: https://lore.kernel.org/netfilter-devel/20240627135330.17039-4-fw@strlen.de/
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Mechanical transformation, no logical changes intended.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
nft_counter_reset() resets the counter by subtracting the previously
retrieved value from the counter. This is a write operation on the
counter and as such it requires to be performed with a write sequence of
nft_counter_seq to serialize against its possible reader.
Update the packets/ bytes within write-sequence of nft_counter_seq.
Fixes: d84701ecbcd6a ("netfilter: nft_counter: rework atomic dump and reset")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The sequence counter nft_counter_seq is a per-CPU counter. There is no
lock associated with it. nft_counter_do_eval() is using the same counter
and disables BH which suggest that it can be invoked from a softirq.
This in turn means that nft_counter_offload_stats(), which disables only
preemption, can be interrupted by nft_counter_do_eval() leading to two
writer for one seqcount_t.
This can lead to loosing stats or reading statistics while they are
updated.
Disable BH during stats update in nft_counter_offload_stats() to ensure
one writer at a time.
Fixes: b72920f6e4a9d ("netfilter: nftables: counter hardware offload support")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Wen Gu says:
====================
net/smc: introduce ringbufs usage statistics
Currently, we have histograms that show the sizes of ringbufs that ever
used by SMC connections. However, they are always incremental and since
SMC allows the reuse of ringbufs, we cannot know the actual amount of
ringbufs being allocated or actively used.
So this patch set introduces statistics for the amount of ringbufs that
actually allocated by link group and actively used by connections of a
certain net namespace, so that we can react based on these memory usage
information, e.g. active fallback to TCP.
With appropriate adaptations of smc-tools, we can obtain these ringbufs
usage information:
$ smcr -d linkgroup
LG-ID : 00000500
LG-Role : SERV
LG-Type : ASYML
VLAN : 0
PNET-ID :
Version : 1
Conns : 0
Sndbuf : 12910592 B <-
RMB : 12910592 B <-
or
$ smcr -d stats
[...]
RX Stats
Data transmitted (Bytes) 869225943 (869.2M)
Total requests 18494479
Buffer usage (Bytes) 12910592 (12.31M) <-
[...]
TX Stats
Data transmitted (Bytes) 12760884405 (12.76G)
Total requests 36988338
Buffer usage (Bytes) 12910592 (12.31M) <-
[...]
[...]
Change log:
v3->v2
- use new helper nla_put_uint() instead of nla_put_u64_64bit().
v2->v1
https://lore.kernel.org/r/20240807075939.57882-1-guwen@linux.alibaba.com/
- remove inline keyword in .c files.
- use local variable in macros to avoid potential side effects.
v1
https://lore.kernel.org/r/20240805090551.80786-1-guwen@linux.alibaba.com/
====================
Link: https://patch.msgid.link/20240814130827.73321-1-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record
the sizes of ringbufs for all connections that have ever appeared in
the net namespace. They are incremental and we cannot know the actual
ringbufs usage from these. So here introduces statistics for current
ringbufs usage of existing smc connections in the net namespace into
smc_stats, it will be incremented when new connection uses a ringbuf
and decremented when the ringbuf is unused.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently we have the statistics on sndbuf/RMB sizes of all connections
that have ever been on the link group, namely smc_stats_memsize. However
these statistics are incremental and since the ringbufs of link group
are allowed to be reused, we cannot know the actual allocated buffers
through these. So here introduces the statistic on actual allocated
ringbufs of the link group, it will be incremented when a new ringbuf is
added into buf_list and decremented when it is deleted from buf_list.
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzkaller reported UAF in kcm_release(). [0]
The scenario is
1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
2. Thread A resumes building skb from kcm->seq_skb but is blocked
by sk_stream_wait_memory()
3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb
and puts the skb to the write queue
4. Thread A faces an error and finally frees skb that is already in the
write queue
5. kcm_release() does double-free the skb in the write queue
When a thread is building a MSG_MORE skb, another thread must not touch it.
Let's add a per-sk mutex and serialise kcm_sendmsg().
[0]:
BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline]
BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline]
BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Call trace:
dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x178/0x518 mm/kasan/report.c:488
kasan_report+0xd8/0x138 mm/kasan/report.c:601
__asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381
__skb_unlink include/linux/skbuff.h:2366 [inline]
__skb_dequeue include/linux/skbuff.h:2385 [inline]
__skb_queue_purge_reason include/linux/skbuff.h:3175 [inline]
__skb_queue_purge include/linux/skbuff.h:3181 [inline]
kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Allocated by task 6166:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626
unpoison_slab_object mm/kasan/common.c:314 [inline]
__kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903
__alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641
alloc_skb include/linux/skbuff.h:1296 [inline]
kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
sock_sendmsg+0x220/0x2c0 net/socket.c:768
splice_to_socket+0x7cc/0xd58 fs/splice.c:889
do_splice_from fs/splice.c:941 [inline]
direct_splice_actor+0xec/0x1d8 fs/splice.c:1164
splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108
do_splice_direct_actor fs/splice.c:1207 [inline]
do_splice_direct+0x1e4/0x304 fs/splice.c:1233
do_sendfile+0x460/0xb3c fs/read_write.c:1295
__do_sys_sendfile64 fs/read_write.c:1362 [inline]
__se_sys_sendfile64 fs/read_write.c:1348 [inline]
__arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348
__invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51
el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136
do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155
el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Freed by task 6167:
kasan_save_stack mm/kasan/common.c:47 [inline]
kasan_save_track+0x40/0x78 mm/kasan/common.c:68
kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640
poison_slab_object+0x124/0x18c mm/kasan/common.c:241
__kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257
kasan_slab_free include/linux/kasan.h:184 [inline]
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363
kfree_skbmem+0x10c/0x19c
__kfree_skb net/core/skbuff.c:1109 [inline]
kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144
kfree_skb include/linux/skbuff.h:1244 [inline]
kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685
__sock_release net/socket.c:659 [inline]
sock_close+0xa4/0x1e8 net/socket.c:1421
__fput+0x30c/0x738 fs/file_table.c:376
____fput+0x20/0x30 fs/file_table.c:404
task_work_run+0x230/0x2e0 kernel/task_work.c:180
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x618/0x1f64 kernel/exit.c:871
do_group_exit+0x194/0x22c kernel/exit.c:1020
get_signal+0x1500/0x15ec kernel/signal.c:2893
do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249
do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713
el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The buggy address belongs to the object at ffff0000ced0fc80
which belongs to the cache skbuff_head_cache of size 240
The buggy address is located 0 bytes inside of
freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
The buggy address belongs to the physical page:
page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f
flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff)
page_type: 0xffffffff()
raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004
raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
>ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc
ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module")
Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60
Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The check for '!to' is redundant here, since skb_can_coalesce() already
contains this check.
Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1723730983-22912-1-git-send-email-zhangchangzhong@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit a1ab24e5fc4a ("mptcp: consolidate sockopt synchronization")
removed the implementation but leave declaration.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816100404.879598-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These are never implenmented since commit b691b1116e82 ("net/mlx5: Implement
devlink port function cmds to control ipsec_packet").
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816101550.881844-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There is no caller and implementations in tree.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816101638.882072-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit f13697cc7a19 ("gve: Switch to config-aware queue allocation")
convert this function to gve_rx_alloc_rings_gqi().
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816101906.882743-1-yuehaibing@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In the MCTP route input test, we're routing one skb, then (when delivery
is expected) checking the resulting routed skb.
However, we're currently checking the original skb length, rather than
the routed skb. Check the routed skb instead; the original will have
been freed at this point.
Fixes: 8892c0490779 ("mctp: Add route input to socket tests")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kernel-janitors/4ad204f0-94cf-46c5-bdab-49592addf315@kili.mountain/
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240816-mctp-kunit-skb-fix-v1-1-3c367ac89c27@codeconstruct.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the netlink policy to validate IPv6 address length.
Destination address currently has policy for max len set,
and source has no policy validation. In both cases
the code does the real check. With correct policy
check the code can be removed.
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://patch.msgid.link/20240816212245.467745-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Michal Luczaj says:
====================
Series takes care of few bugs and missing features with the aim to improve
the test coverage of sockmap/sockhash.
Last patch is a create_pair() rewrite making use of
__attribute__((cleanup)) to handle socket fd lifetime.
---
Changes in v2:
- Rebase on bpf-next (Jakub)
- Use cleanup helpers from kernel's cleanup.h (Jakub)
- Fix subject of patch 3, rephrase patch 4, use correct prefix
- Link to v1: https://lore.kernel.org/r/20240724-sockmap-selftest-fixes-v1-0-46165d224712@rbox.co
Changes in v1:
- No declarations in function body (Jakub)
- Don't touch output arguments until function succeeds (Jakub)
- Link to v0: https://lore.kernel.org/netdev/027fdb41-ee11-4be0-a493-22f28a1abd7c@rbox.co/
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Rewrite function to have (unneeded) socket descriptors automatically
close()d when leaving the scope. Make sure the "ownership" of fds is
correctly passed via take_fd(); i.e. descriptor returned to caller will
remain valid.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-6-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Constants got switched reducing the test's coverage. Replace SOCK_DGRAM
with SOCK_STREAM in one of unix_inet_skb_redir_to_connected() tests.
Fixes: 51354f700d40 ("bpf, sockmap: Add af_unix test with both sockets in map")
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-5-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Do actually test the sotype as specified by the caller.
This picks up after commit 75e0e27db6cf ("selftest/bpf: Change udp to inet
in some function names").
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-4-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Replace implementation with a call to a generic function.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-3-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Following create_pair() changes, remove unused function argument in
create_socket_pairs() and adapt its callers, i.e. drop the open-coded
loopback socket creation.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-2-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Extend the function to allow creating socket pairs of SOCK_STREAM,
SOCK_DGRAM and SOCK_SEQPACKET.
Adapt direct callers and leave further cleanups for the following patch.
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Tested-by: Jakub Sitnicki <jakub@cloudflare.com>
Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://lore.kernel.org/r/20240731-selftest-sockmap-fixes-v2-1-08a0c73abed2@rbox.co
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- memory corruption fixes for hid-cougar (Camila Alvarez) and
hid-amd_sfh (Olivier Sobrie)
- fix for regression in Wacom driver of twist gesture handling (Jason
Gerecke)
- two new device IDs for hid-multitouch (Dmitry Savin) and hid-asus
(Luke D. Jones)
* tag 'hid-for-linus-2024081901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: Defer calculation of resolution until resolution_code is known
HID: multitouch: Add support for GT7868Q
HID: amd_sfh: free driver_data after destroying hid device
hid-asus: add ROG Ally X prod ID to quirk list
HID: cougar: fix slab-out-of-bounds Read in cougar_report_fixup
|
|
This patch is to move nf_ct_netns_get() out of nf_conncount_init()
and let the consumers of nf_conncount decide if they want to turn
on netfilter conntrack.
It makes nf_conncount more flexible to be used in other places and
avoids netfilter conntrack turned on when using it in openvswitch
conntrack.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
pipapo set backend maintains two copies of the datastructure, removing
the elements from the copy that is going to be discarded slows down
the abort path significantly, from several minutes to few seconds after
this patch.
This patch was previously reverted by
f86fb94011ae ("netfilter: nf_tables: revert do not remove elements if set backend implements .abort")
but it is now possible since recent work by Florian Westphal to perform
on-demand clone from insert/remove path:
532aec7e878b ("netfilter: nft_set_pipapo: remove dirty flag")
3f1d886cc7c3 ("netfilter: nft_set_pipapo: move cloning of match info to insert/removal path")
a238106703ab ("netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone")
c5444786d0ea ("netfilter: nft_set_pipapo: merge deactivate helper into caller")
6c108d9bee44 ("netfilter: nft_set_pipapo: prepare walk function for on-demand clone")
8b8a2417558c ("netfilter: nft_set_pipapo: prepare destroy function for on-demand clone")
80efd2997fb9 ("netfilter: nft_set_pipapo: make pipapo_clone helper return NULL")
a590f4760922 ("netfilter: nft_set_pipapo: move prove_locking helper around")
after this series, the clone is fully released once aborted, no need to
take it back to previous state. Thus, no stale reference to elements can
occur.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|