Age | Commit message (Collapse) | Author |
|
xs_tcp_tls_finish_connecting() already marks the upper xprt
connected, so the same code in xs_tcp_tls_setup_socket() is
never executed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
Engineers at Hammerspace noticed that sometimes mounting with
"xprtsec=tls" hangs for a minute or so, and then times out, even
when the NFS server is reachable and responsive.
kTLS shuts off data_ready callbacks if strp->msg_ready is set to
mitigate data_ready callbacks when a full TLS record is not yet
ready to be read from the socket.
Normally msg_ready is clear when the first TLS record arrives on
a socket. However, I observed that sometimes tls_setsockopt() sets
strp->msg_ready, and that prevents forward progress because
tls_data_ready() becomes a no-op.
Moreover, Jakub says: "If there's a full record queued at the time
when [tlshd] passes the socket back to the kernel, it's up to the
reader to read the already queued data out." So SunRPC cannot
expect a data_ready call when ingress data is already waiting.
Add an explicit poll after SunRPC's upper transport is set up to
pick up any data that arrived after the TLS handshake but before
transport set-up is complete.
Reported-by: Steve Sears <sjs@hammerspace.com>
Suggested-by: Jakub Kacinski <kuba@kernel.org>
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Tested-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
We didn't get netfilter or wireless PRs this week, so next week's PR
is probably going to be bigger. A healthy dose of fixes for bugs
introduced in the current release nonetheless.
Current release - regressions:
- Bluetooth: always allow SCO packets for user channel
- af_unix: fix memory leak in unix_dgram_sendmsg()
- rxrpc:
- remove redundant peer->mtu_lock causing lockdep splats
- fix spinlock flavor issues with the peer record hash
- eth: iavf: fix circular lock dependency with netdev_lock
- net: use rtnl_net_dev_lock() in
register_netdevice_notifier_dev_net() RDMA driver register notifier
after the device
Current release - new code bugs:
- ethtool: fix ioctl confusing drivers about desired HDS user config
- eth: ixgbe: fix media cage present detection for E610 device
Previous releases - regressions:
- loopback: avoid sending IP packets without an Ethernet header
- mptcp: reset connection when MPTCP opts are dropped after join
Previous releases - always broken:
- net: better track kernel sockets lifetime
- ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
- phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
- eth: enetc: number of error handling fixes
- dsa: rtl8366rb: reshuffle the code to fix config / build issue with
LED support"
* tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: ti: icss-iep: Reject perout generation request
idpf: fix checksums set in idpf_rx_rsc()
selftests: drv-net: Check if combined-count exists
net: ipv6: fix dst ref loop on input in rpl lwt
net: ipv6: fix dst ref loop on input in seg6 lwt
usbnet: gl620a: fix endpoint checking in genelink_bind()
net/mlx5: IRQ, Fix null string in debug print
net/mlx5: Restore missing trace event when enabling vport QoS
net/mlx5: Fix vport QoS cleanup on error
net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
af_unix: Fix memory leak in unix_dgram_sendmsg()
net: Handle napi_schedule() calls from non-interrupt
net: Clear old fragment checksum value in napi_reuse_skb
gve: unlink old napi when stopping a queue using queue API
net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().
tcp: Defer ts_recent changes until req is owned
net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: add missing enetc4_link_deinit()
net: enetc: update UDP checksum when updating originTimestamp field
...
|
|
While kernel sockets are dismantled during pernet_operations->exit(),
their freeing can be delayed by any tx packets still held in qdisc
or device queues, due to skb_set_owner_w() prior calls.
This then trigger the following warning from ref_tracker_dir_exit() [1]
To fix this, make sure that kernel sockets own a reference on net->passive.
Add sk_net_refcnt_upgrade() helper, used whenever a kernel socket
is converted to a refcounted one.
[1]
[ 136.263918][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.263918][ T35] sk_alloc+0x2b3/0x370
[ 136.263918][ T35] inet6_create+0x6ce/0x10f0
[ 136.263918][ T35] __sock_create+0x4c0/0xa30
[ 136.263918][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.263918][ T35] igmp6_net_init+0x39/0x390
[ 136.263918][ T35] ops_init+0x31e/0x590
[ 136.263918][ T35] setup_net+0x287/0x9e0
[ 136.263918][ T35] copy_net_ns+0x33f/0x570
[ 136.263918][ T35] create_new_namespaces+0x425/0x7b0
[ 136.263918][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.263918][ T35] ksys_unshare+0x57d/0xa70
[ 136.263918][ T35] __x64_sys_unshare+0x38/0x40
[ 136.263918][ T35] do_syscall_64+0xf3/0x230
[ 136.263918][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 136.263918][ T35]
[ 136.343488][ T35] ref_tracker: net notrefcnt@ffff8880638f01e0 has 1/2 users at
[ 136.343488][ T35] sk_alloc+0x2b3/0x370
[ 136.343488][ T35] inet6_create+0x6ce/0x10f0
[ 136.343488][ T35] __sock_create+0x4c0/0xa30
[ 136.343488][ T35] inet_ctl_sock_create+0xc2/0x250
[ 136.343488][ T35] ndisc_net_init+0xa7/0x2b0
[ 136.343488][ T35] ops_init+0x31e/0x590
[ 136.343488][ T35] setup_net+0x287/0x9e0
[ 136.343488][ T35] copy_net_ns+0x33f/0x570
[ 136.343488][ T35] create_new_namespaces+0x425/0x7b0
[ 136.343488][ T35] unshare_nsproxy_namespaces+0x124/0x180
[ 136.343488][ T35] ksys_unshare+0x57d/0xa70
[ 136.343488][ T35] __x64_sys_unshare+0x38/0x40
[ 136.343488][ T35] do_syscall_64+0xf3/0x230
[ 136.343488][ T35] entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 0cafd77dcd03 ("net: add a refcount tracker for kernel sockets")
Reported-by: syzbot+30a19e01a97420719891@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/67b72aeb.050a0220.14d86d.0283.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://patch.msgid.link/20250220131854.4048077-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the TLS handshake attempt returns -ETIMEDOUT, we currently translate
that error into -EACCES. This becomes problematic for cases where the RPC
layer is attempting to re-connect in paths that don't resonably handle
-EACCES, for example: writeback. The RPC layer can handle -ETIMEDOUT quite
well, however - so if the handshake returns this error let's just pass it
along.
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0
Read of size 1 at addr ffff888111f322cd by task swapper/0/0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
Call Trace:
<IRQ>
dump_stack_lvl+0x68/0xa0
print_address_description.constprop.0+0x2c/0x3d0
print_report+0xb4/0x270
kasan_report+0xbd/0xf0
tcp_write_timer_handler+0x156/0x3e0
tcp_write_timer+0x66/0x170
call_timer_fn+0xfb/0x1d0
__run_timers+0x3f8/0x480
run_timer_softirq+0x9b/0x100
handle_softirqs+0x153/0x390
__irq_exit_rcu+0x103/0x120
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0x76/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 <fa> c3 cc cc cc
cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242
RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d
R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000
R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0
default_idle_call+0x6b/0xa0
cpuidle_idle_call+0x1af/0x1f0
do_idle+0xbc/0x130
cpu_startup_entry+0x33/0x40
rest_init+0x11f/0x210
start_kernel+0x39a/0x420
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x97/0xa0
common_startup_64+0x13e/0x141
</TASK>
Allocated by task 595:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x87/0x90
kmem_cache_alloc_noprof+0x12b/0x3f0
copy_net_ns+0x94/0x380
create_new_namespaces+0x24c/0x500
unshare_nsproxy_namespaces+0x75/0xf0
ksys_unshare+0x24e/0x4f0
__x64_sys_unshare+0x1f/0x30
do_syscall_64+0x70/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 100:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x54/0x70
kmem_cache_free+0x156/0x5d0
cleanup_net+0x5d3/0x670
process_one_work+0x776/0xa90
worker_thread+0x2e2/0x560
kthread+0x1a8/0x1f0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
Reproduction script:
mkdir -p /mnt/nfsshare
mkdir -p /mnt/nfs/netns_1
mkfs.ext4 /dev/sdb
mount /dev/sdb /mnt/nfsshare
systemctl restart nfs-server
chmod 777 /mnt/nfsshare
exportfs -i -o rw,no_root_squash *:/mnt/nfsshare
ip netns add netns_1
ip link add name veth_1_peer type veth peer veth_1
ifconfig veth_1_peer 11.11.0.254 up
ip link set veth_1 netns netns_1
ip netns exec netns_1 ifconfig veth_1 11.11.0.1
ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \
--tcp-flags FIN FIN -j DROP
(note: In my environment, a DESTROY_CLIENTID operation is always sent
immediately, breaking the nfs tcp connection.)
ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \
11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1
ip netns del netns_1
The reason here is that the tcp socket in netns_1 (nfs side) has been
shutdown and closed (done in xs_destroy), but the FIN message (with ack)
is discarded, and the nfsd side keeps sending retransmission messages.
As a result, when the tcp sock in netns_1 processes the received message,
it sends the message (FIN message) in the sending queue, and the tcp timer
is re-established. When the network namespace is deleted, the net structure
accessed by tcp's timer handler function causes problems.
To fix this problem, let's hold netns refcnt for the tcp kernel socket as
done in other modules. This is an ugly hack which can easily be backported
to earlier kernels. A proper fix which cleans up the interfaces will
follow, but may not be so easy to backport.
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We've noticed a situation where an unstable TCP connection can cause the
TLS handshake to timeout waiting for userspace to complete it. When this
happens, we don't want to return from xs_tls_handshake_sync() with zero, as
this will cause the upper xprt to be set CONNECTED, and subsequent attempts
to transmit will be returned with -EPIPE. The sunrpc machine does not
recover from this situation and will spin attempting to transmit.
The return value of tls_handshake_cancel() can be used to detect a race
with completion:
* tls_handshake_cancel - cancel a pending handshake
* Return values:
* %true - Uncompleted handshake request was canceled
* %false - Handshake request already completed or not found
If true, we do not want the upper xprt to be connected, so return
-ETIMEDOUT. If false, its possible the handshake request was lost and
that may be the reason for our timeout. Again we do not want the upper
xprt to be connected, so return -ETIMEDOUT.
Ensure that we alway return an error from xs_tls_handshake_sync() if we
call tls_handshake_cancel().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Since transport->sock has been set to NULL during reset transport,
XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise, the
xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
to dereference the transport->sock that has been set to NULL.
Fixes: 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
We've observed an NFS server shrink the TCP window and then reset the TCP
connection as part of a HA failover. When the connection has TLS, often
the NFS client will hang indefinitely in this stack:
wait_woken+0x70/0x80
wait_on_pending_writer+0xe4/0x110 [tls]
tls_sk_proto_close+0x368/0x3a0 [tls]
inet_release+0x54/0xb0
__sock_release+0x48/0xc8
sock_close+0x20/0x38
__fput+0xe0/0x2f0
__fput_sync+0x58/0x70
xs_reset_transport+0xe8/0x1f8 [sunrpc]
xs_tcp_shutdown+0xa4/0x190 [sunrpc]
xprt_autoclose+0x68/0x170 [sunrpc]
process_one_work+0x180/0x420
worker_thread+0x258/0x368
kthread+0x104/0x118
ret_from_fork+0x10/0x20
This hang prevents the client from closing the socket and reconnecting to
the server.
Because xs_nospace() elevates sk_write_pending, and sk_sndtimeo is
MAX_SCHEDULE_TIMEOUT, tls_sk_proto_close is never able to complete its wait
for pending writes to the socket. For this case where we are resetting the
transport anyway, we don't expect the socket to ever have write space, so
fix this by simply clearing the sock's sndtimeo under the sock's lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
xs_tcp_finish_connecting() can return -ENOTCONN but the switch statement
in xs_tcp_setup_socket() treats that as an unhandled error.
If we treat it as a known error it would propagate back to
call_connect_status() which does handle that error code. This appears
to be the intention of the commit (given below) which added -ENOTCONN as
a return status for xs_tcp_finish_connecting().
So add -ENOTCONN to the switch statement as an error to pass through to
the caller.
Link: https://bugzilla.suse.com/show_bug.cgi?id=1231050
Link: https://access.redhat.com/discussions/3434091
Fixes: 01d37c428ae0 ("SUNRPC: xprt_connect() don't abort the task if the transport isn't bound")
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
When using a BPF program on kernel_connect(), the call can return -EPERM. This
causes xs_tcp_setup_socket() to loop forever, filling up the syslog and causing
the kernel to potentially freeze up.
Neil suggested:
This will propagate -EPERM up into other layers which might not be ready
to handle it. It might be safer to map EPERM to an error we would be more
likely to expect from the network system - such as ECONNREFUSED or ENETDOWN.
ECONNREFUSED as error seems reasonable. For programs setting a different error
can be out of reach (see handling in 4fbac77d2d09) in particular on kernels
which do not have f10d05966196 ("bpf: Make BPF_PROG_RUN_ARRAY return -err
instead of allow boolean"), thus given that it is better to simply remap for
consistent behavior. UDP does handle EPERM in xs_udp_send_request().
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Co-developed-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Lex Siegel <usiegl00@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@kernel.org>
Cc: Anna Schumaker <anna@kernel.org>
Link: https://github.com/cilium/cilium/issues/33395
Link: https://lore.kernel.org/bpf/171374175513.12877.8993642908082014881@noble.neil.brown.name
Link: https://patch.msgid.link/9069ec1d59e4b2129fc23433349fd5580ad43921.1720075070.git.daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
* Remove sentinel element from ctl_table structs.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client fixes from Trond Myklebust:
- Fix an Oops in xs_tcp_tls_setup_socket
- Fix an Oops due to missing error handling in nfs_net_init()
* tag 'nfs-for-6.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: Handle error of rpc_proc_register() in nfs_net_init().
SUNRPC: add a missing rpc_stat for TCP TLS
|
|
Commit 1548036ef120 ("nfs: make the rpc_stat per net namespace") added
functionality to specify rpc_stats function but missed adding it to the
TCP TLS functionality. As the result, mounting with xprtsec=tls lead to
the following kernel oops.
[ 128.984192] Unable to handle kernel NULL pointer dereference at
virtual address 000000000000001c
[ 128.985058] Mem abort info:
[ 128.985372] ESR = 0x0000000096000004
[ 128.985709] EC = 0x25: DABT (current EL), IL = 32 bits
[ 128.986176] SET = 0, FnV = 0
[ 128.986521] EA = 0, S1PTW = 0
[ 128.986804] FSC = 0x04: level 0 translation fault
[ 128.987229] Data abort info:
[ 128.987597] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 128.988169] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 128.988811] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 128.989302] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000106c84000
[ 128.990048] [000000000000001c] pgd=0000000000000000, p4d=0000000000000000
[ 128.990736] Internal error: Oops: 0000000096000004 [#1] SMP
[ 128.991168] Modules linked in: nfs_layout_nfsv41_files
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace netfs
uinput dm_mod nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill
ip_set nf_tables nfnetlink qrtr vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
sunrpc vfat fat uvcvideo videobuf2_vmalloc videobuf2_memops uvc
videobuf2_v4l2 videodev videobuf2_common mc vmw_vmci xfs libcrc32c
e1000e crct10dif_ce ghash_ce sha2_ce vmwgfx nvme sha256_arm64
nvme_core sr_mod cdrom sha1_ce drm_ttm_helper ttm drm_kms_helper drm
sg fuse
[ 128.996466] CPU: 0 PID: 179 Comm: kworker/u4:26 Kdump: loaded Not
tainted 6.8.0-rc6+ #12
[ 128.997226] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS
VMW201.00V.21805430.BA64.2305221830 05/22/2023
[ 128.998084] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc]
[ 128.998701] pstate: 81400005 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 128.999384] pc : call_start+0x74/0x138 [sunrpc]
[ 128.999809] lr : __rpc_execute+0xb8/0x3e0 [sunrpc]
[ 129.000244] sp : ffff8000832b3a00
[ 129.000508] x29: ffff8000832b3a00 x28: ffff800081ac79c0 x27: ffff800081ac7000
[ 129.001111] x26: 0000000004248060 x25: 0000000000000000 x24: ffff800081596008
[ 129.001757] x23: ffff80007b087240 x22: ffff00009a509d30 x21: 0000000000000000
[ 129.002345] x20: ffff000090075600 x19: ffff00009a509d00 x18: ffffffffffffffff
[ 129.002912] x17: 733d4d4554535953 x16: 42555300312d746e x15: ffff8000832b3a88
[ 129.003464] x14: ffffffffffffffff x13: ffff8000832b3a7d x12: 0000000000000008
[ 129.004021] x11: 0101010101010101 x10: ffff8000150cb560 x9 : ffff80007b087c00
[ 129.004577] x8 : ffff00009a509de0 x7 : 0000000000000000 x6 : 00000000be8c4ee3
[ 129.005026] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff000094d56680
[ 129.005425] x2 : ffff80007b0637f8 x1 : ffff000090075600 x0 : ffff00009a509d00
[ 129.005824] Call trace:
[ 129.005967] call_start+0x74/0x138 [sunrpc]
[ 129.006233] __rpc_execute+0xb8/0x3e0 [sunrpc]
[ 129.006506] rpc_execute+0x160/0x1d8 [sunrpc]
[ 129.006778] rpc_run_task+0x148/0x1f8 [sunrpc]
[ 129.007204] tls_probe+0x80/0xd0 [sunrpc]
[ 129.007460] rpc_ping+0x28/0x80 [sunrpc]
[ 129.007715] rpc_create_xprt+0x134/0x1a0 [sunrpc]
[ 129.007999] rpc_create+0x128/0x2a0 [sunrpc]
[ 129.008264] xs_tcp_tls_setup_socket+0xdc/0x508 [sunrpc]
[ 129.008583] process_one_work+0x174/0x3c8
[ 129.008813] worker_thread+0x2c8/0x3e0
[ 129.009033] kthread+0x100/0x110
[ 129.009225] ret_from_fork+0x10/0x20
[ 129.009432] Code: f0ffffc2 911fe042 aa1403e1 aa1303e0 (b9401c83)
Fixes: 1548036ef120 ("nfs: make the rpc_stat per net namespace")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Bugfixes:
- Fix for an Oops in the NFSv4.2 listxattr handler
- Correct an incorrect buffer size in listxattr
- Fix for an Oops in the pNFS flexfiles layout
- Fix a refcount leak in NFS O_DIRECT writes
- Fix missing locking in NFS O_DIRECT
- Avoid an infinite loop in pnfs_update_layout
- Fix an overflow in the RPC waitqueue queue length counter
- Ensure that pNFS I/O is also protected by TLS when xprtsec is
specified by the mount options
- Fix a leaked folio lock in the netfs read code
- Fix a potential deadlock in fscache
- Allow setting the fscache uniquifier in NFSv4
- Fix an off by one in root_nfs_cat()
- Fix another off by one in rpc_sockaddr2uaddr()
- nfs4_do_open() can incorrectly trigger state recovery
- Various fixes for connection shutdown
Features and cleanups:
- Ensure that containers only see their own RPC and NFS stats
- Enable nconnect for RDMA
- Remove dead code from nfs_writepage_locked()
- Various tracepoint additions to track EXCHANGE_ID, GETDEVICEINFO,
and mount options"
* tag 'nfs-for-6.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
nfs: fix panic when nfs4_ff_layout_prepare_ds() fails
NFS: trace the uniquifier of fscache
NFS: Read unlock folio on nfs_page_create_from_folio() error
NFS: remove unused variable nfs_rpcstat
nfs: fix UAF in direct writes
nfs: properly protect nfs_direct_req fields
NFS: enable nconnect for RDMA
NFSv4: nfs4_do_open() is incorrectly triggering state recovery
NFS: avoid infinite loop in pnfs_update_layout.
NFS: remove sync_mode test from nfs_writepage_locked()
NFSv4.1/pnfs: fix NFS with TLS in pnfs
NFS: Fix an off by one in root_nfs_cat()
nfs: make the rpc_stat per net namespace
nfs: expose /proc/net/sunrpc/nfs in net namespaces
sunrpc: add a struct rpc_stats arg to rpc_create_args
nfs: remove unused NFS_CALL macro
NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls
NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt
SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int
nfs: fix regression in handling of fsc= option in NFSv4
...
|
|
bc_close() and bc_destroy now do something, so the comments are
no longer correct. Commit 6221f1d9b63f ("SUNRPC: Fix backchannel
RPC soft lockups") should have removed these.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Add a transport level callback to allow it to handle the consequences of
dequeuing the request that was in the process of being transmitted.
For something like a TCP connection, we may need to disconnect if the
request was partially transmitted.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the TCP connection attempt fails without ever establishing a
connection, then assume the problem may be the server is rejecting us
due to port reuse.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Fix up xs_wake_error() to close the socket when a hard error is being
reported. Usually, that means an ECONNRESET was received on a connection
attempt.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Otherwise we run the risk of having the lower_xprt freed from underneath
us, causing an oops that looks like this:
[ 224.150698] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 224.150951] #PF: supervisor read access in kernel mode
[ 224.151117] #PF: error_code(0x0000) - not-present page
[ 224.151278] PGD 0 P4D 0
[ 224.151361] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 224.151499] CPU: 2 PID: 99 Comm: kworker/u10:6 Not tainted 6.6.0-rc3-g6465e260f487 #41264 a00b0960990fb7bc6d6a330ee03588b67f08a47b
[ 224.151977] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 2/2/2022
[ 224.152216] Workqueue: xprtiod xs_tcp_tls_setup_socket [sunrpc]
[ 224.152434] RIP: 0010:xs_tcp_tls_setup_socket+0x3cc/0x7e0 [sunrpc]
[ 224.152643] Code: 00 00 48 8b 7c 24 08 e9 f3 01 00 00 48 83 7b c0 00 0f 85 d2 01 00 00 49 8d 84 24 f8 05 00 00 48 89 44 24 10 48 8b 00 48 89 c5 <4c> 8b 68 18 66 41 83 3f 0a 75 71 45 31 ff 4c 89 ef 31 f6 e8 5c 76
[ 224.153246] RSP: 0018:ffffb00ec060fd18 EFLAGS: 00010246
[ 224.153427] RAX: 0000000000000000 RBX: ffff8c06c2e53e40 RCX: 0000000000000001
[ 224.153652] RDX: ffff8c073bca2408 RSI: 0000000000000282 RDI: ffff8c06c259ee00
[ 224.153868] RBP: 0000000000000000 R08: ffffffff9da55aa0 R09: 0000000000000001
[ 224.154084] R10: 00000034306c30f1 R11: 0000000000000002 R12: ffff8c06c2e51800
[ 224.154300] R13: ffff8c06c355d400 R14: 0000000004208160 R15: ffff8c06c2e53820
[ 224.154521] FS: 0000000000000000(0000) GS:ffff8c073bd00000(0000) knlGS:0000000000000000
[ 224.154763] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 224.154940] CR2: 0000000000000018 CR3: 0000000062c1e000 CR4: 0000000000750ee0
[ 224.155157] PKRU: 55555554
[ 224.155244] Call Trace:
[ 224.155325] <TASK>
[ 224.155395] ? __die_body+0x68/0xb0
[ 224.155507] ? page_fault_oops+0x34c/0x3a0
[ 224.155635] ? _raw_spin_unlock_irqrestore+0xe/0x40
[ 224.155793] ? exc_page_fault+0x7a/0x1b0
[ 224.155916] ? asm_exc_page_fault+0x26/0x30
[ 224.156047] ? xs_tcp_tls_setup_socket+0x3cc/0x7e0 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8]
[ 224.156367] ? xs_tcp_tls_setup_socket+0x2fe/0x7e0 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8]
[ 224.156697] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc ae3a15912ae37fd51dafbdbc2dbd069117f8f5c8]
[ 224.157013] process_scheduled_works+0x24e/0x450
[ 224.157158] worker_thread+0x21c/0x2d0
[ 224.157275] ? __pfx_worker_thread+0x10/0x10
[ 224.157409] kthread+0xe8/0x110
[ 224.157510] ? __pfx_kthread+0x10/0x10
[ 224.157628] ret_from_fork+0x37/0x50
[ 224.157741] ? __pfx_kthread+0x10/0x10
[ 224.157859] ret_from_fork_asm+0x1b/0x30
[ 224.157983] </TASK>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable the NFS v4.2 READ_PLUS operation by default
Stable Fixes:
- NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
- NFS: Fix a potential data corruption
Bugfixes:
- Fix various READ_PLUS issues including:
- smatch warnings
- xdr size calculations
- scratch buffer handling
- 32bit / highmem xdr page handling
- Fix checkpatch errors in file.c
- Fix redundant readdir request after an EOF
- Fix handling of COPY ERR_OFFLOAD_NO_REQ
- Fix assignment of xprtdata.cred
Cleanups:
- Remove unused xprtrdma function declarations
- Clean up an integer overflow check to avoid a warning
- Clean up #includes in dns_resolve.c
- Clean up nfs4_get_device_info so we don't pass a NULL pointer
to __free_page()
- Clean up sunrpc TCP socket timeout configuration
- Guard against READDIR loops when entry names are too long
- Use EXCHID4_FLAG_USE_PNFS_DS for DS servers"
* tag 'nfs-for-6.6-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (22 commits)
pNFS: Fix assignment of xprtdata.cred
NFSv4.2: fix handling of COPY ERR_OFFLOAD_NO_REQ
NFS: Guard against READDIR loop when entry names exceed MAXNAMELEN
NFSv4.1: use EXCHGID4_FLAG_USE_PNFS_DS for DS server
NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
SUNRPC: Allow specification of TCP client connect timeout at setup
SUNRPC: Refactor and simplify connect timeout
SUNRPC: Set the TCP_SYNCNT to match the socket timeout
NFS: Fix a potential data corruption
nfs: fix redundant readdir request after get eof
nfs/blocklayout: Use the passed in gfp flags
filemap: Fix errors in file.c
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
NFS: Move common includes outside ifdef
SUNRPC: clean up integer overflow check
xprtrdma: Remove unused function declaration rpcrdma_bc_post_recv()
NFS: Enable the READ_PLUS operation by default
SUNRPC: kmap() the xdr pages during decode
NFSv4.2: Rework scratch handling for READ_PLUS (again)
...
|
|
When we create a TCP transport, the connect timeout parameters are
currently fixed to be 90s. This is problematic in the pNFS flexfiles
case, where we may have multiple mirrors, and we would like to fail over
quickly to the next mirror if a data server is down.
This patch adds the ability to specify the connection parameters at RPC
client creation time.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Instead of requiring the requests to redrive the connection several
times, just let the TCP connect code manage it now that we've adjusted
the TCP_SYNCNT value.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Set the TCP SYN count so that we abort the connection attempt at around
the expected timeout value.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Use the helpers to parse the level and description fields in
incoming alerts. "Warning" alerts are discarded, and "fatal"
alerts mean the session is no longer valid.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047944747.5241.1974889594004407123.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Before closing a TCP connection, the TLS protocol wants peers to
send session close Alert notifications. Add those in both the RPC
client and server.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047939404.5241.14392506226409865832.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Kernel TLS consumers will need definitions of various parts of the
TLS protocol, but often do not need the function declarations and
other infrastructure provided in <net/tls.h>.
Break out existing standardized protocol elements into a separate
header, and make room for a few more elements in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://lore.kernel.org/r/169047931374.5241.7713175865185969309.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use the new TLS handshake API to enable the SunRPC client code
to request a TLS handshake. This implements support for RFC 9289,
only on TCP sockets.
Upper layers such as NFS use RPC-with-TLS to protect in-transit
traffic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
kTLS sockets use CMSG to report decryption errors and the need
for session re-keying.
For RPC-with-TLS, an "application data" message contains a ULP
payload, and that is passed along to the RPC client. An "alert"
message triggers connection reset. Everything else is discarded.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The RPC header parser doesn't recognize TLS handshake traffic, so it
will close the connection prematurely with an error. To avoid that,
shunt the transport's data_ready callback when there is a TLS
handshake in progress.
The XPRT_SOCK_IGNORE_RECV flag will be toggled by code added in a
subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
An "abtract" address for an AF_UNIX socket start with a nul and can
contain any bytes for the given length, but traditionally doesn't
contain other nuls. When reported, the leading nul is replaced by '@'.
sunrpc currently rejects connections to these addresses and reports them
as an empty string. To provide support for future use of these
addresses, allow them for outgoing connections and report them more
usefully.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
There is no need to declare an extra tables to just create directory,
this can be easily be done with a prefix path with register_sysctl().
Simplify this registration.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
NFS server Duplicate Request Cache (DRC) algorithms rely on NFS clients
reconnecting using the same local TCP port. Unique NFS operations are
identified by the per-TCP connection set of XIDs. This prevents file
corruption when non-idempotent NFS operations are retried.
Currently, NFS client TCP connections are using different local TCP ports
when reconnecting to NFS servers.
After an NFS server initiates shutdown of the TCP connection, the NFS
client's TCP socket is set to NULL after the socket state has reached
TCP_LAST_ACK(9). When reconnecting, the new socket attempts to reuse
the same local port but fails with EADDRNOTAVAIL (99). This forces the
socket to use a different local TCP port to reconnect to the remote NFS
server.
State Transition and Events:
TCP_CLOSE_WAIT(8)
TCP_LAST_ACK(9)
connect(fail EADDRNOTAVAIL(99))
TCP_CLOSE(7)
bind on new port
connect success
dmesg excerpts showing reconnect switching from TCP local port of 926 to
763 after commit 7c81e6a9d75b:
[13354.947854] NFS call mkdir testW
...
[13405.654781] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654813] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
[13405.654826] RPC: xs_data_ready...
[13405.654892] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654895] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.654899] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.654900] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.654950] RPC: xs_connect scheduled xprt 00000000037d0f03
[13405.654975] RPC: xs_bind 0.0.0.0:926: ok (0)
[13405.654980] RPC: worker connecting xprt 00000000037d0f03 via tcp
to 10.101.6.228 (port 2049)
[13405.654991] RPC: 00000000037d0f03 connect status 99 connected 0
sock state 7
[13405.655001] RPC: xs_tcp_state_change client 00000000037d0f03...
[13405.655002] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[13405.655024] RPC: xs_connect scheduled xprt 00000000037d0f03
[13405.655038] RPC: xs_bind 0.0.0.0:763: ok (0)
[13405.655041] RPC: worker connecting xprt 00000000037d0f03 via tcp
to 10.101.6.228 (port 2049)
[13405.655065] RPC: 00000000037d0f03 connect status 115 connected 0
sock state 2
State Transition and Events with patch applied:
TCP_CLOSE_WAIT(8)
TCP_LAST_ACK(9)
TCP_CLOSE(7)
connect(reuse of port succeeds)
dmesg excerpts showing reconnect on same TCP local port of 936 with patch
applied:
[ 257.139935] NFS: mkdir(0:59/560857152), testQ
[ 257.139937] NFS call mkdir testQ
...
[ 307.822702] RPC: state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
[ 307.822714] RPC: xs_data_ready...
[ 307.822817] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.822821] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.822825] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.822826] RPC: state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823606] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.823609] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823629] RPC: xs_tcp_state_change client 00000000ce702f14...
[ 307.823632] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 307.823676] RPC: xs_connect scheduled xprt 00000000ce702f14
[ 307.823704] RPC: xs_bind 0.0.0.0:936: ok (0)
[ 307.823709] RPC: worker connecting xprt 00000000ce702f14 via tcp
to 10.101.1.30 (port 2049)
[ 307.823748] RPC: 00000000ce702f14 connect status 115 connected 0
sock state 2
...
[ 314.916193] RPC: state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
[ 314.916251] RPC: xs_connect scheduled xprt 00000000ce702f14
[ 314.916282] RPC: xs_bind 0.0.0.0:936: ok (0)
[ 314.916292] RPC: worker connecting xprt 00000000ce702f14 via tcp
to 10.101.1.30 (port 2049)
[ 314.916342] RPC: 00000000ce702f14 connect status 115 connected 0
sock state 2
Fixes: 7c81e6a9d75b ("SUNRPC: Tweak TCP socket shutdown in the RPC client")
Signed-off-by: Siddharth Rajendra Kawar <sikawar@microsoft.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready()
callback implementations. For example:
<...>
iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable
iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable
<...>
Suggested-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation which the networking system uses to decide
when it is safe to use current->task_frag. The results of this are
unexpected corruption in task_frag when SUNRPC is involved in memory
reclaim.
The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code. I believe this problem to
be much more pervasive than reports to the community may indicate.
Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false. Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.
CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
"iov_iter work; most of that is about getting rid of direction
misannotations and (hopefully) preventing more of the same for the
future"
* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use less confusing names for iov_iter direction initializers
iov_iter: saner checks for attempt to copy to/from iterator
[xen] fix "direction" argument of iov_iter_kvec()
[vhost] fix 'direction' argument of iov_iter_{init,bvec}()
[target] fix iov_iter_bvec() "direction" argument
[s390] memcpy_real(): WRITE is "data source", not destination...
[s390] zcore: WRITE is "data source", not destination...
[infiniband] READ is "data destination", not source...
[fsi] WRITE is "data source", not destination...
[s390] copy_oldmem_kernel() - WRITE is "data source", not destination
csum_and_copy_to_iter(): handle ITER_DISCARD
get rid of unlikely() on page_copy_sane() calls
|
|
READ/WRITE proved to be actively confusing - the meanings are
"data destination, as used with read(2)" and "data source, as
used with write(2)", but people keep interpreting those as
"we read data from it" and "we write data to it", i.e. exactly
the wrong way.
Call them ITER_DEST and ITER_SOURCE - at least that is harder
to misinterpret...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This is a simple mechanical transformation done by:
@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
|
|
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:
@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)
@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@
- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);
// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@
((T)get_random_u32()@p & (LITERAL))
// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@
value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@
- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)
@collapse_ret@
type T;
identifier VAR;
expression E;
@@
{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}
@drop_var@
type T;
identifier VAR;
@@
{
- T VAR;
... when != VAR
}
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: KP Singh <kpsingh@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Ensure that we immediately call rpc_exit_task() after waking up, and
that the tk_rpc_status cannot get clobbered by some other function.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
If a request is re-encoded and then retransmitted, we need to make sure
that we also re-encode the bvec, in case the page lists have changed.
Fixes: ff053dbbaffe ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
No conflicts.
Build issue in drivers/net/ethernet/sfc/ptp.c
54fccfdd7c66 ("sfc: efx_default_channel_type APIs can be static")
49e6123c65da ("net: sfc: fix memory leak due to ptp channel")
https://lore.kernel.org/all/20220510130556.52598fe2@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit 7073ea8799a8cf73db60270986f14e4aae20fa80.
We must not try to connect the socket while the transport is under
construction, because the mechanisms to safely tear it down are not in
place. As the code stands, we end up leaking the sockets on a connection
error.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When the rpcbind server closes the socket, we need to ensure that the
socket is closed by the kernel as soon as feasible, so add a
sk_state_change callback to trigger this close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If there is still a closed socket associated with the transport, then we
need to trigger an autoclose before we can set up a new connection.
Reported-by: wanghai (M) <wanghai38@huawei.com>
Fixes: f00432063db1 ("SUNRPC: Ensure we flush any closed sockets before xs_xprt_free()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
|
|
The internal recvmsg() functions have two parameters 'flags' and 'noblock'
that were merged inside skb_recv_datagram(). As a follow up patch to commit
f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()")
this patch removes the separate 'noblock' parameter for recvmsg().
Analogue to the referenced patch for skb_recv_datagram() the 'flags' and
'noblock' parameters are unnecessarily split up with e.g.
err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
or in
err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
sk, msg, size, flags & MSG_DONTWAIT,
flags & ~MSG_DONTWAIT, &addr_len);
instead of simply using only flags all the time and check for MSG_DONTWAIT
where needed (to preserve for the formerly separated no(n)block condition).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|