summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2021-12-07net: switchdev: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: watchdog: add net device refcount trackerEric Dumazet
Add a netdevice_tracker inside struct net_device, to track the self reference when a device has an active watchdog timer. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: bridge: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07vlan: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net: eql: add net device refcount trackerEric Dumazet
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07MAINTAINERS: net: mlxsw: Remove Jiri as a maintainer, add myselfPetr Machata
Jiri has moved on and will not carry out the mlxsw maintainership duty any longer. Add myself as a co-maintainer instead. Signed-off-by: Petr Machata <petrm@nvidia.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/45b54312cdebaf65c5d110b15a5dd2df795bf2be.1638807297.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07Merge branch 'net-tls-cover-all-ciphers-with-tests'Jakub Kicinski
Vadim Fedorenko says: ==================== net: tls: cover all ciphers with tests Recent patches to Kernel TLS showed that some ciphers are not covered with tests. Let's cover missed. ==================== Link: https://lore.kernel.org/r/20211206213932.7508-1-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07selftests: tls: add missing AES256-GCM cipherVadim Fedorenko
Add tests for TLSv1.2 and TLSv1.3 with AES256-GCM cipher Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07selftests: tls: add missing AES-CCM cipher testsVadim Fedorenko
Add tests for TLSv1.2 and TLSv1.3 with AES-CCM cipher. Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08netfilter: conntrack: annotate data-races around ct->timeoutEric Dumazet
(struct nf_conn)->timeout can be read/written locklessly, add READ_ONCE()/WRITE_ONCE() to prevent load/store tearing. BUG: KCSAN: data-race in __nf_conntrack_alloc / __nf_conntrack_find_get write to 0xffff888132e78c08 of 4 bytes by task 6029 on cpu 0: __nf_conntrack_alloc+0x158/0x280 net/netfilter/nf_conntrack_core.c:1563 init_conntrack+0x1da/0xb30 net/netfilter/nf_conntrack_core.c:1635 resolve_normal_ct+0x502/0x610 net/netfilter/nf_conntrack_core.c:1746 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline] tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680 __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864 tcp_push_pending_frames include/net/tcp.h:1897 [inline] tcp_data_snd_check+0x62/0x2e0 net/ipv4/tcp_input.c:5452 tcp_rcv_established+0x880/0x10e0 net/ipv4/tcp_input.c:5947 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 sk_stream_wait_memory+0x435/0x700 net/core/stream.c:145 tcp_sendmsg_locked+0xb85/0x25a0 net/ipv4/tcp.c:1402 tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1440 inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:644 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg net/socket.c:724 [inline] __sys_sendto+0x21e/0x2c0 net/socket.c:2036 __do_sys_sendto net/socket.c:2048 [inline] __se_sys_sendto net/socket.c:2044 [inline] __x64_sys_sendto+0x74/0x90 net/socket.c:2044 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888132e78c08 of 4 bytes by task 17446 on cpu 1: nf_ct_is_expired include/net/netfilter/nf_conntrack.h:286 [inline] ____nf_conntrack_find net/netfilter/nf_conntrack_core.c:776 [inline] __nf_conntrack_find_get+0x1c7/0xac0 net/netfilter/nf_conntrack_core.c:807 resolve_normal_ct+0x273/0x610 net/netfilter/nf_conntrack_core.c:1734 nf_conntrack_in+0x1c5/0x88f net/netfilter/nf_conntrack_core.c:1901 ipv6_conntrack_local+0x19/0x20 net/netfilter/nf_conntrack_proto.c:414 nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline] nf_hook_slow+0x72/0x170 net/netfilter/core.c:619 nf_hook include/linux/netfilter.h:262 [inline] NF_HOOK include/linux/netfilter.h:305 [inline] ip6_xmit+0xa3a/0xa60 net/ipv6/ip6_output.c:324 inet6_csk_xmit+0x1a2/0x1e0 net/ipv6/inet6_connection_sock.c:135 __tcp_transmit_skb+0x132a/0x1840 net/ipv4/tcp_output.c:1402 __tcp_send_ack+0x1fd/0x300 net/ipv4/tcp_output.c:3956 tcp_send_ack+0x23/0x30 net/ipv4/tcp_output.c:3962 __tcp_ack_snd_check+0x2d8/0x510 net/ipv4/tcp_input.c:5478 tcp_ack_snd_check net/ipv4/tcp_input.c:5523 [inline] tcp_rcv_established+0x8c2/0x10e0 net/ipv4/tcp_input.c:5948 tcp_v6_do_rcv+0x36e/0xa50 net/ipv6/tcp_ipv6.c:1521 sk_backlog_rcv include/net/sock.h:1030 [inline] __release_sock+0xf2/0x270 net/core/sock.c:2768 release_sock+0x40/0x110 net/core/sock.c:3300 tcp_sendpage+0x94/0xb0 net/ipv4/tcp.c:1114 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 rds_tcp_xmit+0x376/0x5f0 net/rds/tcp_send.c:118 rds_send_xmit+0xbed/0x1500 net/rds/send.c:367 rds_send_worker+0x43/0x200 net/rds/threads.c:200 process_one_work+0x3fc/0x980 kernel/workqueue.c:2298 worker_thread+0x616/0xa70 kernel/workqueue.c:2445 kthread+0x2c7/0x2e0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 value changed: 0x00027cc2 -> 0x00000000 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 17446 Comm: kworker/u4:5 Tainted: G W 5.16.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: krdsd rds_send_worker Note: I chose an arbitrary commit for the Fixes: tag, because I do not think we need to backport this fix to very old kernels. Fixes: e37542ba111f ("netfilter: conntrack: avoid possible false sharing") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08selftests: netfilter: switch zone stress to socatFlorian Westphal
centos9 has nmap-ncat which doesn't like the '-q' option, use socat. While at it, mark test skipped if needed tools are missing. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08netfilter: nft_exthdr: break evaluation if setting TCP option failsPablo Neira Ayuso
Break rule evaluation on malformed TCP options. Fixes: 99d1712bc41c ("netfilter: exthdr: tcp option set support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08selftests: netfilter: Add correctness test for mac,net set typeStefano Brivio
The existing net,mac test didn't cover the issue recently reported by Nikita Yushchenko, where MAC addresses wouldn't match if given as first field of a concatenated set with AVX2 and 8-bit groups, because there's a different code path covering the lookup of six 8-bit groups (MAC addresses) if that's the first field. Add a similar mac,net test, with MAC address and IPv4 address swapped in the set specification. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08nft_set_pipapo: Fix bucket load in AVX2 lookup routine for six 8-bit groupsStefano Brivio
The sixth byte of packet data has to be looked up in the sixth group, not in the seventh one, even if we load the bucket data into ymm6 (and not ymm5, for convenience of tracking stalls). Without this fix, matching on a MAC address as first field of a set, if 8-bit groups are selected (due to a small set size) would fail, that is, the given MAC address would never match. Reported-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Tested-By: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-08vrf: don't run conntrack on vrf with !dflt qdiscNicolas Dichtel
After the below patch, the conntrack attached to skb is set to "notrack" in the context of vrf device, for locally generated packets. But this is true only when the default qdisc is set to the vrf device. When changing the qdisc, notrack is not set anymore. In fact, there is a shortcut in the vrf driver, when the default qdisc is set, see commit dcdd43c41e60 ("net: vrf: performance improvements for IPv4") for more details. This patch ensures that the behavior is always the same, whatever the qdisc is. To demonstrate the difference, a new test is added in conntrack_vrf.sh. Fixes: 8c9c296adfae ("vrf: run conntrack only in context of lower/physdev for locally generated packets") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-07Merge tag 'perf-tools-fixes-for-v5.16-2021-12-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix SMT detection fast read path on sysfs. - Fix memory leaks when processing feature headers in perf.data files. - Fix 'Simple expression parser' 'perf test' on arch without CPU die topology info, such as s/390. - Fix building perf with BUILD_BPF_SKEL=1. - Fix 'perf bench' by reverting "perf bench: Fix two memory leaks detected with ASan". - Fix itrace space allowed for new attributes in 'perf script'. - Fix the build feature detection fast path, that was always failing on systems with python3 development packages, speeding up the build. - Reset shadow counts before loading, fixing metrics using duration_time. - Sync more kernel headers changed by the new futex_waitv syscall: s390 and powerpc. * tag 'perf-tools-fixes-for-v5.16-2021-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf bpf_skel: Do not use typedef to avoid error on old clang perf bpf: Fix building perf with BUILD_BPF_SKEL=1 by default in more distros perf header: Fix memory leaks when processing feature headers perf test: Reset shadow counts before loading perf test: Fix 'Simple expression parser' test on arch without CPU die topology info tools build: Remove needless libpython-version feature check that breaks test-all fast path perf tools: Fix SMT detection fast read path tools headers UAPI: Sync powerpc syscall table file changed by new futex_waitv syscall perf inject: Fix itrace space allowed for new attributes tools headers UAPI: Sync s390 syscall table file changed by new futex_waitv syscall Revert "perf bench: Fix two memory leaks detected with ASan"
2021-12-07ice: fix adding different tunnelsMichal Swiatkowski
Adding filters with the same values inside for VXLAN and Geneve causes HW error, because it looks exactly the same. To choose between different type of tunnels new recipe is needed. Add storing tunnel types in creating recipes function and start checking it in finding function. Change getting open tunnels function to return port on correct tunnel type. This is needed to copy correct port to dummy packet. Block user from adding enc_dst_port via tc flower, because VXLAN and Geneve filters can be created only with destination port which was previously opened. Fixes: 8b032a55c1bd5 ("ice: low level support for tunnels") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07ice: fix choosing UDP header typeMichal Swiatkowski
In tunnels packet there can be two UDP headers: - outer which for hw should be mark as ICE_UDP_OF - inner which for hw should be mark as ICE_UDP_ILOS or as ICE_TCP_IL if inner header is of TCP type In none tunnels packet header can be: - UDP, which for hw should be mark as ICE_UDP_ILOS - TCP, which for hw should be mark as ICE_TCP_IL Change incorrect ICE_UDP_OF for none tunnel packets to ICE_UDP_ILOS. ICE_UDP_OF is incorrect for none tunnel packets and setting it leads to error from hw while adding this kind of recipe. In summary, for tunnel outer port type should always be set to ICE_UDP_OF, for none tunnel outer and tunnel inner it should always be set to ICE_UDP_ILOS. Fixes: 9e300987d4a8 ("ice: VXLAN and Geneve TC support") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07ice: ignore dropped packets during initJesse Brandeburg
If the hardware is constantly receiving unicast or broadcast packets during driver load, the device previously counted many GLV_RDPC (VSI dropped packets) events during init. This causes confusing dropped packet statistics during driver load. The dropped packets counter incrementing does stop once the driver finishes loading. Avoid this problem by baselining our statistics at the end of driver open instead of the end of probe. Fixes: cdedef59deb0 ("ice: Configure VSIs for Tx/Rx") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07ice: Fix problems with DSCP QoS implementationDave Ertman
The patch that implemented DSCP QoS implementation removed a bandwidth check that was used to check for a specific condition caused by some corner cases. This check should not of been removed. The same patch also added a check for when the DCBx state could be changed in relation to DSCP, but the check was erroneously added nested in a check for CEE mode, which made the check useless. Fix these problems by re-adding the bandwidth check and relocating the DSCP mode check earlier in the function that changes DCBx state in the driver. Fixes: 2a87bd73e50d ("ice: Add DSCP support") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07ice: rearm other interrupt cause register after enabling VFsPaul Greenwalt
The other interrupt cause register (OICR), global interrupt 0, is disabled when enabling VFs to prevent handling VFLR. If the OICR is not rearmed then the VF cannot communicate with the PF. Rearm the OICR after enabling VFs. Fixes: 916c7fdf5e93 ("ice: Separate VF VSI initialization/creation from reset flow") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07ice: fix FDIR init missing when reset VFYahui Cao
When VF is being reset, ice_reset_vf() will be called and FDIR resource should be released and initialized again. Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Signed-off-by: Yahui Cao <yahui.cao@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-07Merge branch 'mptcp-new-features-for-mptcp-sockets-and-netlink-pm'Jakub Kicinski
Mat Martineau says: ==================== mptcp: New features for MPTCP sockets and netlink PM This collection of patches adds MPTCP socket support for a few socket options, ioctls, and one ancillary data type (specifics for each are listed below). There's also a patch modifying the netlink MPTCP path manager API to allow setting the backup flag on a configured interface using the endpoint ID instead of the full IP address. Patches 1 & 2: TCP_INQ cmsg and selftests. Patches 2 & 3: SIOCINQ, OUTQ, and OUTQNSD ioctls and selftests. Patch 5: Change backup flag using endpoint ID. Patches 6 & 7: IP_TOS socket option and selftests. Patches 8-10: TCP_CORK and TCP_NODELAY socket options. Includes a tcp change to expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use by MPTCP. ==================== Link: https://lore.kernel.org/r/20211203223541.69364-1-mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: support TCP_CORK and TCP_NODELAYMaxim Galaganov
First, add cork and nodelay fields to the mptcp_sock structure so they can be used in sync_socket_options(), and fill them on setsockopt while holding the msk socket lock. Then, on setsockopt set proper tcp_sk(ssk)->nonagle values for subflows by calling __tcp_sock_set_cork() or __tcp_sock_set_nodelay() on the ssk while holding the ssk socket lock. tcp_push_pending_frames() will be invoked on the ssk if a cork was cleared or nodelay was set. Also set MPTCP_PUSH_PENDING bit by calling mptcp_check_and_set_pending(). This will lead to __mptcp_push_pending() being called inside mptcp_release_cb() with new tcp_sk(ssk)->nonagle. Also add getsockopt support for TCP_CORK and TCP_NODELAY. Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: expose mptcp_check_and_set_pendingMaxim Galaganov
Expose the mptcp_check_and_set_pending() function for use inside MPTCP sockopt code. The next patch will call it when TCP_CORK is cleared or TCP_NODELAY is set on the MPTCP socket in order to push pending data from mptcp_release_cb(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07tcp: expose __tcp_sock_set_cork and __tcp_sock_set_nodelayMaxim Galaganov
Expose __tcp_sock_set_cork() and __tcp_sock_set_nodelay() for use in MPTCP setsockopt code -- namely for syncing MPTCP socket options with subflows inside sync_socket_options() while already holding the subflow socket lock. Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Maxim Galaganov <max@internet.ru> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07selftests: mptcp: check IP_TOS in/out are the sameFlorian Westphal
Check that getsockopt(IP_TOS) returns what setsockopt(IP_TOS) did set right before. Also check that socklen_t == 0 and -1 input values match those of normal tcp sockets. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: getsockopt: add support for IP_TOSFlorian Westphal
earlier patch added IP_TOS setsockopt support, this allows to get the value set by earlier setsockopt. Extends mptcp_put_int_option to handle u8 input/output by adding required cast. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: allow changing the "backup" bit by endpoint idDavide Caratti
a non-zero 'id' is sufficient to identify MPTCP endpoints: allow changing the value of 'backup' bit by simply specifying the endpoint id. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/158 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07selftests: mptcp: add inq test caseFlorian Westphal
client & server use a unix socket connection to communicate outside of the mptcp connection. This allows the consumer to know in advance how many bytes have been (or will be) sent by the peer. This allows stricter checks on the bytecounts reported by TCP_INQ cmsg. Suggested-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: add SIOCINQ, OUTQ and OUTQNSD ioctlsFlorian Westphal
Allows to query in-sequence data ready for read(), total bytes in write queue and total bytes in write queue that have not yet been sent. v2: remove unneeded READ_ONCE() (Paolo Abeni) v3: check for new data unconditionally in SIOCINQ ioctl (Mat Martineau) Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07selftests: mptcp: add TCP_INQ supportFlorian Westphal
Do checks on the returned inq counter. Fail on: 1. Huge value (> 1 kbyte, test case files are 1 kb) 2. last hint larger than returned bytes when read was short 3. erronenous indication of EOF. 3) happens when a hint of X bytes reads X-1 on next call but next recvmsg returns more data (instead of EOF). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07mptcp: add TCP_INQ cmsg supportFlorian Westphal
Support the TCP_INQ setsockopt. This is a boolean that tells recvmsg path to include the remaining in-sequence bytes in the cmsg data. v2: do not use CB(skb)->offset, increment map_seq instead (Paolo Abeni) v3: adjust CB(skb)->map_seq when taking skb from ofo queue (Paolo Abeni) Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/224 Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07vrf: use dev_replace_track() for better trackingEric Dumazet
vrf_rt6_release() and vrf_rtable_release() changes dst->dev Instead of dev_hold(ndev); dev_put(odev); We should use dev_replace_track(odev, ndev, &dst->dev_tracker, GFP_KERNEL); If we do not transfer dst->dev_tracker to the new device, we will get warnings from ref_tracker_dir_exit() when odev is finally dismantled. Fixes: 9038c320001d ("net: dst: add net device refcount tracking to dst_entry") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20211207055603.1926372-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07net/qla3xxx: fix an error code in ql_adapter_up()Dan Carpenter
The ql_wait_for_drvr_lock() fails and returns false, then this function should return an error code instead of returning success. The other problem is that the success path prints an error message netdev_err(ndev, "Releasing driver lock\n"); Delete that and re-order the code a little to make it more clear. Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211207082416.GA16110@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07Merge tag 'linux-can-fixes-for-5.16-20211207' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== can 2021-12-07 The 1st patch is by Vincent Mailhol and fixes a use after free in the pch_can driver. Dan Carpenter fixes a use after free in the ems_pcmcia sja1000 driver. The remaining 7 patches target the m_can driver. Brian Silverman contributes a patch to disable and ignore the ELO interrupt, which is currently not handled in the driver and may lead to an interrupt storm. Vincent Mailhol's patch fixes a memory leak in the error path of the m_can_read_fifo() function. The remaining patches are contributed by Matthias Schiffer, first a iomap_read_fifo() and iomap_write_fifo() functions are fixed in the PCI glue driver, then the clock rate for the Intel Ekhart Lake platform is fixed, the last 3 patches add support for the custom bit timings on the Elkhart Lake platform. * tag 'linux-can-fixes-for-5.16-20211207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: m_can: pci: use custom bit timings for Elkhart Lake can: m_can: make custom bittiming fields const Revert "can: m_can: remove support for custom bit timing" can: m_can: pci: fix incorrect reference clock rate can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo() can: m_can: m_can_read_fifo: fix memory leak in error branch can: m_can: Disable and ignore ELO interrupt can: sja1000: fix use after free in ems_pcmcia_add_card() can: pch_can: pch_can_rx_normal: fix use after free ==================== Link: https://lore.kernel.org/r/20211207102420.120131-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-07Merge tag 'platform-drivers-x86-v5.16-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "Various bug-fixes and hardware-id additions" * tag 'platform-drivers-x86-v5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86/intel: hid: add quirk to support Surface Go 3 platform/x86: amd-pmc: Fix s2idle failures on certain AMD laptops platform/x86: touchscreen_dmi: Add TrekStor SurfTab duo W1 touchscreen info platform/x86: lg-laptop: Recognize more models platform/x86: thinkpad_acpi: Add lid_logo_dot to the list of safe LEDs platform/x86: thinkpad_acpi: Restore missing hotkey_tablet_mode and hotkey_radio_sw sysfs-attr
2021-12-07netfs: fix parameter of cleanup()Jeffle Xu
The order of these two parameters is just reversed. gcc didn't warn on that, probably because 'void *' can be converted from or to other pointer types without warning. Cc: stable@vger.kernel.org Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers") Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
2021-12-07netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lockDavid Howells
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218fdc22c ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> cc: Jan Kara <jack@suse.cz> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
2021-12-07can: m_can: pci: use custom bit timings for Elkhart LakeMatthias Schiffer
The relevant datasheet [1] specifies nonstandard limits for the bit timing parameters. While it is unclear what the exact effect of violating these limits is, it seems like a good idea to adhere to the documentation. [1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J Series Processors for IoT Applications Datasheet, Volume 2 (Book 3 of 3), July 2021, Revision 001 Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Link: https://lore.kernel.org/all/9eba5d7c05a48ead4024ffa6e5926f191d8c6b38.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: m_can: make custom bittiming fields constMatthias Schiffer
The assigned timing structs will be defined a const anyway, so we can avoid a few casts by declaring the struct fields as const as well. Link: https://lore.kernel.org/all/4508fa4e639164b2584c49a065d90c78a91fa568.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07Revert "can: m_can: remove support for custom bit timing"Matthias Schiffer
The timing limits specified by the Elkhart Lake CPU datasheets do not match the defaults. Let's reintroduce the support for custom bit timings. This reverts commit 0ddd83fbebbc5537f9d180d31f659db3564be708. Link: https://lore.kernel.org/all/00c9e2596b1a548906921a574d4ef7a03c0dace0.1636967198.git.matthias.schiffer@ew.tq-group.com Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: m_can: pci: fix incorrect reference clock rateMatthias Schiffer
When testing the CAN controller on our Ekhart Lake hardware, we determined that all communication was running with twice the configured bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed this. Intel's support has confirmed to us that 200MHz is indeed the correct clock rate. Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake") Link: https://lore.kernel.org/all/c9cf3995f45c363e432b3ae8eb1275e54f009fc8.1636967198.git.matthias.schiffer@ew.tq-group.com Cc: stable@vger.kernel.org Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Reviewed-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: m_can: pci: fix iomap_read_fifo() and iomap_write_fifo()Matthias Schiffer
The same fix that was previously done in m_can_platform in commit 99d173fbe894 ("can: m_can: fix iomap_read_fifo() and iomap_write_fifo()") is required in m_can_pci as well to make iomap_read_fifo() and iomap_write_fifo() work for val_count > 1. Fixes: 812270e5445b ("can: m_can: Batch FIFO writes during CAN transmit") Fixes: 1aa6772f64b4 ("can: m_can: Batch FIFO reads during CAN receive") Link: https://lore.kernel.org/all/20211118144011.10921-1-matthias.schiffer@ew.tq-group.com Cc: stable@vger.kernel.org Cc: Matt Kline <matt@bitbashing.io> Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: m_can: m_can_read_fifo: fix memory leak in error branchVincent Mailhol
In m_can_read_fifo(), if the second call to m_can_fifo_read() fails, the function jump to the out_fail label and returns without calling m_can_receive_skb(). This means that the skb previously allocated by alloc_can_skb() is not freed. In other terms, this is a memory leak. This patch adds a goto label to destroy the skb if an error occurs. Issue was found with GCC -fanalyzer, please follow the link below for details. Fixes: e39381770ec9 ("can: m_can: Disable IRQs on FIFO bus errors") Link: https://lore.kernel.org/all/20211107050755.70655-1-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Cc: Matt Kline <matt@bitbashing.io> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: m_can: Disable and ignore ELO interruptBrian Silverman
With the design of this driver, this condition is often triggered. However, the counter that this interrupt indicates an overflow is never read either, so overflowing is harmless. On my system, when a CAN bus starts flapping up and down, this locks up the whole system with lots of interrupts and printks. Specifically, this interrupt indicates the CEL field of ECR has overflowed. All reads of ECR mask out CEL. Fixes: e0d1f4816f2a ("can: m_can: add Bosch M_CAN controller support") Link: https://lore.kernel.org/all/20211129222628.7490-1-brian.silverman@bluerivertech.com Cc: stable@vger.kernel.org Signed-off-by: Brian Silverman <brian.silverman@bluerivertech.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: sja1000: fix use after free in ems_pcmcia_add_card()Dan Carpenter
If the last channel is not available then "dev" is freed. Fortunately, we can just use "pdev->irq" instead. Also we should check if at least one channel was set up. Fixes: fd734c6f25ae ("can/sja1000: add driver for EMS PCMCIA card") Link: https://lore.kernel.org/all/20211124145041.GB13656@kili Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-07can: pch_can: pch_can_rx_normal: fix use after freeVincent Mailhol
After calling netif_receive_skb(skb), dereferencing skb is unsafe. Especially, the can_frame cf which aliases skb memory is dereferenced just after the call netif_receive_skb(skb). Reordering the lines solves the issue. Fixes: b21d18b51b31 ("can: Topcliff: Add PCH_CAN driver.") Link: https://lore.kernel.org/all/20211123111654.621610-1-mailhol.vincent@wanadoo.fr Cc: stable@vger.kernel.org Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-12-06Merge branch 'samples: bpf: fix build issues with Clang/LLVM'Andrii Nakryiko
Alexander Lobakin says: ==================== Samples, at least XDP ones, can be built only with the compiler used to build the kernel itself. However, XDP sample infra introduced in Aug'21 was probably tested with GCC/Binutils only as it's not really compilable for now with Clang/LLVM. These two are trivial fixes addressing this. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-12-06samples: bpf: Fix 'unknown warning group' build warning on ClangAlexander Lobakin
Clang doesn't have 'stringop-truncation' group like GCC does, and complains about it when building samples which use xdp_sample_user infra: samples/bpf/xdp_sample_user.h:48:32: warning: unknown warning group '-Wstringop-truncation', ignored [-Wunknown-warning-option] #pragma GCC diagnostic ignored "-Wstringop-truncation" ^ [ repeat ] Those are harmless, but avoidable when guarding it with ifdef. I could guard push/pop as well, but this would require one more ifdef cruft around a single line which I don't think is reasonable. Fixes: 156f886cf697 ("samples: bpf: Add basic infrastructure for XDP samples") Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20211203195004.5803-3-alexandr.lobakin@intel.com