linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-09-28	net: ipv6: check return value of rhashtable_init	MichelleJin
	When rhashtable_init() fails, it returns -EINVAL. However, since error return value of rhashtable_init is not checked, it can cause use of uninitialized pointers. So, fix unhandled errors of rhashtable_init. Signed-off-by: MichelleJin <shjy180909@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-28	netfilter: nf_tables: reverse order in rule replacement expansion	Pablo Neira Ayuso
	Deactivate old rule first, then append the new rule, so rule replacement notification via netlink first reports the deletion of the old rule with handle X in first place, then it adds the new rule (reusing the handle X of the replaced old rule). Note that the abort path releases the transaction that has been created by nft_delrule() on error. Fixes: ca08987885a1 ("netfilter: nf_tables: deactivate expressions in rule replecement routine") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-28	netfilter: nf_tables: add position handle in event notification	Pablo Neira Ayuso
	Add position handle to allow to identify the rule location from netlink events. Otherwise, userspace cannot incrementally update a userspace cache through monitoring events. Skip handle dump if the rule has been either inserted (at the beginning of the ruleset) or appended (at the end of the ruleset), the NLM_F_APPEND netlink flag is sufficient in these two cases. Handle NLM_F_REPLACE as NLM_F_APPEND since the rule replacement expansion appends it after the specified rule handle. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-28	netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1	Florian Westphal
	This is a revert of 7b1957b049 ("netfilter: nf_defrag_ipv4: use net_generic infra") and a partial revert of 8b0adbe3e3 ("netfilter: nf_defrag_ipv6: use net_generic infra"). If conntrack is builtin and kernel is booted with: nf_conntrack.enable_hooks=1 .... kernel will fail to boot due to a NULL deref in nf_defrag_ipv4_enable(): Its called before the ipv4 defrag initcall is made, so net_generic() returns NULL. To resolve this, move the user refcount back to struct net so calls to those functions are possible even before their initcalls have run. Fixes: 7b1957b04956 ("netfilter: nf_defrag_ipv4: use net_generic infra") Fixes: 8b0adbe3e38d ("netfilter: nf_defrag_ipv6: use net_generic infra"). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-09-28	Bluetooth: Fix wrong opcode when LL privacy enabled	Yun-Hao Chung
	The returned opcode of command status of remove_adv is wrong when LL privacy is enabled. Signed-off-by: Yun-Hao Chung <howardchung@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-09-28	Bluetooth: Fix Advertisement Monitor Suspend/Resume	Manish Mandlik
	During system suspend, advertisement monitoring is disabled by setting the HCI_VS_MSFT_LE_Set_Advertisement_Filter_Enable to False. This disables the monitoring during suspend, however, if the controller is monitoring a device, it sends HCI_VS_MSFT_LE_Monitor_Device_Event to indicate that the monitoring has been stopped for that particular device. This event may occur after suspend depending on the low_threshold_timeout and peer device advertisement frequency, which causes early wake up. Right way to disable the monitoring for suspend is by removing all the monitors before suspend and re-monitor after resume to ensure no events are received during suspend. This patch fixes this suspend/resume issue. Following tests are performed: - Add monitors before suspend and make sure DeviceFound gets triggered - Suspend the system and verify that all monitors are removed by kernel but not Released by bluetoothd - Wake up and verify that all monitors are added again and DeviceFound gets triggered Signed-off-by: Manish Mandlik <mmandlik@google.com> Reviewed-by: Archie Pusaka <apusaka@google.com> Reviewed-by: Miao-chen Chou <mcchou@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-09-28	bpf, test, cgroup: Use sk_{alloc,free} for test cases	Daniel Borkmann
	BPF test infra has some hacks in place which kzalloc() a socket and perform minimum init via sock_net_set() and sock_init_data(). As a result, the sk's skcd->cgroup is NULL since it didn't go through proper initialization as it would have been the case from sk_alloc(). Rather than re-adding a NULL test in sock_cgroup_ptr() just for this, use sk_{alloc,free}() pair for the test socket. The latter also allows to get rid of the bpf_sk_storage_free() special case. Fixes: 8520e224f547 ("bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode") Fixes: b7a1848e8398 ("bpf: add BPF_PROG_TEST_RUN support for flow dissector") Fixes: 2cb494a36c98 ("bpf: add tests for direct packet access from CGROUP_SKB") Reported-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Reported-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot+664b58e9a40fbb2cec71@syzkaller.appspotmail.com Tested-by: syzbot+33f36d0754d4c5c0e102@syzkaller.appspotmail.com Link: https://lore.kernel.org/bpf/20210927123921.21535-2-daniel@iogearbox.net
2021-09-28	xsk: Optimize for aligned case	Magnus Karlsson
	Optimize for the aligned case by precomputing the parameter values of the xdp_buff_xsk and xdp_buff structures in the heads array. We can do this as the heads array size is equal to the number of chunks in the umem for the aligned case. Then every entry in this array will reflect a certain chunk/frame and can therefore be prepopulated with the correct values and we can drop the use of the free_heads stack. Note that it is not possible to allocate more buffers than what has been allocated in the aligned case since each chunk can only contain a single buffer. We can unfortunately not do this in the unaligned case as one chunk might contain multiple buffers. In this case, we keep the old scheme of populating a heads entry every time it is used and using the free_heads stack. Also move xp_release() and xp_get_handle() to xsk_buff_pool.h. They were for some reason in xsk.c even though they are buffer pool operations. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-7-magnus.karlsson@gmail.com
2021-09-28	xsk: Batched buffer allocation for the pool	Magnus Karlsson
	Add a new driver interface xsk_buff_alloc_batch() offering batched buffer allocations to improve performance. The new interface takes three arguments: the buffer pool to allocated from, a pointer to an array of struct xdp_buff pointers which will contain pointers to the allocated xdp_buffs, and an unsigned integer specifying the max number of buffers to allocate. The return value is the actual number of buffers that the allocator managed to allocate and it will be in the range 0 <= N <= max, where max is the third parameter to the function. u32 xsk_buff_alloc_batch(struct xsk_buff_pool pool, struct xdp_buff xdp, u32 max); A second driver interface is also introduced that need to be used in conjunction with xsk_buff_alloc_batch(). It is a helper that sets the size of struct xdp_buff and is used by the NIC Rx irq routine when receiving a packet. This helper sets the three struct members data, data_meta, and data_end. The two first ones is in the xsk_buff_alloc() case set in the allocation routine and data_end is set when a packet is received in the receive irq function. This unfortunately leads to worse performance since the xdp_buff is touched twice with a long time period in between leading to an extra cache miss. Instead, we fill out the xdp_buff with all 3 fields at one single point in time in the driver, when the size of the packet is known. Hence this helper. Note that the driver has to use this helper (or set all three fields itself) when using xsk_buff_alloc_batch(). xsk_buff_alloc() works as before and does not require this. void xsk_buff_set_size(struct xdp_buff xdp, u32 size); Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210922075613.12186-3-magnus.karlsson@gmail.com
2021-09-27	net: dsa: Move devlink registration to be last devlink command	Leon Romanovsky
	This change prevents from users to access device before devlink is fully configured. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	devlink: Notify users when objects are accessible	Leon Romanovsky
	The devlink core code notified users about add/remove objects without relation if this object can be accessible or not. In this patch we unify such user visible notifications in one place. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	Merge 5.15-rc3 into tty-next	Greg Kroah-Hartman
	We need the tty fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-27	nl80211: MBSSID and EMA support in AP mode	John Crispin
	Add new attributes to configure support for multiple BSSID and advanced multi-BSSID advertisements (EMA) in AP mode. - NL80211_ATTR_MBSSID_CONFIG used for per interface configuration. - NL80211_ATTR_MBSSID_ELEMS used to MBSSID elements for beacons. Memory for the elements is allocated dynamically. This change frees the memory in existing functions which call nl80211_parse_beacon(), a comment is added to indicate the new references to do the same. Signed-off-by: John Crispin <john@phrozen.org> Co-developed-by: Aloka Dixit <alokad@codeaurora.org> Signed-off-by: Aloka Dixit <alokad@codeaurora.org> Link: https://lore.kernel.org/r/20210916025437.29138-2-alokad@codeaurora.org [don't leave ERR_PTR hanging around] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	nl80211: don't kfree() ERR_PTR() value	Johannes Berg
	When parse_acl_data() fails, we get an ERR_PTR() value and then "goto out;", in this case we'd attempt to kfree() it. Fix that. Fixes: 9e263e193af7 ("nl80211: don't put struct cfg80211_ap_settings on stack") Link: https://lore.kernel.org/r/20210927134402.86c5ae06c952.Ic51e234d998b9045665e5ff8b6db7e29f25d70c0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	Merge tag 'mac80211-for-net-2021-09-27' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes berg says: ==================== Some fixes: * potential use-after-free in CCMP/GCMP RX processing * potential use-after-free in TX A-MSDU processing * revert to low data rates for no-ack as the commit broke other things * limit VHT MCS/NSS in radiotap injection * drop frames with invalid addresses in IBSS mode * check rhashtable_init() return value in mesh * fix potentially unaligned access in mesh * fix late beacon hrtimer handling in hwsim (syzbot) * fix documentation for PTK0 rekeying ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	net/ipv4/tcp_nv.c: remove superfluous header files from tcp_nv.c	Mianhan Liu
	tcp_nv.c hasn't use any macro or function declared in mm.h. Thus, these files can be removed from tcp_nv.c safely without affecting the compilation of the net module. Signed-off-by: Mianhan Liu <liumh1@shanghaitech.edu.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	cfg80211: always free wiphy specific regdomain	Johannes Berg
	In the (somewhat unlikely) event that we allocate a wiphy, then add a regdomain to it, and then fail registration, we leak the regdomain. Fix this by just always freeing it at the end, in the normal cases we'll free (and NULL) it during wiphy_unregister(). This happened when the wiphy settings were bad, and since they can be controlled by userspace with hwsim, syzbot was able to find this issue. Reported-by: syzbot+1638e7c770eef6b6c0d0@syzkaller.appspotmail.com Fixes: 3e0c3ff36c4c ("cfg80211: allow multiple driver regulatory_hints()") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210927131105.68b70cef4674.I4b9f0aa08c2af28555963b9fe3d34395bb72e0cc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	mac80211: save transmit power envelope element and power constraint	Wen Gong
	This is to save the transmit power envelope element and power constraint in struct ieee80211_bss_conf for 6 GHz. Lower driver will use this info to calculate the power limit. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210924100052.32029-7-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	net: make napi_disable() symmetric with enable	Jakub Kicinski
	Commit 3765996e4f0b ("napi: fix race inside napi_enable") fixed an ordering bug in napi_enable() and made the napi_enable() diverge from napi_disable(). The state transitions done on disable are not symmetric to enable. There is no known bug in napi_disable() this is just refactoring. Eric suggests we can also replace msleep(1) with a more opportunistic usleep_range(). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-27	mac80211: add parse regulatory info in 6 GHz operation information	Wen Gong
	This patch is to convert the regulatory info subfield in HE operation element to power type and save in struct cfg80211_chan_def. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210924100052.32029-3-wgong@codeaurora.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	mac80211: twt: don't use potentially unaligned pointer	Johannes Berg
	Since we're pointing into a frame, the pointer to the twt_agrt->req_type struct member is potentially not aligned properly. Open-code le16p_replace_bits() to avoid passing an unaligned pointer. Reported-by: kernel test robot <lkp@intel.com> Fixes: f5a4c24e689f ("mac80211: introduce individual TWT support in AP mode") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20210927115124.e1208694f37b.Ie3de9bcc5dde5a79e3ac81f3185beafe4d214e57@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	cfg80211: AP mode driver offload for FILS association crypto	Subrat Mishra
	Add a driver FILS crypto offload extended capability flag to indicate that the driver running in AP mode is capable of handling encryption and decryption of (Re)Association request and response frames. Add a command to set FILS AAD data to driver. This feature is supported on drivers running in AP mode only. This extended capability is exchanged with hostapd during cfg80211 init. If the driver indicates this capability, then before sending the Authentication response frame, hostapd sets FILS AAD data to the driver. This allows the driver to decrypt (Re)Association Request frame and encrypt (Re)Association Response frame. FILS Key derivation will still be done in hostapd. Signed-off-by: Subrat Mishra <subratm@codeaurora.org> Link: https://lore.kernel.org/r/1631685143-13530-1-git-send-email-subratm@codeaurora.org [fix whitespace] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	mac80211: check return value of rhashtable_init	MichelleJin
	When rhashtable_init() fails, it returns -EINVAL. However, since error return value of rhashtable_init is not checked, it can cause use of uninitialized pointers. So, fix unhandled errors of rhashtable_init. Signed-off-by: MichelleJin <shjy180909@gmail.com> Link: https://lore.kernel.org/r/20210927033457.1020967-4-shjy180909@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-27	mac80211: fix use-after-free in CCMP/GCMP RX	Johannes Berg
	When PN checking is done in mac80211, for fragmentation we need to copy the PN to the RX struct so we can later use it to do a comparison, since commit bf30ca922a0c ("mac80211: check defrag PN against current frame"). Unfortunately, in that commit I used the 'hdr' variable without it being necessarily valid, so use-after-free could occur if it was necessary to reallocate (parts of) the frame. Fix this by reloading the variable after the code that results in the reallocations, if any. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=214401. Cc: stable@vger.kernel.org Fixes: bf30ca922a0c ("mac80211: check defrag PN against current frame") Link: https://lore.kernel.org/r/20210927115838.12b9ac6bb233.I1d066acd5408a662c3b6e828122cd314fcb28cdb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-26	net: prevent user from passing illegal stab size	王贇
	We observed below report when playing with netlink sock: UBSAN: shift-out-of-bounds in net/sched/sch_api.c:580:10 shift exponent 249 is too large for 32-bit type CPU: 0 PID: 685 Comm: a.out Not tainted Call Trace: dump_stack_lvl+0x8d/0xcf ubsan_epilogue+0xa/0x4e __ubsan_handle_shift_out_of_bounds+0x161/0x182 __qdisc_calculate_pkt_len+0xf0/0x190 __dev_queue_xmit+0x2ed/0x15b0 it seems like kernel won't check the stab log value passing from user, and will use the insane value later to calculate pkt_len. This patch just add a check on the size/cell_log to avoid insane calculation. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: re-arm retransmit timer if data is pending	Florian Westphal
	The retransmit head will be NULL in case there is no in-flight data (meaning all data injected into network has been acked). In that case the retransmit timer is stopped. This is only correct if there is no more pending, not-yet-sent data. If there is, the retransmit timer needs to set the PENDING bit again so that mptcp tries to send the remaining (new) data once a subflow can accept more data. Also, mptcp_subflow_get_retrans() has to be called unconditionally. This function checks for subflows that have become unresponsive and marks them as stale, so in the case where the rtx queue is empty, subflows will never be marked stale which prevents available backup subflows from becoming eligible for transmit. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/226 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: remove tx_pending_data	Florian Westphal
	The update on recovery is not correct. msk->tx_pending_data += msk->snd_nxt - rtx_head->data_seq; will update tx_pending_data multiple times when a subflow is declared stale while earlier recovery is still in progress. This means that tx_pending_data will still be positive even after all data as has been transmitted. Rather than fix it, remove this field: there are no consumers. The outstanding data byte count can be computed either via "msk->write_seq - rtx_head->data_seq" or "msk->write_seq - msk->snd_una". The latter is more recent/accurate estimate as rtx_head adjustment is deferred until mptcp lock can be acquired. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: use lockdep_assert_held_once() instead of open-coding it	Paolo Abeni
	We have a few more places where the mptcp code duplicates lockdep_assert_held_once(). Let's use the existing macro and avoid a bunch of compiler's conditional. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: use OPTIONS_MPTCP_MPC	Geliang Tang
	Since OPTIONS_MPTCP_MPC has been defined, use it instead of open-coding. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliangtang@xiaomi.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-25	mptcp: do not shrink snd_nxt when recovering	Florian Westphal
	When recovering after a link failure, snd_nxt should not be set to a lower value. Else, update of snd_nxt is broken because: msk->snd_nxt += ret; (where ret is number of bytes sent) assumes that snd_nxt always moves forward. After reduction, its possible that snd_nxt update gets out of sync: dfrag we just sent might have had a data sequence number even past recovery_snd_nxt. This change factors the common msk state update to a helper and updates snd_nxt based on the current dfrag data sequence number. The conditional is required for the recovery phase where we may re-transmit old dfrags that are before current snd_nxt. After this change, snd_nxt only moves forward and covers all in-sequence data that was transmitted. recovery_snd_nxt is retained to detect when recovery has completed. Fixes: 1e1d9d6f119c5 ("mptcp: handle pending data on closed subflow") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf	Jakub Kicinski
	Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net 1) ipset limits the max allocatable memory via kvmalloc() to MAX_INT, from Jozsef Kadlecsik. 2) Check ip_vs_conn_tab_bits value to be in the range specified in Kconfig, from Andrea Claudi. 3) Initialize fragment offset in ip6tables, from Jeremy Sowden. 4) Make conntrack hash chain length random, from Florian Westphal. 5) Add zone ID to conntrack and NAT hashtuple again, also from Florian. 6) Add selftests for bidirectional zone support and colliding tuples, from Florian Westphal. 7) Unlink table before synchronize_rcu when cleaning tables with owner, from Florian. 8) ipset limits the max allocatable memory via kvmalloc() to MAX_INT. 9) Release conntrack entries via workqueue in masquerade, from Florian. 10) Fix bogus net_init in iptables raw table definition, also from Florian. 11) Work around missing softdep in log extensions, from Florian Westphal. 12) Serialize hash resizes and cleanups with mutex, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: conntrack: serialize hash resizes and cleanups netfilter: log: work around missing softdep backend module netfilter: iptable_raw: drop bogus net_init annotation netfilter: nf_nat_masquerade: defer conntrack walk to work queue netfilter: nf_nat_masquerade: make async masq_inet6_event handling generic netfilter: nf_tables: Fix oversized kvmalloc() calls netfilter: nf_tables: unlink table before deleting it selftests: netfilter: add zone stress test with colliding tuples selftests: netfilter: add selftest for directional zone support netfilter: nat: include zone id in nat table hash again netfilter: conntrack: include zone id in tuple hash again netfilter: conntrack: make max chain length random netfilter: ip6_tables: zero-initialize fragment offset ipvs: check that ip_vs_conn_tab_bits is between 8 and 20 netfilter: ipset: Fix oversized kvmalloc() calls ==================== Link: https://lore.kernel.org/r/20210924221113.348767-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-24	rxrpc: Fix _usecs_to_jiffies() by using usecs_to_jiffies()	Jiasheng Jiang
	Directly using _usecs_to_jiffies() might be unsafe, so it's better to use usecs_to_jiffies() instead. Because we can see that the result of _usecs_to_jiffies() could be larger than MAX_JIFFY_OFFSET values without the check of the input. Fixes: c410bf01933e ("Fix the excessive initial retransmission timeout") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	tcp: tracking packets with CE marks in BW rate sample	Yuchung Cheng
	In order to track CE marks per rate sample (one round trip), TCP needs a per-skb header field to record the tp->delivered_ce count when the skb was sent. To make space, we replace the "last_in_flight" field which is used exclusively for NV congestion control. The stat needed by NV can be alternatively approximated by existing stats tcp_sock delivered and mss_cache. This patch counts the number of packets delivered which have CE marks in the rate sample, using similar approach of delivery accounting. Cc: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Luke Hsiao <lukehsiao@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	devlink: Remove single line function obfuscations	Leon Romanovsky
	There is no need in extra one line functions to call relevant functions only once. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	devlink: Delete not used port parameters APIs	Leon Romanovsky
	There is no in-kernel users for the devlink port parameters API, so let's remove it. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	net: ipv4: Fix rtnexthop len when RTA_FLOW is present	Xiao Liang
	Multipath RTA_FLOW is embedded in nexthop. Dump it in fib_add_nexthop() to get the length of rtnexthop correct. Fixes: b0f60193632e ("ipv4: Refactor nexthop attributes in fib_dump_info") Signed-off-by: Xiao Liang <shaw.leon@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	mac80211: use ieee802_11_parse_elems() in ieee80211_prep_channel()	Wen Gong
	In function ieee80211_prep_channel(), it has some ieee80211_bss_get_ie() and cfg80211_find_ext_ie() to get the IE, this is to use another function ieee802_11_parse_elems() to get all the IEs in one time. Signed-off-by: Wen Gong <wgong@codeaurora.org> Link: https://lore.kernel.org/r/20210924100052.32029-6-wgong@codeaurora.org [remove now unnecessary size validation, use -ENOMEM, free elems earlier for less error handling code] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-24	mptcp: allow changing the 'backup' bit when no sockets are open	Davide Caratti
	current Linux refuses to change the 'backup' bit of MPTCP endpoints, i.e. using MPTCP_PM_CMD_SET_FLAGS, unless it finds (at least) one subflow that matches the endpoint address. There is no reason for that, so we can just ignore the return value of mptcp_nl_addr_backup(). In this way, endpoints can reconfigure their 'backup' flag even if no MPTCP sockets are open (or more generally, in case the MP_PRIO message is not sent out). Fixes: 0f9f696a502e ("mptcp: add set_flags command in PM netlink") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	mptcp: don't return sockets in foreign netns	Florian Westphal
	mptcp_token_get_sock() may return a mptcp socket that is in a different net namespace than the socket that received the token value. The mptcp syncookie code path had an explicit check for this, this moves the test into mptcp_token_get_sock() function. Eventually token.c should be converted to pernet storage, but such change is not suitable for net tree. Fixes: 2c5ebd001d4f0 ("mptcp: refactor token container") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-24	sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb	Xin Long
	We should always check if skb_header_pointer's return is NULL before using it, otherwise it may cause null-ptr-deref, as syzbot reported: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: 0010:sctp_rcv_ootb net/sctp/input.c:705 [inline] RIP: 0010:sctp_rcv+0x1d84/0x3220 net/sctp/input.c:196 Call Trace: <IRQ> sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1109 ip6_protocol_deliver_rcu+0x2e9/0x1ca0 net/ipv6/ip6_input.c:422 ip6_input_finish+0x62/0x170 net/ipv6/ip6_input.c:463 NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472 dst_input include/net/dst.h:460 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline] NF_HOOK include/linux/netfilter.h:307 [inline] NF_HOOK include/linux/netfilter.h:301 [inline] ipv6_rcv+0x28c/0x3c0 net/ipv6/ip6_input.c:297 Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize") Reported-by: syzbot+581aff2ae6b860625116@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23	UNRPC: Return specific error code on kmalloc failure	Yang Li
	Although the callers of this function only care about whether the return value is null or not, we should still give a rigorous error code. Smatch tool warning: net/sunrpc/auth_gss/svcauth_gss.c:784 gss_write_verf() warn: returning -1 instead of -ENOMEM is sloppy No functional change, just more standardized. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-09-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	net/mptcp/protocol.c 977d293e23b4 ("mptcp: ensure tx skbs always have the MPTCP ext") efe686ffce01 ("mptcp: ensure tx skbs always have the MPTCP ext") same patch merged in both trees, keep net-next. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-23	nl80211: don't put struct cfg80211_ap_settings on stack	Johannes Berg
	This struct has grown quite a bit, so dynamically allocate it instead of putting it on the stack. Link: https://lore.kernel.org/r/20210923161836.5813d881eae3.I0fc0f83905b0bfa332c4f1505e00c13abfca3545@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	mac80211: always allocate struct ieee802_11_elems	Johannes Berg
	As the 802.11 spec evolves, we need to parse more and more elements. This is causing the struct to grow, and we can no longer get away with putting it on the stack. Change the API to always dynamically allocate and return an allocated pointer that must be kfree()d later. As an alternative, I contemplated a scheme whereby we'd say in the code which elements we needed, e.g. DECLARE_ELEMENT_PARSER(elems, SUPPORTED_CHANNELS, CHANNEL_SWITCH, EXT(KEY_DELIVERY)); ieee802_11_parse_elems(..., &elems, ...); and while I think this is possible and will save us a lot since most individual places only care about a small subset of the elements, it ended up being a bit more work since a lot of places do the parsing and then pass the struct to other functions, sometimes with multiple levels. Link: https://lore.kernel.org/r/20210920154009.26caff6b5998.I05ae58768e990e611aee8eca8abefd9d7bc15e05@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	mac80211: mlme: find auth challenge directly	Johannes Berg
	There's no need to parse all elements etc. just to find the authentication challenge - use cfg80211_find_elem() instead. This also allows us to remove WLAN_EID_CHALLENGE handling from the element parsing entirely. Link: https://lore.kernel.org/r/20210920154009.45f9b3a15722.Ice3159ffad03a007d6154cbf1fb3a8c48489e86f@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	mac80211: move CRC into struct ieee802_11_elems	Johannes Berg
	We're currently returning this value, but to prepare for returning the allocated structure, move it into there. Link: https://lore.kernel.org/r/20210920154009.479b8ebf999d.If0d4ba75ee38998dc3eeae25058aa748efcb2fc9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	mac80211: mesh: clean up rx_bcn_presp API	Johannes Berg
	We currently pass the entire elements to the rx_bcn_presp() method, but only need mesh_config. Additionally, we use the length of the elements to calculate back the entire frame's length, but that's confusing - just pass the length of the frame instead. Link: https://lore.kernel.org/r/20210920154009.a18ed3d2da6c.I1824b773a0fbae4453e1433c184678ca14e8df45@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	mac80211: reduce stack usage in debugfs	Johannes Berg
	We put a few large buffers on the stack here, but it's easy to just allocate them on the heap, so do that. Link: https://lore.kernel.org/r/20210920154009.1387f44e7382.Ife043c169e6a44edace516fea9f8311a5ca4282a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2021-09-23	tcp: remove sk_{tr}x_skb_cache	Eric Dumazet
	This reverts the following patches : - commit 2e05fcae83c4 ("tcp: fix compile error if !CONFIG_SYSCTL") - commit 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") - commit 472c2e07eef0 ("tcp: add one skb cache for tx") - commit 8b27dae5a2e8 ("tcp: add one skb cache for rx") Having a cache of one skb (in each direction) per TCP socket is fragile, since it can cause a significant increase of memory needs, and not good enough for high speed flows anyway where more than one skb is needed. We want instead to add a generic infrastructure, with more flexible per-cpu caches, for alien NUMA nodes. Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-23	tcp: make tcp_build_frag() static	Paolo Abeni
	After the previous patch the mentioned helper is used only inside its compilation unit: let's make it static. RFC -> v1: - preserve the tcp_build_frag() helper (Eric) Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>