summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-18net: stmmac: No need to calculate speed divider when offload is disabledXiaolei Wang
commit be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") introduced a problem. When deleting, it prompts "Invalid portTransmitRate 0 (idleSlope - sendSlope)" and exits. Add judgment on cbs.enable. Only when offload is enabled, speed divider needs to be calculated. Fixes: be27b8965297 ("net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters") Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240617013922.1035854-1-xiaolei.wang@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net: phy: dp83tg720: get master/slave configuration in link down stateOleksij Rempel
Get master/slave configuration for initial system start with the link in down state. This ensures ethtool shows current configuration. Also fixes link reconfiguration with ethtool while in down state, preventing ethtool from displaying outdated configuration. Even though dp83tg720_config_init() is executed periodically as long as the link is in admin up state but no carrier is detected, this is not sufficient for the link in admin down state where dp83tg720_read_status() is not periodically executed. To cover this case, we need an extra read role configuration in dp83tg720_config_aneg(). Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Cc: stable@vger.kernel.org Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240614094516.1481231-2-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18net: phy: dp83tg720: wake up PHYs in managed modeOleksij Rempel
In case this PHY is bootstrapped for managed mode, we need to manually wake it. Otherwise no link will be detected. Cc: stable@vger.kernel.org Fixes: cb80ee2f9bee1 ("net: phy: Add support for the DP83TG720S Ethernet PHY") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20240614094516.1481231-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18Merge tag 'linux_kselftest-fixes-6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: - filesystems: warn_unused_result warnings - seccomp: format-zero-length warnings - fchmodat2: clang build warnings due to-static-libasan - openat2: clang build warnings due to static-libasan, LOCAL_HDRS * tag 'linux_kselftest-fixes-6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/fchmodat2: fix clang build failure due to -static-libasan selftests/openat2: fix clang build failures: -static-libasan, LOCAL_HDRS selftests: seccomp: fix format-zero-length warnings selftests: filesystems: fix warn_unused_result build warnings
2024-06-18selftests: openvswitch: Use bash as interpreterSimon Horman
openvswitch.sh makes use of substitutions of the form ${ns:0:1}, to obtain the first character of $ns. Empirically, this is works with bash but not dash. When run with dash these evaluate to an empty string and printing an error to stdout. # dash -c 'ns=client; echo "${ns:0:1}"' 2>error # cat error dash: 1: Bad substitution # bash -c 'ns=client; echo "${ns:0:1}"' 2>error c # cat error This leads to tests that neither pass nor fail. F.e. TEST: arp_ping [START] adding sandbox 'test_arp_ping' Adding DP/Bridge IF: sbx:test_arp_ping dp:arpping {, , } create namespaces ./openvswitch.sh: 282: eval: Bad substitution TEST: ct_connect_v4 [START] adding sandbox 'test_ct_connect_v4' Adding DP/Bridge IF: sbx:test_ct_connect_v4 dp:ct4 {, , } ./openvswitch.sh: 322: eval: Bad substitution create namespaces Resolve this by making openvswitch.sh a bash script. Fixes: 918423fda910 ("selftests: openvswitch: add an initial flow programming case") Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240617-ovs-selftest-bash-v1-1-7ae6ccd3617b@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18ptp: fix integer overflow in max_vclocks_storeDan Carpenter
On 32bit systems, the "4 * max" multiply can overflow. Use kcalloc() to do the allocation to prevent this. Fixes: 44c494c8e30e ("ptp: track available ptp vclocks information") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Link: https://lore.kernel.org/r/ee8110ed-6619-4bd7-9024-28c1f2ac24f4@moroto.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-18cpumask: limit FORCE_NR_CPUS to just the UP caseLinus Torvalds
Hardcoding the number of CPUs at compile time does improve code generation, but if you get it wrong the result will be confusion. We already limited this earlier to only "experts" (see commit fe5759d5bfda "cpumask: limit visibility of FORCE_NR_CPUS"), but with distro kernel configs often having EXPERT enabled, that turns out to not be much of a limit. To quote the philosophers at Disney: "Everyone can be an expert. And when everyone's an expert, no one will be". There's a runtime warning if you then set nr_cpus to anything but the forced number, but apparently that can be ignored too [1] and by then it's pretty much too late anyway. If we had some real way to limit this to "embedded only", maybe it would be worth it, but let's see if anybody even notices that the option is gone. We need to simplify kernel configuration anyway. Link: https://lore.kernel.org/all/20240618105036.208a8860@rorschach.local.home/ [1] Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul McKenney <paulmck@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-18Merge tag 'efi-fixes-for-v6.10-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: "Another small set of EFI fixes. Only the x86 one is likely to affect any actual users (and has a cc:stable), but the issue it fixes was only observed in an unusual context (kexec in a confidential VM). - Ensure that EFI runtime services are not unmapped by PAN on ARM - Avoid freeing the memory holding the EFI memory map inadvertently on x86 - Avoid a false positive kmemleak warning on arm64" * tag 'efi-fixes-for-v6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/arm64: Fix kmemleak false positive in arm64_efi_rt_init() efi/x86: Free EFI memory map only when installing a new one. efi/arm: Disable LPAE PAN when calling EFI runtime services
2024-06-18net: microchip: Constify struct vcap_operationsChristophe JAILLET
"struct vcap_operations" are not modified in these drivers. Constifying this structure moves some data to a read-only section, so increase overall security. In order to do it, "struct vcap_control" also needs to be adjusted to this new const qualifier. As an example, on a x86_64, with allmodconfig: Before: ====== text data bss dec hex filename 15176 1094 16 16286 3f9e drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o After: ===== text data bss dec hex filename 15268 998 16 16282 3f9a drivers/net/ethernet/microchip/lan966x/lan966x_vcap_impl.o Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://lore.kernel.org/r/d8e76094d2e98ebb5bfc8205799b3a9db0b46220.1718524644.git.christophe.jaillet@wanadoo.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18sched: act_ct: add netns into the key of tcf_ct_flow_tableXin Long
zones_ht is a global hashtable for flow_table with zone as key. However, it does not consider netns when getting a flow_table from zones_ht in tcf_ct_init(), and it means an act_ct action in netns A may get a flow_table that belongs to netns B if it has the same zone value. In Shuang's test with the TOPO: tcf2_c <---> tcf2_sw1 <---> tcf2_sw2 <---> tcf2_s tcf2_sw1 and tcf2_sw2 saw the same flow and used the same flow table, which caused their ct entries entering unexpected states and the TCP connection not able to end normally. This patch fixes the issue simply by adding netns into the key of tcf_ct_flow_table so that an act_ct action gets a flow_table that belongs to its own netns in tcf_ct_init(). Note that for easy coding we don't use tcf_ct_flow_table.nf_ft.net, as the ct_ft is initialized after inserting it to the hashtable in tcf_ct_flow_table_get() and also it requires to implement several functions in rhashtable_params including hashfn, obj_hashfn and obj_cmpfn. Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/1db5b6cc6902c5fc6f8c6cbd85494a2008087be5.1718488050.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18tipc: force a dst refcount before doing decryptionXin Long
As it says in commit 3bc07321ccc2 ("xfrm: Force a dst refcount before entering the xfrm type handlers"): "Crypto requests might return asynchronous. In this case we leave the rcu protected region, so force a refcount on the skb's destination entry before we enter the xfrm type input/output handlers." On TIPC decryption path it has the same problem, and skb_dst_force() should be called before doing decryption to avoid a possible crash. Shuang reported this issue when this warning is triggered: [] WARNING: include/net/dst.h:337 tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Kdump: loaded Tainted: G W --------- - - 4.18.0-496.el8.x86_64+debug [] Workqueue: crypto cryptd_queue_worker [] RIP: 0010:tipc_sk_rcv+0x1055/0x1ea0 [tipc] [] Call Trace: [] tipc_sk_mcast_rcv+0x548/0xea0 [tipc] [] tipc_rcv+0xcf5/0x1060 [tipc] [] tipc_aead_decrypt_done+0x215/0x2e0 [tipc] [] cryptd_aead_crypt+0xdb/0x190 [] cryptd_queue_worker+0xed/0x190 [] process_one_work+0x93d/0x17e0 Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/fbe3195fad6997a4eec62d9bf076b2ad03ac336b.1718476040.git.lucien.xin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18Merge branch 'introduce-phy-mode-10g-qxgmii'Paolo Abeni
Luo Jie says: ==================== Introduce PHY mode 10G-QXGMII This patch series adds 10G-QXGMII mode for PHY driver. The patch series is split from the QCA8084 PHY driver patch series below. https://lore.kernel.org/all/20231215074005.26976-1-quic_luoj@quicinc.com/ Per Andrew Lunn’s advice, submitting this patch series for acceptance as they already include the necessary 'Reviewed-by:' tags. This way, they need not wait for QCA8084 series patches to conclude review. Changes in v2: * remove PHY_INTERFACE_MODE_10G_QXGMII from workaround of validation in the phylink_validate_phy. 10G_QXGMII will be set into phy->possible_interfaces in its .config_init method of PHY driver that supports it. ==================== Link: https://lore.kernel.org/r/20240615120028.2384732-1-quic_luoj@quicinc.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18dt-bindings: net: ethernet-controller: add 10g-qxgmii modeVladimir Oltean
Add the new interface mode 10g-qxgmii, which is similar to usxgmii but extend to 4 channels to support maximum of 4 ports with the link speed 10M/100M/1G/2.5G. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net: phy: introduce core support for phy-mode = "10g-qxgmii"Vladimir Oltean
10G-QXGMII is a MAC-to-PHY interface defined by the USXGMII multiport specification. It uses the same signaling as USXGMII, but it multiplexes 4 ports over the link, resulting in a maximum speed of 2.5G per port. Some in-tree SoCs like the NXP LS1028A use "usxgmii" when they mean either the single-port USXGMII or the quad-port 10G-QXGMII variant, and they could get away just fine with that thus far. But there is a need to distinguish between the 2 as far as SerDes drivers are concerned. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Luo Jie <quic_luoj@quicinc.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net: stmmac: Enable TSO on VLANsFurong Xu
The TSO engine works well when the frames are not VLAN Tagged. But it will produce broken segments when frames are VLAN Tagged. The first segment is all good, while the second segment to the last segment are broken, they lack of required VLAN tag. An example here: ======== // 1st segment of a VLAN Tagged TSO frame, nothing wrong. MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1518: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [.], seq 1:1449 // 2nd to last segments of a VLAN Tagged TSO frame, VLAN tag is missing. MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 1449:2897 MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 2897:4345 MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [.], seq 4345:5793 MacSrc > MacDst, ethertype IPv4 (0x0800), length 1514: HostA:42643 > HostB:5201: Flags [P.], seq 5793:7241 // normal VLAN Tagged non-TSO frame, nothing wrong. MacSrc > MacDst, ethertype 802.1Q (0x8100), length 1022: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [P.], seq 7241:8193 MacSrc > MacDst, ethertype 802.1Q (0x8100), length 70: vlan 100, p 1, ethertype IPv4 (0x0800), HostA:42643 > HostB:5201: Flags [F.], seq 8193 ======== When transmitting VLAN Tagged TSO frames, never insert VLAN tag by HW, always insert VLAN tag to SKB payload, then TSO works well on VLANs for all MAC cores. Tested on DWMAC CORE 5.10a, DWMAC CORE 5.20a and DWXGMAC CORE 3.20a Signed-off-by: Furong Xu <0x1207@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240615095611.517323-1-0x1207@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net/sched: act_api: fix possible infinite loop in tcf_idr_check_alloc()David Ruth
syzbot found hanging tasks waiting on rtnl_lock [1] A reproducer is available in the syzbot bug. When a request to add multiple actions with the same index is sent, the second request will block forever on the first request. This holds rtnl_lock, and causes tasks to hang. Return -EAGAIN to prevent infinite looping, while keeping documented behavior. [1] INFO: task kworker/1:0:5088 blocked for more than 143 seconds. Not tainted 6.9.0-rc4-syzkaller-00173-g3cdb45594619 #0 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/1:0 state:D stack:23744 pid:5088 tgid:5088 ppid:2 flags:0x00004000 Workqueue: events_power_efficient reg_check_chans_work Call Trace: <TASK> context_switch kernel/sched/core.c:5409 [inline] __schedule+0xf15/0x5d00 kernel/sched/core.c:6746 __schedule_loop kernel/sched/core.c:6823 [inline] schedule+0xe7/0x350 kernel/sched/core.c:6838 schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6895 __mutex_lock_common kernel/locking/mutex.c:684 [inline] __mutex_lock+0x5b8/0x9c0 kernel/locking/mutex.c:752 wiphy_lock include/net/cfg80211.h:5953 [inline] reg_leave_invalid_chans net/wireless/reg.c:2466 [inline] reg_check_chans_work+0x10a/0x10e0 net/wireless/reg.c:2481 Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: syzbot+b87c222546179f4513a7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b87c222546179f4513a7 Signed-off-by: David Ruth <druth@chromium.org> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20240614190326.1349786-1-druth@chromium.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18Merge branch 'net-lan743x-fixes-for-multiple-wol-related-issues'Paolo Abeni
Raju Lakkaraju says: ==================== net: lan743x: Fixes for multiple WOL related issues This patch series implement the following fixes: 1. Disable WOL upon resume in order to restore full data path operation 2. Support WOL at both the PHY and MAC appropriately 3. Remove interrupt mask clearing from config_init Patch-3 was sent seperately earlier. Review comments in link: https://lore.kernel.org/lkml/4a565d54-f468-4e32-8a2c-102c1203f72c@lunn.ch/T/ ==================== Link: https://lore.kernel.org/r/20240614171157.190871-1-Raju.Lakkaraju@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net: phy: mxl-gpy: Remove interrupt mask clearing from config_initRaju Lakkaraju
When the system resumes from sleep, the phy_init_hw() function invokes config_init(), which clears all interrupt masks and causes wake events to be lost in subsequent wake sequences. Remove interrupt mask clearing from config_init() and preserve relevant masks in config_intr(). Fixes: 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net: lan743x: Support WOL at both the PHY and MAC appropriatelyRaju Lakkaraju
Prevent options not supported by the PHY from being requested to it by the MAC Whenever a WOL option is supported by both, the PHY is given priority since that usually leads to better power savings. Fixes: e9e13b6adc33 ("lan743x: fix for potential NULL pointer dereference with bare card") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18net: lan743x: disable WOL upon resume to restore full data path operationRaju Lakkaraju
When Wake-on-LAN (WoL) is active and the system is in suspend mode, triggering a system event can wake the system from sleep, which may block the data path. To restore normal data path functionality after waking, disable all wake-up events. Furthermore, clear all Write 1 to Clear (W1C) status bits by writing 1's to them. Fixes: 4d94282afd95 ("lan743x: Add power management support") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18ALSA: hda/realtek: Add more codec ID to no shutup pins listKailang Yang
If it enter to runtime D3 state, it didn't shutup Headset MIC pin. Signed-off-by: Kailang Yang <kailang@realtek.com> Link: https://lore.kernel.org/r/8d86f61e7d6f4a03b311e4eb4e5caaef@realtek.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-06-18sound/oss/dmasound: add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=m68k, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in sound/oss/dmasound/dmasound_core.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/20240617-md-m68k-sound-oss-dmasound-v1-1-5c19306be930@quicinc.com
2024-06-18qca_spi: Make interrupt remembering atomicStefan Wahren
The whole mechanism to remember occurred SPI interrupts is not atomic, which could lead to unexpected behavior. So fix this by using atomic bit operations instead. Fixes: 291ab06ecf67 ("net: qualcomm: new Ethernet over SPI driver for QCA7000") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://lore.kernel.org/r/20240614145030.7781-1-wahrenst@gmx.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-18netns: Make get_net_ns() handle zero refcount netYue Haibing
Syzkaller hit a warning: refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 7890 at lib/refcount.c:25 refcount_warn_saturate+0xdf/0x1d0 Modules linked in: CPU: 3 PID: 7890 Comm: tun Not tainted 6.10.0-rc3-00100-gcaa4f9578aba-dirty #310 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xdf/0x1d0 Code: 41 49 04 31 ff 89 de e8 9f 1e cd fe 84 db 75 9c e8 76 26 cd fe c6 05 b6 41 49 04 01 90 48 c7 c7 b8 8e 25 86 e8 d2 05 b5 fe 90 <0f> 0b 90 90 e9 79 ff ff ff e8 53 26 cd fe 0f b6 1 RSP: 0018:ffff8881067b7da0 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff811c72ac RDX: ffff8881026a2140 RSI: ffffffff811c72b5 RDI: 0000000000000001 RBP: ffff8881067b7db0 R08: 0000000000000000 R09: 205b5d3730353139 R10: 0000000000000000 R11: 205d303938375420 R12: ffff8881086500c4 R13: ffff8881086500c4 R14: ffff8881086500b0 R15: ffff888108650040 FS: 00007f5b2961a4c0(0000) GS:ffff88823bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7ed36fd18 CR3: 00000001482f6000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0xa3/0xc0 ? __warn+0xa5/0x1c0 ? refcount_warn_saturate+0xdf/0x1d0 ? report_bug+0x1fc/0x2d0 ? refcount_warn_saturate+0xdf/0x1d0 ? handle_bug+0xa1/0x110 ? exc_invalid_op+0x3c/0xb0 ? asm_exc_invalid_op+0x1f/0x30 ? __warn_printk+0xcc/0x140 ? __warn_printk+0xd5/0x140 ? refcount_warn_saturate+0xdf/0x1d0 get_net_ns+0xa4/0xc0 ? __pfx_get_net_ns+0x10/0x10 open_related_ns+0x5a/0x130 __tun_chr_ioctl+0x1616/0x2370 ? __sanitizer_cov_trace_switch+0x58/0xa0 ? __sanitizer_cov_trace_const_cmp2+0x1c/0x30 ? __pfx_tun_chr_ioctl+0x10/0x10 tun_chr_ioctl+0x2f/0x40 __x64_sys_ioctl+0x11b/0x160 x64_sys_call+0x1211/0x20d0 do_syscall_64+0x9e/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f5b28f165d7 Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 8 RSP: 002b:00007ffc2b59c5e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5b28f165d7 RDX: 0000000000000000 RSI: 00000000000054e3 RDI: 0000000000000003 RBP: 00007ffc2b59c650 R08: 00007f5b291ed8c0 R09: 00007f5b2961a4c0 R10: 0000000029690010 R11: 0000000000000246 R12: 0000000000400730 R13: 00007ffc2b59cf40 R14: 0000000000000000 R15: 0000000000000000 </TASK> Kernel panic - not syncing: kernel: panic_on_warn set ... This is trigger as below: ns0 ns1 tun_set_iff() //dev is tun0 tun->dev = dev //ip link set tun0 netns ns1 put_net() //ref is 0 __tun_chr_ioctl() //TUNGETDEVNETNS net = dev_net(tun->dev); open_related_ns(&net->ns, get_net_ns); //ns1 get_net_ns() get_net() //addition on 0 Use maybe_get_net() in get_net_ns in case net's ref is zero to fix this Fixes: 0c3e0e3bb623 ("tun: Add ioctl() TUNGETDEVNETNS cmd to allow obtaining real net ns of tun device") Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Link: https://lore.kernel.org/r/20240614131302.2698509-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-17Merge tag 'lsm-pr-20240617' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm fix from Paul Moore: "A single LSM/IMA patch to fix a problem caused by sleeping while in a RCU critical section" * tag 'lsm-pr-20240617' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: ima: Avoid blocking in RCU read-side critical section
2024-06-17net: Move dev_set_hwtstamp_phylib to net/core/dev.hKory Maincent
This declaration was added to the header to be called from ethtool. ethtool is separated from core for code organization but it is not really a separate entity, it controls very core things. As ethtool is an internal stuff it is not wise to have it in netdevice.h. Move the declaration to net/core/dev.h instead. Remove the EXPORT_SYMBOL_GPL call as ethtool can not be built as a module. Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://lore.kernel.org/r/20240612-feature_ptp_netnext-v15-2-b2a086257b63@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17net: dwc-xlgmac: fix missing MODULE_DESCRIPTION() warningJeff Johnson
With ARCH=hexagon, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/ethernet/synopsys/dwc-xlgmac.o With most other ARCH settings the MODULE_DESCRIPTION() is provided by the macro invocation in dwc-xlgmac-pci.c. However, for hexagon, the PCI bus is not enabled, and hence CONFIG_DWC_XLGMAC_PCI is not set. As a result, dwc-xlgmac-pci.c is not compiled, and hence is not linked into dwc-xlgmac.o. To avoid this issue, relocate the MODULE_DESCRIPTION() and other related macros from dwc-xlgmac-pci.c to dwc-xlgmac-common.c, since that file already has an existing MODULE_LICENSE() and it is unconditionally linked into dwc-xlgmac.o. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://lore.kernel.org/r/20240616-md-hexagon-drivers-net-ethernet-synopsys-v1-1-55852b60aef8@quicinc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17xfrm6: check ip6_dst_idev() return value in xfrm6_get_saddr()Eric Dumazet
ip6_dst_idev() can return NULL, xfrm6_get_saddr() must act accordingly. syzbot reported: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 1 PID: 12 Comm: kworker/u8:1 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: wg-kex-wg1 wg_packet_handshake_send_worker RIP: 0010:xfrm6_get_saddr+0x93/0x130 net/ipv6/xfrm6_policy.c:64 Code: df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 97 00 00 00 4c 8b ab d8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 86 00 00 00 4d 8b 6d 00 e8 ca 13 47 01 48 b8 00 RSP: 0018:ffffc90000117378 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffff88807b079dc0 RCX: ffffffff89a0d6d7 RDX: 0000000000000000 RSI: ffffffff89a0d6e9 RDI: ffff88807b079e98 RBP: ffff88807ad73248 R08: 0000000000000007 R09: fffffffffffff000 R10: ffff88807b079dc0 R11: 0000000000000007 R12: ffffc90000117480 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4586d00440 CR3: 0000000079042000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> xfrm_get_saddr net/xfrm/xfrm_policy.c:2452 [inline] xfrm_tmpl_resolve_one net/xfrm/xfrm_policy.c:2481 [inline] xfrm_tmpl_resolve+0xa26/0xf10 net/xfrm/xfrm_policy.c:2541 xfrm_resolve_and_create_bundle+0x140/0x2570 net/xfrm/xfrm_policy.c:2835 xfrm_bundle_lookup net/xfrm/xfrm_policy.c:3070 [inline] xfrm_lookup_with_ifid+0x4d1/0x1e60 net/xfrm/xfrm_policy.c:3201 xfrm_lookup net/xfrm/xfrm_policy.c:3298 [inline] xfrm_lookup_route+0x3b/0x200 net/xfrm/xfrm_policy.c:3309 ip6_dst_lookup_flow+0x15c/0x1d0 net/ipv6/ip6_output.c:1256 send6+0x611/0xd20 drivers/net/wireguard/socket.c:139 wg_socket_send_skb_to_peer+0xf9/0x220 drivers/net/wireguard/socket.c:178 wg_socket_send_buffer_to_peer+0x12b/0x190 drivers/net/wireguard/socket.c:200 wg_packet_send_handshake_initiation+0x227/0x360 drivers/net/wireguard/send.c:40 wg_packet_handshake_send_worker+0x1c/0x30 drivers/net/wireguard/send.c:51 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240615154231.234442-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17ipv6: prevent possible NULL dereference in rt6_probe()Eric Dumazet
syzbot caught a NULL dereference in rt6_probe() [1] Bail out if __in6_dev_get() returns NULL. [1] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cb: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000658-0x000000000000065f] CPU: 1 PID: 22444 Comm: syz-executor.0 Not tainted 6.10.0-rc2-syzkaller-00383-gb8481381d4e2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:rt6_probe net/ipv6/route.c:656 [inline] RIP: 0010:find_match+0x8c4/0xf50 net/ipv6/route.c:758 Code: 14 fd f7 48 8b 85 38 ff ff ff 48 c7 45 b0 00 00 00 00 48 8d b8 5c 06 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 19 RSP: 0018:ffffc900034af070 EFLAGS: 00010203 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc90004521000 RDX: 00000000000000cb RSI: ffffffff8990d0cd RDI: 000000000000065c RBP: ffffc900034af150 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000002 R12: 000000000000000a R13: 1ffff92000695e18 R14: ffff8880244a1d20 R15: 0000000000000000 FS: 00007f4844a5a6c0(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b31b27000 CR3: 000000002d42c000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> rt6_nh_find_match+0xfa/0x1a0 net/ipv6/route.c:784 nexthop_for_each_fib6_nh+0x26d/0x4a0 net/ipv4/nexthop.c:1496 __find_rr_leaf+0x6e7/0xe00 net/ipv6/route.c:825 find_rr_leaf net/ipv6/route.c:853 [inline] rt6_select net/ipv6/route.c:897 [inline] fib6_table_lookup+0x57e/0xa30 net/ipv6/route.c:2195 ip6_pol_route+0x1cd/0x1150 net/ipv6/route.c:2231 pol_lookup_func include/net/ip6_fib.h:616 [inline] fib6_rule_lookup+0x386/0x720 net/ipv6/fib6_rules.c:121 ip6_route_output_flags_noref net/ipv6/route.c:2639 [inline] ip6_route_output_flags+0x1d0/0x640 net/ipv6/route.c:2651 ip6_dst_lookup_tail.constprop.0+0x961/0x1760 net/ipv6/ip6_output.c:1147 ip6_dst_lookup_flow+0x99/0x1d0 net/ipv6/ip6_output.c:1250 rawv6_sendmsg+0xdab/0x4340 net/ipv6/raw.c:898 inet_sendmsg+0x119/0x140 net/ipv4/af_inet.c:853 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_write_iter+0x4b8/0x5c0 net/socket.c:1160 new_sync_write fs/read_write.c:497 [inline] vfs_write+0x6b6/0x1140 fs/read_write.c:590 ksys_write+0x1f8/0x260 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: 52e1635631b3 ("[IPV6]: ROUTE: Add router_probe_interval sysctl.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240615151454.166404-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17ipv6: prevent possible NULL deref in fib6_nh_init()Eric Dumazet
syzbot reminds us that in6_dev_get() can return NULL. fib6_nh_init() ip6_validate_gw( &idev ) ip6_route_check_nh( idev ) *idev = in6_dev_get(dev); // can be NULL Oops: general protection fault, probably for non-canonical address 0xdffffc00000000bc: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x00000000000005e0-0x00000000000005e7] CPU: 0 PID: 11237 Comm: syz-executor.3 Not tainted 6.10.0-rc2-syzkaller-00249-gbe27b8965297 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 RIP: 0010:fib6_nh_init+0x640/0x2160 net/ipv6/route.c:3606 Code: 00 00 fc ff df 4c 8b 64 24 58 48 8b 44 24 28 4c 8b 74 24 30 48 89 c1 48 89 44 24 28 48 8d 98 e0 05 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 38 84 c0 0f 85 b3 17 00 00 8b 1b 31 ff 89 de e8 b8 8b RSP: 0018:ffffc900032775a0 EFLAGS: 00010202 RAX: 00000000000000bc RBX: 00000000000005e0 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffffc90003277a54 RDI: ffff88802b3a08d8 RBP: ffffc900032778b0 R08: 00000000000002fc R09: 0000000000000000 R10: 00000000000002fc R11: 0000000000000000 R12: ffff88802b3a08b8 R13: 1ffff9200064eec8 R14: ffffc90003277a00 R15: dffffc0000000000 FS: 00007f940feb06c0(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000245e8000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ip6_route_info_create+0x99e/0x12b0 net/ipv6/route.c:3809 ip6_route_add+0x28/0x160 net/ipv6/route.c:3853 ipv6_route_ioctl+0x588/0x870 net/ipv6/route.c:4483 inet6_ioctl+0x21a/0x280 net/ipv6/af_inet6.c:579 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f940f07cea9 Fixes: 428604fb118f ("ipv6: do not set routes if disable_ipv6 has been enabled") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240614082002.26407-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17net: mana: Use mana_cleanup_port_context() for rxq cleanupShradha Gupta
To cleanup rxqs in port context structures, instead of duplicating the code, use existing function mana_cleanup_port_context() which does the exact cleanup that's needed. Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Heng Qi <hengqi@linux.alibaba.com> Link: https://lore.kernel.org/r/1718349548-28697-1-git-send-email-shradhagupta@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17selftests: mptcp: userspace_pm: fixed subtest namesMatthieu Baerts (NGI0)
It is important to have fixed (sub)test names in TAP, because these names are used to identify them. If they are not fixed, tracking cannot be done. Some subtests from the userspace_pm selftest were using random numbers in their names: the client and server address IDs from $RANDOM, and the client port number randomly picked by the kernel when creating the connection. These values have been replaced by 'client' and 'server' words: that's even more helpful than showing random numbers. Note that the addresses IDs are incremented and decremented in the test: +1 or -1 are then displayed in these cases. Not to loose info that can be useful for debugging in case of issues, these random numbers are now displayed at the beginning of the test. Fixes: f589234e1af0 ("selftests: mptcp: userspace_pm: format subtests results in TAP") Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240614-upstream-net-20240614-selftests-mptcp-uspace-pm-fixed-test-names-v1-1-460ad3edb429@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17tcp: clear tp->retrans_stamp in tcp_rcv_fastopen_synack()Eric Dumazet
Some applications were reporting ETIMEDOUT errors on apparently good looking flows, according to packet dumps. We were able to root cause the issue to an accidental setting of tp->retrans_stamp in the following scenario: - client sends TFO SYN with data. - server has TFO disabled, ACKs only SYN but not payload. - client receives SYNACK covering only SYN. - tcp_ack() eats SYN and sets tp->retrans_stamp to 0. - tcp_rcv_fastopen_synack() calls tcp_xmit_retransmit_queue() to retransmit TFO payload w/o SYN, sets tp->retrans_stamp to "now", but we are not in any loss recovery state. - TFO payload is ACKed. - we are not in any loss recovery state, and don't see any dupacks, so we don't get to any code path that clears tp->retrans_stamp. - tp->retrans_stamp stays non-zero for the lifetime of the connection. - after first RTO, tcp_clamp_rto_to_user_timeout() clamps second RTO to 1 jiffy due to bogus tp->retrans_stamp. - on clamped RTO with non-zero icsk_retransmits, retransmits_timed_out() sets start_ts from tp->retrans_stamp from TFO payload retransmit hours/days ago, and computes bogus long elapsed time for loss recovery, and suffers ETIMEDOUT early. Fixes: a7abf3cd76e1 ("tcp: consider using standard rtx logic in tcp_rcv_fastopen_synack()") CC: stable@vger.kernel.org Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Co-developed-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240614130615.396837-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17fou: remove warn in gue_gro_receive on unsupported protocolWillem de Bruijn
Drop the WARN_ON_ONCE inn gue_gro_receive if the encapsulated type is not known or does not have a GRO handler. Such a packet is easily constructed. Syzbot generates them and sets off this warning. Remove the warning as it is expected and not actionable. The warning was previously reduced from WARN_ON to WARN_ON_ONCE in commit 270136613bf7 ("fou: Do WARN_ON_ONCE in gue_gro_receive for bad proto callbacks"). Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240614122552.1649044-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-17Revert "mm: mmap: allow for the maximum number of bits for randomizing ↵Linus Torvalds
mmap_base by default" This reverts commit 3afb76a66b5559a7b595155803ce23801558a7a9. This was a wrongheaded workaround for an issue that had already been fixed much better by commit 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit"). Asking users questions at kernel compile time that they can't make sense of is not a viable strategy. And the fact that even the kernel VM maintainers apparently didn't catch that this "fix" is not a fix any more pretty much proves the point that people can't be expected to understand the implications of the question. It may well be the case that we could improve things further, and that __thp_get_unmapped_area() should take the mapping randomization into account even for 64-bit kernels. Maybe we should not be so eager to use THP mappings. But in no case should this be a kernel config option. Cc: Rafael Aquini <aquini@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-06-17Merge tag 'mm-hotfixes-stable-2024-06-17-11-43' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Mainly MM singleton fixes. And a couple of ocfs2 regression fixes" * tag 'mm-hotfixes-stable-2024-06-17-11-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: kcov: don't lose track of remote references during softirqs mm: shmem: fix getting incorrect lruvec when replacing a shmem folio mm/debug_vm_pgtable: drop RANDOM_ORVALUE trick mm: fix possible OOB in numa_rebuild_large_mapping() mm/migrate: fix kernel BUG at mm/compaction.c:2761! selftests: mm: make map_fixed_noreplace test names stable mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC mm: mmap: allow for the maximum number of bits for randomizing mmap_base by default gcov: add support for GCC 14 zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING mm: huge_memory: fix misused mapping_large_folio_support() for anon folios lib/alloc_tag: fix RCU imbalance in pgalloc_tag_get() lib/alloc_tag: do not register sysctl interface when CONFIG_SYSCTL=n MAINTAINERS: remove Lorenzo as vmalloc reviewer Revert "mm: init_mlocked_on_free_v3" mm/page_table_check: fix crash on ZONE_DEVICE gcc: disable '-Warray-bounds' for gcc-9 ocfs2: fix NULL pointer dereference in ocfs2_abort_trigger() ocfs2: fix NULL pointer dereference in ocfs2_journal_dirty()
2024-06-17Merge tag 'hardening-v6.10-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fixes from Kees Cook: - yama: document function parameter (Christian Göttsche) - mm/util: Swap kmemdup_array() arguments (Jean-Philippe Brucker) - kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX() - MAINTAINERS: Update entries for Kees Cook * tag 'hardening-v6.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: MAINTAINERS: Update entries for Kees Cook kunit/overflow: Adjust for __counted_by with DEFINE_RAW_FLEX() yama: document function parameter mm/util: Swap kmemdup_array() arguments
2024-06-17MAINTAINERS: Update entries for Kees CookKees Cook
Update current email address for Kees Cook in the MAINTAINER file to match the change from commit 4e173c825b19 ("mailmap: update entry for Kees Cook"). Link: https://lore.kernel.org/r/20240617181257.work.206-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2024-06-17Merge tag 'hyperv-fixes-signed-20240616' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V fixes from Wei Liu: - Some cosmetic changes for hv.c and balloon.c (Aditya Nagesh) - Two documentation updates (Michael Kelley) - Suppress the invalid warning for packed member alignment (Saurabh Sengar) - Two hv_balloon fixes (Michael Kelley) * tag 'hyperv-fixes-signed-20240616' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: Cosmetic changes for hv.c and balloon.c Documentation: hyperv: Improve synic and interrupt handling description Documentation: hyperv: Update spelling and fix typo tools: hv: suppress the invalid warning for packed member alignment hv_balloon: Enable hot-add for memblock sizes > 128 MiB hv_balloon: Use kernel macros to simplify open coded sequences
2024-06-17Merge branch 'net-smc-IPPROTO_SMC'David S. Miller
D. Wythe says: ==================== Introduce IPPROTO_SMC This patch allows to create smc socket via AF_INET, similar to the following code, /* create v4 smc sock */ v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); /* create v6 smc sock */ v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); There are several reasons why we believe it is appropriate here: 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6) address. There is no AF_SMC address at all. 2. Create smc socket in the AF_INET(6) path, which allows us to reuse the infrastructure of AF_INET(6) path, such as common ebpf hooks. Otherwise, smc have to implement it again in AF_SMC path. Such as: 1. Replace IPPROTO_TCP with IPPROTO_SMC in the socket() syscall initiated by the user, without the use of LD-PRELOAD. 2. Select whether immediate fallback is required based on peer's port/ip before connect(). A very significant result is that we can now use eBPF to implement smc_run instead of LD_PRELOAD, who is completely ineffective in scenarios of static linking. Another potential value is that we are attempting to optimize the performance of fallback socks, where merging socks is an important part, and it relies on the creation of SMC sockets under the AF_INET path. (More information : https://lore.kernel.org/netdev/1699442703-25015-1-git-send-email-alibuda@linux.alibaba.com/T/) v2 -> v1: - Code formatting, mainly including alignment and annotation repair. - move inet_smc proto ops to inet_smc.c, avoiding af_smc.c becoming too bulky. - Fix the issue where refactoring affects the initialization order. - Fix compile warning (unused out_inet_prot) while CONFIG_IPV6 was not set. v3 -> v2: - Add Alibaba's copyright information to the newfile v4 -> v3: - Fix some spelling errors - Align function naming style with smc_sock_init() to smc_sk_init() - Reversing the order of the conditional checks on clcsock to make the code more intuitive v5 -> v4: - Fix some spelling errors - Added comment, "/* CONFIG_IPV6 */", after the final #endif directive. - Rename smc_inet.h and smc_inet.c to smc_inet.h and smc_inet.c - Encapsulate the initialization and destruction of inet_smc in inet_smc.c, rather than implementing it directly in af_smc.c. - Remove useless header files in smc_inet.h - Make smc_inet_prot_xxx and smc_inet_sock_init() to be static, since it's only used in smc_inet.c v6 -> v5: - Wrapping lines to not exceed 80 characters - Combine initialization and error handling of smc_inet6 into the same #if macro block. v7 -> v6: - Modify the value of IPPROTO_SMC to 256 so that it does not affect IPPROTO-MAX v8 -> v7: - Remove useless declarations. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/smc: Introduce IPPROTO_SMCD. Wythe
This patch allows to create smc socket via AF_INET, similar to the following code, /* create v4 smc sock */ v4 = socket(AF_INET, SOCK_STREAM, IPPROTO_SMC); /* create v6 smc sock */ v6 = socket(AF_INET6, SOCK_STREAM, IPPROTO_SMC); There are several reasons why we believe it is appropriate here: 1. For smc sockets, it actually use IPv4 (AF-INET) or IPv6 (AF-INET6) address. There is no AF_SMC address at all. 2. Create smc socket in the AF_INET(6) path, which allows us to reuse the infrastructure of AF_INET(6) path, such as common ebpf hooks. Otherwise, smc have to implement it again in AF_SMC path. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/smc: expose smc proto operationsD. Wythe
Externalize smc proto operations (smc_xxx) to allow access from files other than af_smc.c This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/smc: refactoring initialization of smc sockD. Wythe
This patch aims to isolate the shared components of SMC socket allocation by introducing smc_sk_init() for sock initialization and __smc_create_clcsk() for the initialization of clcsock. This is in preparation for the subsequent implementation of the AF_INET version of SMC. Signed-off-by: D. Wythe <alibuda@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com> Reviewed-by: Dust Li <dust.li@linux.alibaba.com> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com> Tested-by: Wenjia Zhang <wenjia@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net: make for_each_netdev_dump() a little more bug-proofJakub Kicinski
I find the behavior of xa_for_each_start() slightly counter-intuitive. It doesn't end the iteration by making the index point after the last element. IOW calling xa_for_each_start() again after it "finished" will run the body of the loop for the last valid element, instead of doing nothing. This works fine for netlink dumps if they terminate correctly (i.e. coalesce or carefully handle NLM_DONE), but as we keep getting reminded legacy dumps are unlikely to go away. Fixing this generically at the xa_for_each_start() level seems hard - there is no index reserved for "end of iteration". ifindexes are 31b wide, tho, and iterator is ulong so for for_each_netdev_dump() it's safe to go to the next element. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17netrom: Fix a memory leak in nr_heartbeat_expiry()Gavrilov Ilia
syzbot reported a memory leak in nr_create() [0]. Commit 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") added sock_hold() to the nr_heartbeat_expiry() function, where a) a socket has a SOCK_DESTROY flag or b) a listening socket has a SOCK_DEAD flag. But in the case "a," when the SOCK_DESTROY flag is set, the file descriptor has already been closed and the nr_release() function has been called. So it makes no sense to hold the reference count because no one will call another nr_destroy_socket() and put it as in the case "b." nr_connect nr_establish_data_link nr_start_heartbeat nr_release switch (nr->state) case NR_STATE_3 nr->state = NR_STATE_2 sock_set_flag(sk, SOCK_DESTROY); nr_rx_frame nr_process_rx_frame switch (nr->state) case NR_STATE_2 nr_state2_machine() nr_disconnect() nr_sk(sk)->state = NR_STATE_0 sock_set_flag(sk, SOCK_DEAD) nr_heartbeat_expiry switch (nr->state) case NR_STATE_0 if (sock_flag(sk, SOCK_DESTROY) || (sk->sk_state == TCP_LISTEN && sock_flag(sk, SOCK_DEAD))) sock_hold() // ( !!! ) nr_destroy_socket() To fix the memory leak, let's call sock_hold() only for a listening socket. Found by InfoTeCS on behalf of Linux Verification Center (linuxtesting.org) with Syzkaller. [0]: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Reported-by: syzbot+d327a1f3b12e1e206c16@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d327a1f3b12e1e206c16 Fixes: 409db27e3a2e ("netrom: Fix use-after-free of a listening socket.") Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17Merge branch 'mlx5-genl-queue-stats'David S. Miller
Joe Damato says: ==================== mlx5: Add netdev-genl queue stats Welcome to v5. Switched from RFC to just a v5, because I think this is pretty close. Minor changes from v4 summarized below in the changelog. Note that my NIC does not seem to support PTP and I couldn't get the mlnx-tools mlnx_qos script to work, so I was only able to test the following cases: - device up at boot - adjusting queue counts - device down (e.g. ip link set dev eth4 down) Please see the commit message of patch 2/2 for more details on output and test cases. rfcv4 thread: https://lore.kernel.org/linux-kernel/20240604004629.299699-1-jdamato@fastly.com/T/ rfcv4 -> v5: - Patch 1/2: change variable name 'mlx5e_qid' to 'txq_ix'. - Patch 2/2: - remove logic in mlx5e_get_queue_stats_rx for PTP. PTP RX are always reported in base. - report PTP TX in mlx5e_get_base_stats only if: - PTP has ever been opened, and - either PTP is NULL (closed) or the MLX5E_PTP_STATE_TX bit in its state is not set Otherwise, PTP TX will be reported when the txq_ix is passed into mlx5e_get_queue_stats_tx rfcv3 -> rfcv4: - Patch 1/2 now creates a mapping (priv->txq2sq_stats) which maps txq indices to sq_stats structures so stats can be accessed directly. This mapping is kept up to date along side txq2sq. - Patch 2/2: - All mutex_lock/unlock on state_lock has been dropped. - mlx5e_get_queue_stats_rx now uses ASSERT_RTNL() and has a special case for PTP. If PTP was ever opened, is currently opened, and the channel index matches, stats for PTP RX are output. - mlx5e_get_queue_stats_tx rewritten to use priv->txq2sq_stats. No corner cases are needed here because any txq idx (passed in as i) will have an up to date mapping in priv->txq2sq_stats. - mlx5e_get_base_stats: - in the RX case: - iterates from [params.num_channels, stats_nch) collecting stats. - if ptp was ever opened but is currently closed, add the PTP stats. - in the TX case: - handle 2 cases: - the channel is available, so sum only the unavailable TCs [mlx5e_get_dcb_num_tc, max_opened_tc). - the channel is unavailable, so sum all TCs [0, max_opened_tc). - if ptp was ever opened but is currently closed, add the PTP sq stats. v2 -> rfcv3: - Added patch 1/2 which creates some helpers for computing the txq_ix and ch_ix/tc_ix. - Patch 2/2 modified in several ways: - Fixed variable declarations in mlx5e_get_queue_stats_rx to be at the start of the function. - mlx5e_get_queue_stats_tx rewritten to access sq stats directly by using the helpers added in the previous patch. - mlx5e_get_base_stats modified in several ways: - Took the state_lock when accessing priv->channels. - For the base RX stats, code was simplified to call mlx5e_get_queue_stats_rx instead of repeating the same code. - For the base TX stats, I attempted to implement what I think Tariq suggested in the previous thread: - for available channels, only unavailable TC stats are summed - for unavailable channels, all stats for TCs up to max_opened_tc are summed. v1 - > v2: - Essentially a full rewrite after comments from Jakub, Tariq, and Zhu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/mlx5e: Add per queue netdev-genl statsJoe Damato
./cli.py --spec netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue"}' ...snip {'ifindex': 7, 'queue-id': 62, 'queue-type': 'rx', 'rx-alloc-fail': 0, 'rx-bytes': 105965251, 'rx-packets': 179790}, {'ifindex': 7, 'queue-id': 0, 'queue-type': 'tx', 'tx-bytes': 9402665, 'tx-packets': 17551}, ...snip Also tested with the script tools/testing/selftests/drivers/net/stats.py in several scenarios to ensure stats tallying was correct: - on boot (default queue counts) - adjusting queue count up or down (ethtool -L eth0 combined ...) The tools/testing/selftests/drivers/net/stats.py brings the device up, so to test with the device down, I did the following: $ ip link show eth4 7: eth4: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN [..snip..] [..snip..] $ cat /proc/net/dev | grep eth4 eth4: 235710489 434811 [..snip rx..] 2878744 21227 [..snip tx..] $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \ --dump qstats-get --json '{"ifindex": 7}' [{'ifindex': 7, 'rx-alloc-fail': 0, 'rx-bytes': 235710489, 'rx-packets': 434811, 'tx-bytes': 2878744, 'tx-packets': 21227}] Compare the values in /proc/net/dev match the output of cli for the same device, even while the device is down. Note that while the device is down, per queue stats output nothing (because the device is down there are no queues): $ ./cli.py --spec ../../../Documentation/netlink/specs/netdev.yaml \ --dump qstats-get --json '{"scope": "queue", "ifindex": 7}' [] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-17net/mlx5e: Add txq to sq stats mappingJoe Damato
mlx5 currently maps txqs to an sq via priv->txq2sq. It is useful to map txqs to sq_stats, as well, for direct access to stats. Add priv->txq2sq_stats and insert mappings. The mappings will be used next to tabulate stats information. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-16Linux 6.10-rc4v6.10-rc4Linus Torvalds
2024-06-16Merge tag 'parisc-for-6.10-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "On parisc we have suffered since years from random segfaults which seem to have been triggered due to cache inconsistencies. Those segfaults happened more often on machines with PA8800 and PA8900 CPUs, which have much bigger caches than the earlier machines. Dave Anglin has worked over the last few weeks to fix this bug. His patch has been successfully tested by various people on various machines and with various kernels (6.6, 6.8 and 6.9), and the debian buildd servers haven't shown a single random segfault with this patch. Since the cache handling has been reworked, the patch is slightly bigger than I would like in this stage, but the greatly improved stability IMHO justifies the inclusion now" * tag 'parisc-for-6.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Try to fix random segmentation faults in package builds