summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-03-22bootconfig: Change message if no bootconfig with CONFIG_BOOT_CONFIG_FORCE=yMasami Hiramatsu (Google)
Change no bootconfig data error message if user do not specify 'bootconfig' option but CONFIG_BOOT_CONFIG_FORCE=y. With CONFIG_BOOT_CONFIG_FORCE=y, the kernel proceeds bootconfig check even if user does not specify 'bootconfig' option. So the current error message is confusing. Let's show just an information message to notice skipping the bootconfig in that case. Link: https://lore.kernel.org/all/167754610254.318944.16848412476667893329.stgit@devnote2/ Fixes: b743852ccc1d ("Allow forcing unconditional bootconfig processing") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Link: https://lore.kernel.org/all/CAMuHMdV9jJvE2y8gY5V_CxidUikCf5515QMZHzTA3rRGEOj6=w@mail.gmail.com/ Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
2023-03-22wifi: mac80211: fix mesh path discovery based on unicast packetsFelix Fietkau
If a packet has reached its intended destination, it was bumped to the code that accepts it, without first checking if a mesh_path needs to be created based on the discovered source. Fix this by moving the destination address check further down. Cc: stable@vger.kernel.org Fixes: 986e43b19ae9 ("wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-3-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: mac80211: fix qos on mesh interfacesFelix Fietkau
When ieee80211_select_queue is called for mesh, the sta pointer is usually NULL, since the nexthop is looked up much later in the tx path. Explicitly check for unicast address in that case in order to make qos work again. Cc: stable@vger.kernel.org Fixes: 50e2ab392919 ("wifi: mac80211: fix queue selection for mesh/OCB interfaces") Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://lore.kernel.org/r/20230314095956.62085-1-nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: iwlwifi: mvm: protect TXQ list manipulationJohannes Berg
Some recent upstream debugging uncovered the fact that in iwlwifi, the TXQ list manipulation is racy. Introduce a new state bit for when the TXQ is completely ready and can be used without locking, and if that's not set yet acquire the lock to check everything correctly. Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Tested-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-22wifi: iwlwifi: mvm: fix mvmtxq->stopped handlingJohannes Berg
This could race if the queue is redirected while full, then the flushing internally would start it while it's not yet usable again. Fix it by using two state bits instead of just one. Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Tested-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-21net/sonic: use dma_mapping_error() for error checkZhang Changzhong
The DMA address returned by dma_map_single() should be checked with dma_mapping_error(). Fix it accordingly. Fixes: efcce839360f ("[PATCH] macsonic/jazzsonic network drivers update") Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@linux-m68k.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/6645a4b5c1e364312103f48b7b36783b94e197a2.1679370343.git.fthain@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21Merge branch 'fix-trainwreck-with-ocelot-switch-statistics-counters'Jakub Kicinski
Vladimir Oltean says: ==================== Fix trainwreck with Ocelot switch statistics counters While testing the patch set for preemptible traffic classes with some controlled traffic and measuring counter deltas: https://lore.kernel.org/netdev/20230220122343.1156614-1-vladimir.oltean@nxp.com/ I noticed that in the output of "ethtool -S swp0 --groups eth-mac eth-phy eth-ctrl rmon -- --src emac | grep -v ': 0'", the TX counters were off. Quickly I realized that their values were permutated by 1 compared to their names, and that for example tx-rmon-etherStatsPkts64to64Octets was incrementing when tx-rmon-etherStatsPkts65to127Octets should have. Initially I suspected something having to do with the bulk reading logic, and indeed I found a bug there (fixed as 1/3), but that was not the source of the problems. Instead it revealed other problems. While dumping the regions created by the driver on my switch, I figured out that it sees a discontinuity which shouldn't have existed between reg 0x278 and reg 0x280. Discontinuity between last reg 0x0 and new reg 0x0, creating new region Discontinuity between last reg 0x108 and new reg 0x200, creating new region Discontinuity between last reg 0x278 and new reg 0x280, creating new region Discontinuity between last reg 0x2b0 and new reg 0x400, creating new region region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 31 contiguous counters starting with SYS:STAT:CNT[0x080] region of 13 contiguous counters starting with SYS:STAT:CNT[0x0a0] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] That is where TX_MM_HOLD should have been, and that was the bug, since it was missing. After adding it, the regions look like this and the off-by-one issue is resolved: Discontinuity between last reg 0x000000 and new reg 0x000000, creating new region Discontinuity between last reg 0x000108 and new reg 0x000200, creating new region Discontinuity between last reg 0x0002b0 and new reg 0x000400, creating new region region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] However, as I am thinking out loud, it should have not reported the other counters as off by one even when skipping TX_MM_HOLD... after all, on Ocelot/Seville, there are more counters which need to be skipped. Which is when I investigated and noticed the bug solved in 2/3. I've validated that both on native VSC9959 (which uses ocelot_mm_stats_layout) as well as by faking the other switches by making VSC9959 use the plain ocelot_stats_layout. To summarize: on all Ocelot switches, the TX counters and drop counters are completely broken. The RX counters are mostly fine. With this occasion, I have collected more cleanup patches in this area, which I'm going to submit after the net -> net-next merge. ==================== Link: https://lore.kernel.org/r/20230321010325.897817-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: add TX_MM_HOLD to ocelot_mm_stats_layoutVladimir Oltean
The lack of a definition for this counter is what initially prompted me to investigate a problem which really manifested itself as the previous change, "net: mscc: ocelot: fix transfer from region->buf to ocelot->stats". When TX_MM_HOLD is defined in enum ocelot_stat but not in struct ocelot_stat_layout ocelot_mm_stats_layout, this creates a hole, which due to the aforementioned bug, makes all counters following TX_MM_HOLD be recorded off by one compared to their correct position. So for example, a non-zero TX_PMAC_OCTETS would be reported as TX_MERGE_FRAGMENTS, TX_PMAC_UNICAST would be reported as TX_PMAC_OCTETS, TX_PMAC_64 would be reported as TX_PMAC_PAUSE, etc etc. This is because the size of the hole (1) is much smaller than the size of the region, so the phenomenon where the stats are off-by-one, rather than lost, prevails. However, the phenomenon where stats are lost can be seen too, for example with DROP_LOCAL, which is at the beginning of its own region (offset 0x000400 vs the previous 0x0002b0 constitutes a discontinuity). This is also reported as off by one and saved to TX_PMAC_1527_MAX, but that counter is not reported to the unstructured "ethtool -S", as opposed to DROP_LOCAL which is (as "drop_local"). Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: fix transfer from region->buf to ocelot->statsVladimir Oltean
To understand the problem, we need some definitions. The driver is aware of multiple counters (enum ocelot_stat), yet not all switches supported by the driver implement all counters. There are 2 statistics layouts: ocelot_stats_layout and ocelot_mm_stats_layout, the latter having 36 counters more than the former. ocelot->stats[] is not a compact array, i.e. there are elements within it which are not going to be populated for ocelot_stats_layout. On the other hand, ocelot->stats[] is easily indexable, for example "tx_octets" for port 3 can be found at ocelot->stats[3 * OCELOT_NUM_STATS + OCELOT_STAT_TX_OCTETS], and that is why we keep it sparse. Regions, as created by ocelot_prepare_stats_regions(), are compact (every element from region->buf will correspond to a counter that is present in this switch's layout) but are not easily indexable. Let's define holes as the ranges of values of enum ocelot_stat for which ocelot_stats_layout doesn't have a "reg" defined. For example, there is a hole between OCELOT_STAT_RX_GREEN_PRIO_7 and OCELOT_STAT_TX_OCTETS which is of 23 elements that are only present on ocelot_mm_stats_layout, and as such, they are also present in enum ocelot_stat. Let's define the left extremity of the hole - the last enum ocelot_stat still defined - as A (in this case OCELOT_STAT_RX_GREEN_PRIO_7) and the right extremity - the first enum ocelot_stat that is defined after a series of undefined ones - as B (in this case OCELOT_STAT_TX_OCTETS). There is a bug in the procedure which transfers stats from region->buf[] to ocelot->stats[]. For each hole in the ocelot_stats_layout, the logic transfers the stats starting with enum ocelot_stat B to ocelot->stats[] index A + 1. So all stats after a hole are saved to a position which is off by B - A + 1 elements. This causes 2 kinds of issues: (a) counters which shouldn't increment increment (b) counters which should increment don't Holes in the ocelot_stat_layout automatically imply the end of a region and the beginning of a new one; however the reverse is not necessarily true. For example, for ocelot_mm_stat_layout, there could be multiple regions (which indicate discontinuities in register addresses) while there is no hole (which indicates discontinuities in enum ocelot_stat values). In the example above, the stats from the second region->buf[] are not transferred to ocelot->stats starting with index "port * OCELOT_NUM_STATS + OCELOT_STAT_TX_OCTETS" as they should, but rather, starting with element "port * OCELOT_NUM_STATS + OCELOT_STAT_RX_GREEN_PRIO_7 + 1". That stats[] array element is not reported to user space for switches that use ocelot_stat_layout, and that is how issue (b) occurs. However, if the length of the second region is larger than the hole, then some stats will start to be transferred to the ocelot->stats[] indices which *are* reported to user space, but those indices contain wrong values (corresponding to unexpected counters). This is how issue (a) occurs. The procedure, as it was introduced in commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats"), was not buggy, because there were no holes in the struct ocelot_stat_layout instances at that time. The problem is that when those holes were introduced, the function was not updated to take them into consideration. To update the procedure, we need to know, for each region, which enum ocelot_stat corresponds to its region->base. We have no way of deducing that based on the contents of struct ocelot_stats_region, so we need to add this information. Fixes: ab3f97a9610a ("net: mscc: ocelot: export ethtool MAC Merge stats for Felix VSC9959") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: mscc: ocelot: fix stats region batchingVladimir Oltean
The blamed commit changed struct ocelot_stat_layout :: "u32 offset" to "u32 reg". However, "u32 reg" is not quite a register address, but an enum ocelot_reg, which in itself encodes an enum ocelot_target target in the upper bits, and an index into the ocelot->map[target][] array in the lower bits. So, whereas the previous code comparison between stats_layout[i].offset and last + 1 was correct (because those "offsets" at the time were 32-bit relative addresses), the new code, comparing layout[i].reg to last + 4 is not correct, because the "reg" here is an enum/index, not an actual register address. What we want to compare are indeed register addresses, but to do that, we need to actually go through the same motions as __ocelot_bulk_read_ix() itself. With this bug, all statistics counters are deemed by ocelot_prepare_stats_regions() as constituting their own region. (Truncated) log on VSC9959 (Felix) below (prints added by me): Before: region of 1 contiguous counters starting with SYS:STAT:CNT[0x000] region of 1 contiguous counters starting with SYS:STAT:CNT[0x001] region of 1 contiguous counters starting with SYS:STAT:CNT[0x002] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x041] region of 1 contiguous counters starting with SYS:STAT:CNT[0x042] region of 1 contiguous counters starting with SYS:STAT:CNT[0x080] region of 1 contiguous counters starting with SYS:STAT:CNT[0x081] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x0ac] region of 1 contiguous counters starting with SYS:STAT:CNT[0x100] region of 1 contiguous counters starting with SYS:STAT:CNT[0x101] ... region of 1 contiguous counters starting with SYS:STAT:CNT[0x111] After: region of 67 contiguous counters starting with SYS:STAT:CNT[0x000] region of 45 contiguous counters starting with SYS:STAT:CNT[0x080] region of 18 contiguous counters starting with SYS:STAT:CNT[0x100] Since commit d87b1c08f38a ("net: mscc: ocelot: use bulk reads for stats") intended bulking as a performance improvement, and since now, with trivial-sized regions, performance is even worse than without bulking at all, this could easily qualify as a performance regression. Fixes: d4c367650704 ("net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21erspan: do not use skb_mac_header() in ndo_start_xmit()Eric Dumazet
Drivers should not assume skb_mac_header(skb) == skb->data in their ndo_start_xmit(). Use skb_network_offset() and skb_transport_offset() which better describe what is needed in erspan_fb_xmit() and ip6erspan_tunnel_xmit() syzbot reported: WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 skb_mac_header include/linux/skbuff.h:2873 [inline] WARNING: CPU: 0 PID: 5083 at include/linux/skbuff.h:2873 ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Modules linked in: CPU: 0 PID: 5083 Comm: syz-executor406 Not tainted 6.3.0-rc2-syzkaller-00866-gd4671cb96fa3 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/02/2023 RIP: 0010:skb_mac_header include/linux/skbuff.h:2873 [inline] RIP: 0010:ip6erspan_tunnel_xmit+0x1d9c/0x2d90 net/ipv6/ip6_gre.c:962 Code: 04 02 41 01 de 84 c0 74 08 3c 03 0f 8e 1c 0a 00 00 45 89 b4 24 c8 00 00 00 c6 85 77 fe ff ff 01 e9 33 e7 ff ff e8 b4 27 a1 f8 <0f> 0b e9 b6 e7 ff ff e8 a8 27 a1 f8 49 8d bf f0 0c 00 00 48 b8 00 RSP: 0018:ffffc90003b2f830 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000 RDX: ffff888021273a80 RSI: ffffffff88e1bd4c RDI: 0000000000000003 RBP: ffffc90003b2f9d8 R08: 0000000000000003 R09: 000000000000ffff R10: 000000000000ffff R11: 0000000000000000 R12: ffff88802b28da00 R13: 00000000000000d0 R14: ffff88807e25b6d0 R15: ffff888023408000 FS: 0000555556a61300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055e5b11eb6e8 CR3: 0000000027c1b000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __netdev_start_xmit include/linux/netdevice.h:4900 [inline] netdev_start_xmit include/linux/netdevice.h:4914 [inline] __dev_direct_xmit+0x504/0x730 net/core/dev.c:4300 dev_direct_xmit include/linux/netdevice.h:3088 [inline] packet_xmit+0x20a/0x390 net/packet/af_packet.c:285 packet_snd net/packet/af_packet.c:3075 [inline] packet_sendmsg+0x31a0/0x5150 net/packet/af_packet.c:3107 sock_sendmsg_nosec net/socket.c:724 [inline] sock_sendmsg+0xde/0x190 net/socket.c:747 __sys_sendto+0x23a/0x340 net/socket.c:2142 __do_sys_sendto net/socket.c:2154 [inline] __se_sys_sendto net/socket.c:2150 [inline] __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2150 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f123aaa1039 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc15d12058 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f123aaa1039 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000020000040 R09: 0000000000000014 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f123aa648c0 R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000 Fixes: 1baf5ebf8954 ("erspan: auto detect truncated packets.") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230320163427.8096-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21atm: idt77252: fix kmemleak when rmmod idt77252Li Zetao
There are memory leaks reported by kmemleak: unreferenced object 0xffff888106500800 (size 128): comm "modprobe", pid 1017, jiffies 4297787785 (age 67.152s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000970ce626>] __kmem_cache_alloc_node+0x20c/0x380 [<00000000fb5f78d9>] kmalloc_trace+0x2f/0xb0 [<000000000e947e2a>] idt77252_init_one+0x2847/0x3c90 [idt77252] [<000000006efb048e>] local_pci_probe+0xeb/0x1a0 ... unreferenced object 0xffff888106500b00 (size 128): comm "modprobe", pid 1017, jiffies 4297787785 (age 67.152s) hex dump (first 32 bytes): 00 20 3d 01 80 88 ff ff 00 20 3d 01 80 88 ff ff . =...... =..... f0 23 3d 01 80 88 ff ff 00 20 3d 01 00 00 00 00 .#=...... =..... backtrace: [<00000000970ce626>] __kmem_cache_alloc_node+0x20c/0x380 [<00000000fb5f78d9>] kmalloc_trace+0x2f/0xb0 [<00000000f451c5be>] alloc_scq.constprop.0+0x4a/0x400 [idt77252] [<00000000e6313849>] idt77252_init_one+0x28cf/0x3c90 [idt77252] The root cause is traced to the vc_maps which alloced in open_card_oam() are not freed in close_card_oam(). The vc_maps are used to record open connections, so when close a vc_map in close_card_oam(), the memory should be freed. Moreover, the ubr0 is not closed when close a idt77252 device, leading to the memory leak of vc_map and scq_info. Fix them by adding kfree in close_card_oam() and implementing new close_card_ubr0() to close ubr0. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Francois Romieu <romieu@fr.zoreil.com> Link: https://lore.kernel.org/r/20230320143318.2644630-1-lizetao1@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21net: dsa: tag_brcm: legacy: fix daisy-chained switchesÁlvaro Fernández Rojas
When BCM63xx internal switches are connected to switches with a 4-byte Broadcom tag, it does not identify the packet as VLAN tagged, so it adds one based on its PVID (which is likely 0). Right now, the packet is received by the BCM63xx internal switch and the 6-byte tag is properly processed. The next step would to decode the corresponding 4-byte tag. However, the internal switch adds an invalid VLAN tag after the 6-byte tag and the 4-byte tag handling fails. In order to fix this we need to remove the invalid VLAN tag after the 6-byte tag before passing it to the 4-byte tag decoding. Fixes: 964dbf186eaa ("net: dsa: tag_brcm: add support for legacy tags") Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20230319095540.239064-1-noltari@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-21Merge tag 'vfio-v6.3-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fix from Alex Williamson: - Fix dirty size reporting for pre-copy in mlx5 variant driver (Yishai Hadas) * tag 'vfio-v6.3-rc4' of https://github.com/awilliam/linux-vfio: vfio/mlx5: Fix the report of dirty_bytes upon pre-copy
2023-03-21Merge tag 'nfsd-6.3-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix a crash during NFS READs from certain client implementations - Address a minor kbuild regression in v6.3 * tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't replace page in rq_pages if it's a continuation of last page NFS & NFSD: Update GSS dependencies
2023-03-21net/mlx5: E-Switch, Fix an Oops in error handling codeDan Carpenter
The error handling dereferences "vport". There is nothing we can do if it is an error pointer except returning the error code. Fixes: 133dcfc577ea ("net/mlx5: E-Switch, Alloc and free unique metadata for match") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5: Read the TC mapping of all priorities on ETS queryMaher Sanalla
When ETS configurations are queried by the user to get the mapping assignment between packet priority and traffic class, only priorities up to maximum TCs are queried from QTCT register in FW to retrieve their assigned TC, leaving the rest of the priorities mapped to the default TC #0 which might be misleading. Fix by querying the TC mapping of all priorities on each ETS query, regardless of the maximum number of TCs configured in FW. Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Overcome slow response for first macsec ASO WQEEmeel Hakim
First ASO WQE poll causes a cache miss in hardware hence the resut is delayed. It causes to the situation where such WQE is polled earlier than it is needed. Add logic to retry ASO CQ polling operation. Fixes: 739cfa34518e ("net/mlx5: Make ASO poll CQ usable in atomic context")  Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Initialize link speed to zeroRoy Novich
mlx5e_port_max_linkspeed does not guarantee value assignment for speed. Avoid cases where link_speed might be used uninitialized. In case mlx5e_port_max_linkspeed fails, a default link speed of 50000 will be used for the calculations. Fixes: 3f6d08d196b2 ("net/mlx5e: Add RSS support for hairpin") Signed-off-by: Roy Novich <royno@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5: Fix steering rules cleanupLama Kayal
vport's mc, uc and multicast rules are not deleted in teardown path when EEH happens. Since the vport's promisc settings(uc, mc and all) in firmware are reset after EEH, mlx5 driver will try to delete the above rules in the initialization path. This cause kernel crash because these software rules are no longer valid. Fix by nullifying these rules right after delete to avoid accessing any dangling pointers. Call Trace: __list_del_entry_valid+0xcc/0x100 (unreliable) tree_put_node+0xf4/0x1b0 [mlx5_core] tree_remove_node+0x30/0x70 [mlx5_core] mlx5_del_flow_rules+0x14c/0x1f0 [mlx5_core] esw_apply_vport_rx_mode+0x10c/0x200 [mlx5_core] esw_update_vport_rx_mode+0xb4/0x180 [mlx5_core] esw_vport_change_handle_locked+0x1ec/0x230 [mlx5_core] esw_enable_vport+0x130/0x260 [mlx5_core] mlx5_eswitch_enable_sriov+0x2a0/0x2f0 [mlx5_core] mlx5_device_enable_sriov+0x74/0x440 [mlx5_core] mlx5_load_one+0x114c/0x1550 [mlx5_core] mlx5_pci_resume+0x68/0xf0 [mlx5_core] eeh_report_resume+0x1a4/0x230 eeh_pe_dev_traverse+0x98/0x170 eeh_handle_normal_event+0x3e4/0x640 eeh_handle_event+0x4c/0x370 eeh_event_handler+0x14c/0x210 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0x84 Fixes: a35f71f27a61 ("net/mlx5: E-Switch, Implement promiscuous rx modes vf request handling") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Block entering switchdev mode with ns inconsistencyGavin Li
Upon entering switchdev mode, VF/SF representors are spawned in the devlink instance's net namespace, whereas the PF net device transforms into the uplink representor, remaining in the net namespace the PF net device was in. Therefore, if a PF net device's namespace is different from its parent devlink net namespace, entering switchdev mode can create an illegal situation where all representors sharing the same core device are NOT in the same net namespace. To avoid this issue, block entering switchdev mode for devices whose child netdev net namespace has diverged from the parent devlink's. Fixes: 7768d1971de6 ("net/mlx5: E-Switch, Add control for encapsulation") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21net/mlx5e: Set uplink rep as NETNS_LOCALGavin Li
Previously, NETNS_LOCAL was not set for uplink representors, inconsistent with VF representors, and allowed the uplink representor to be moved between net namespaces and separated from the VF representors it shares the core device with. Such usage would break the isolation model of namespaces, as devices in different namespaces would have access to shared memory. To solve this issue, set NETNS_LOCAL for uplink representors if eswitch is in switchdev mode. Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Gavi Teitz <gavi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-03-21bpf: Adjust insufficient default bpf_jit_limitDaniel Borkmann
We've seen recent AWS EKS (Kubernetes) user reports like the following: After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS clusters after a few days a number of the nodes have containers stuck in ContainerCreating state or liveness/readiness probes reporting the following error: Readiness probe errored: rpc error: code = Unknown desc = failed to exec in container: failed to start exec "4a11039f730203ffc003b7[...]": OCI runtime exec failed: exec failed: unable to start container process: unable to init seccomp: error loading seccomp filter into kernel: error loading seccomp filter: errno 524: unknown However, we had not been seeing this issue on previous AMIs and it only started to occur on v20230217 (following the upgrade from kernel 5.4 to 5.10) with no other changes to the underlying cluster or workloads. We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528) which helped to immediately allow containers to be created and probes to execute but after approximately a day the issue returned and the value returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}' was steadily increasing. I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem their sizes passed in as well as bpf_jit_current under tcpdump BPF filter, seccomp BPF and native (e)BPF programs, and the behavior all looks sane and expected, that is nothing "leaking" from an upstream perspective. The bpf_jit_limit knob was originally added in order to avoid a situation where unprivileged applications loading BPF programs (e.g. seccomp BPF policies) consuming all the module memory space via BPF JIT such that loading of kernel modules would be prevented. The default limit was defined back in 2018 and while good enough back then, we are generally seeing far more BPF consumers today. Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the module memory space to better reflect today's needs and avoid more users running into potentially hard to debug issues. Fixes: fdadd04931c2 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K") Reported-by: Stephen Haynes <sh@synk.net> Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://github.com/awslabs/amazon-eks-ami/issues/1179 Link: https://github.com/awslabs/amazon-eks-ami/issues/1219 Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-21Merge tag 'keys-fixes-20230321' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull keyrings fixes from David Howells: - Fix request_key() so that it doesn't cache a looked up key on the current thread if that thread is a kernel thread. The cache is cleared during notify_resume - but that doesn't happen in kernel threads. This is causing cifs DNS keys to be un-invalidateable. - Fix a wrapper check in verify_pefile() to not round up the length. - Change asymmetric_keys code to log errors to make it easier for users to work out why failures occurred. * tag 'keys-fixes-20230321' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: asymmetric_keys: log on fatal failures in PE/pkcs7 verify_pefile: relax wrapper length check keys: Do not cache key in task struct if key is requested from kernel thread
2023-03-21i40e: fix flow director packet filter programmingRadoslaw Tyl
Initialize to zero structures to build a valid Tx Packet used for the filter programming. Fixes: a9219b332f52 ("i40e: VLAN field for flow director") Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21iavf: fix hang on reboot with iceStefan Assmann
When a system with E810 with existing VFs gets rebooted the following hang may be observed. Pid 1 is hung in iavf_remove(), part of a network driver: PID: 1 TASK: ffff965400e5a340 CPU: 24 COMMAND: "systemd-shutdow" #0 [ffffaad04005fa50] __schedule at ffffffff8b3239cb #1 [ffffaad04005fae8] schedule at ffffffff8b323e2d #2 [ffffaad04005fb00] schedule_hrtimeout_range_clock at ffffffff8b32cebc #3 [ffffaad04005fb80] usleep_range_state at ffffffff8b32c930 #4 [ffffaad04005fbb0] iavf_remove at ffffffffc12b9b4c [iavf] #5 [ffffaad04005fbf0] pci_device_remove at ffffffff8add7513 #6 [ffffaad04005fc10] device_release_driver_internal at ffffffff8af08baa #7 [ffffaad04005fc40] pci_stop_bus_device at ffffffff8adcc5fc #8 [ffffaad04005fc60] pci_stop_and_remove_bus_device at ffffffff8adcc81e #9 [ffffaad04005fc70] pci_iov_remove_virtfn at ffffffff8adf9429 #10 [ffffaad04005fca8] sriov_disable at ffffffff8adf98e4 #11 [ffffaad04005fcc8] ice_free_vfs at ffffffffc04bb2c8 [ice] #12 [ffffaad04005fd10] ice_remove at ffffffffc04778fe [ice] #13 [ffffaad04005fd38] ice_shutdown at ffffffffc0477946 [ice] #14 [ffffaad04005fd50] pci_device_shutdown at ffffffff8add58f1 #15 [ffffaad04005fd70] device_shutdown at ffffffff8af05386 #16 [ffffaad04005fd98] kernel_restart at ffffffff8a92a870 #17 [ffffaad04005fda8] __do_sys_reboot at ffffffff8a92abd6 #18 [ffffaad04005fee0] do_syscall_64 at ffffffff8b317159 #19 [ffffaad04005ff08] __context_tracking_enter at ffffffff8b31b6fc #20 [ffffaad04005ff18] syscall_exit_to_user_mode at ffffffff8b31b50d #21 [ffffaad04005ff28] do_syscall_64 at ffffffff8b317169 #22 [ffffaad04005ff50] entry_SYSCALL_64_after_hwframe at ffffffff8b40009b RIP: 00007f1baa5c13d7 RSP: 00007fffbcc55a98 RFLAGS: 00000202 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1baa5c13d7 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead RBP: 00007fffbcc55ca0 R8: 0000000000000000 R9: 00007fffbcc54e90 R10: 00007fffbcc55050 R11: 0000000000000202 R12: 0000000000000005 R13: 0000000000000000 R14: 00007fffbcc55af0 R15: 0000000000000000 ORIG_RAX: 00000000000000a9 CS: 0033 SS: 002b During reboot all drivers PM shutdown callbacks are invoked. In iavf_shutdown() the adapter state is changed to __IAVF_REMOVE. In ice_shutdown() the call chain above is executed, which at some point calls iavf_remove(). However iavf_remove() expects the VF to be in one of the states __IAVF_RUNNING, __IAVF_DOWN or __IAVF_INIT_FAILED. If that's not the case it sleeps forever. So if iavf_shutdown() gets invoked before iavf_remove() the system will hang indefinitely because the adapter is already in state __IAVF_REMOVE. Fix this by returning from iavf_remove() if the state is __IAVF_REMOVE, as we already went through iavf_shutdown(). Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove") Fixes: a8417330f8a5 ("iavf: Fix race condition between iavf_shutdown and iavf_remove") Reported-by: Marius Cornea <mcornea@redhat.com> Signed-off-by: Stefan Assmann <sassmann@kpanic.de> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: remove filters only if VSI is deletedMichal Swiatkowski
Filters shouldn't be removed in VSI rebuild path. Removing them on PF VSI results in no rule for PF MAC after changing for example queues amount. Remove all filters only in the VSI remove flow. As unload should also cause the filter to be removed introduce, a new function ice_stop_eth(). It will unroll ice_start_eth(), so remove filters and close VSI. Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: check if VF exists before mode checkMichal Swiatkowski
Setting trust on VF should return EINVAL when there is no VF. Move checking for switchdev mode after checking if VF exists. Fixes: c54d209c78b8 ("ice: Wait for VF to be reset/ready before configuration") Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com> Signed-off-by: Kalyan Kodamagula <kalyan.kodamagula@intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21ice: fix rx buffers handling for flow director packetsPiotr Raczynski
Adding flow director filters stopped working correctly after commit 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side"). As a result, only first flow director filter can be added, adding next filter leads to NULL pointer dereference attached below. Rx buffer handling and reallocation logic has been optimized, however flow director specific traffic was not accounted for. As a result driver handled those packets incorrectly since new logic was based on ice_rx_ring::first_desc which was not set in this case. Fix this by setting struct ice_rx_ring::first_desc to next_to_clean for flow director received packets. [ 438.544867] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 438.551840] #PF: supervisor read access in kernel mode [ 438.556978] #PF: error_code(0x0000) - not-present page [ 438.562115] PGD 7c953b2067 P4D 0 [ 438.565436] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 438.569794] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.2.0-net-bug #1 [ 438.577531] Hardware name: Intel Corporation M50CYP2SBSTD/M50CYP2SBSTD, BIOS SE5C620.86B.01.01.0005.2202160810 02/16/2022 [ 438.588470] RIP: 0010:ice_clean_rx_irq+0x2b9/0xf20 [ice] [ 438.593860] Code: 45 89 f7 e9 ac 00 00 00 8b 4d 78 41 31 4e 10 41 09 d5 4d 85 f6 0f 84 82 00 00 00 49 8b 4e 08 41 8b 76 1c 65 8b 3d 47 36 4a 3f <48> 8b 11 48 c1 ea 36 39 d7 0f 85 a6 00 00 00 f6 41 08 02 0f 85 9c [ 438.612605] RSP: 0018:ff8c732640003ec8 EFLAGS: 00010082 [ 438.617831] RAX: 0000000000000800 RBX: 00000000000007ff RCX: 0000000000000000 [ 438.624957] RDX: 0000000000000800 RSI: 0000000000000000 RDI: 0000000000000000 [ 438.632089] RBP: ff4ed275a2158200 R08: 00000000ffffffff R09: 0000000000000020 [ 438.639222] R10: 0000000000000000 R11: 0000000000000020 R12: 0000000000001000 [ 438.646356] R13: 0000000000000000 R14: ff4ed275d0daffe0 R15: 0000000000000000 [ 438.653485] FS: 0000000000000000(0000) GS:ff4ed2738fa00000(0000) knlGS:0000000000000000 [ 438.661563] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 438.667310] CR2: 0000000000000000 CR3: 0000007c9f0d6006 CR4: 0000000000771ef0 [ 438.674444] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 438.681573] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 438.688697] PKRU: 55555554 [ 438.691404] Call Trace: [ 438.693857] <IRQ> [ 438.695877] ? profile_tick+0x17/0x80 [ 438.699542] ice_msix_clean_ctrl_vsi+0x24/0x50 [ice] [ 438.702571] ice 0000:b1:00.0: VF 1: ctrl_vsi irq timeout [ 438.704542] __handle_irq_event_percpu+0x43/0x1a0 [ 438.704549] handle_irq_event+0x34/0x70 [ 438.704554] handle_edge_irq+0x9f/0x240 [ 438.709901] iavf 0000:b1:01.1: Failed to add Flow Director filter with status: 6 [ 438.714571] __common_interrupt+0x63/0x100 [ 438.714580] common_interrupt+0xb4/0xd0 [ 438.718424] iavf 0000:b1:01.1: Rule ID: 127 dst_ip: 0.0.0.0 src_ip 0.0.0.0 UDP: dst_port 4 src_port 0 [ 438.722255] </IRQ> [ 438.722257] <TASK> [ 438.722257] asm_common_interrupt+0x22/0x40 [ 438.722262] RIP: 0010:cpuidle_enter_state+0xc8/0x430 [ 438.722267] Code: 6e e9 25 ff e8 f9 ef ff ff 8b 53 04 49 89 c5 0f 1f 44 00 00 31 ff e8 d7 f1 24 ff 45 84 ff 0f 85 57 02 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 85 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 438.722269] RSP: 0018:ffffffff86003e50 EFLAGS: 00000246 [ 438.784108] RAX: ff4ed2738fa00000 RBX: ffbe72a64fc01020 RCX: 0000000000000000 [ 438.791234] RDX: 0000000000000000 RSI: ffffffff858d84de RDI: ffffffff85893641 [ 438.798365] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000003158af9d [ 438.805490] R10: 0000000000000008 R11: 0000000000000354 R12: ffffffff862365a0 [ 438.812622] R13: 000000661b472a87 R14: 0000000000000002 R15: 0000000000000000 [ 438.819757] cpuidle_enter+0x29/0x40 [ 438.823333] do_idle+0x1b6/0x230 [ 438.826566] cpu_startup_entry+0x19/0x20 [ 438.830492] rest_init+0xcb/0xd0 [ 438.833717] arch_call_rest_init+0xa/0x30 [ 438.837731] start_kernel+0x776/0xb70 [ 438.841396] secondary_startup_64_no_verify+0xe5/0xeb [ 438.846449] </TASK> Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-03-21asymmetric_keys: log on fatal failures in PE/pkcs7Robbie Harwood
These particular errors can be encountered while trying to kexec when secureboot lockdown is in place. Without this change, even with a signed debug build, one still needs to reboot the machine to add the appropriate dyndbg parameters (since lockdown blocks debugfs). Accordingly, upgrade all pr_debug() before fatal error into pr_warn(). Signed-off-by: Robbie Harwood <rharwood@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jarkko Sakkinen <jarkko@kernel.org> cc: Eric Biederman <ebiederm@xmission.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org cc: kexec@lists.infradead.org Link: https://lore.kernel.org/r/20230220171254.592347-3-rharwood@redhat.com/ # v2
2023-03-21verify_pefile: relax wrapper length checkRobbie Harwood
The PE Format Specification (section "The Attribute Certificate Table (Image Only)") states that `dwLength` is to be rounded up to 8-byte alignment when used for traversal. Therefore, the field is not required to be an 8-byte multiple in the first place. Accordingly, pesign has not performed this alignment since version 0.110. This causes kexec failure on pesign'd binaries with "PEFILE: Signature wrapper len wrong". Update the comment and relax the check. Signed-off-by: Robbie Harwood <rharwood@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jarkko Sakkinen <jarkko@kernel.org> cc: Eric Biederman <ebiederm@xmission.com> cc: Herbert Xu <herbert@gondor.apana.org.au> cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org cc: kexec@lists.infradead.org Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-attribute-certificate-table-image-only Link: https://github.com/rhboot/pesign Link: https://lore.kernel.org/r/20230220171254.592347-2-rharwood@redhat.com/ # v2
2023-03-21keys: Do not cache key in task struct if key is requested from kernel threadDavid Howells
The key which gets cached in task structure from a kernel thread does not get invalidated even after expiry. Due to which, a new key request from kernel thread will be served with the cached key if it's present in task struct irrespective of the key validity. The change is to not cache key in task_struct when key requested from kernel thread so that kernel thread gets a valid key on every key request. The problem has been seen with the cifs module doing DNS lookups from a kernel thread and the results getting pinned by being attached to that kernel thread's cache - and thus not something that can be easily got rid of. The cache would ordinarily be cleared by notify-resume, but kernel threads don't do that. This isn't seen with AFS because AFS is doing request_key() within the kernel half of a user thread - which will do notify-resume. Fixes: 7743c48e54ee ("keys: Cache result of request_key*() temporarily in task_struct") Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> cc: Shyam Prasad N <nspmangalore@gmail.com> cc: Steve French <smfrench@gmail.com> cc: keyrings@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/CAGypqWw951d=zYRbdgNR4snUDvJhWL=q3=WOyh7HhSJupjz2vA@mail.gmail.com/
2023-03-22bootconfig: Fix testcase to increase max nodeMasami Hiramatsu (Google)
Since commit 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support") increased the max number of bootconfig node to 8192, the bootconfig testcase of the max number of nodes fails. To fix this issue, we can not simply increase the number in the test script because the test bootconfig file becomes too big (>32KB). To fix that, we can use a combination of three alphabets (26^3 = 17576). But with that, we can not express the 8193 (just one exceed from the limitation) because it also exceeds the max size of bootconfig. So, the first 26 nodes will just use one alphabet. With this fix, test-bootconfig.sh passes all tests. Link: https://lore.kernel.org/all/167888844790.791176.670805252426835131.stgit@devnote2/ Reported-by: Heinz Wiesinger <pprkut@slackware.com> Link: https://lore.kernel.org/all/2463802.XAFRqVoOGU@amaterasu.liwjatan.org Fixes: 6c40624930c5 ("bootconfig: Increase max nodes of bootconfig from 1024 to 8192 for DCC support") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-03-20octeontx2-vf: Add missing free for alloc_percpuJiasheng Jiang
Add the free_percpu for the allocated "vf->hw.lmt_info" in order to avoid memory leak, same as the "pf->hw.lmt_info" in `drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c`. Fixes: 5c0512072f65 ("octeontx2-pf: cn10k: Use runtime allocated LMTLINE region") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Acked-by: Geethasowjanya Akula <gakula@marvell.com> Link: https://lore.kernel.org/r/20230317064337.18198-1-jiasheng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-20Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linuxLinus Torvalds
Pull fsverity fixes from Eric Biggers: "Fix two significant performance issues with fsverity" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY fsverity: Remove WQ_UNBOUND from fsverity read workqueue
2023-03-20Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linuxLinus Torvalds
Pull fscrypt fix from Eric Biggers: "Fix a bug where when a filesystem was being unmounted, the fscrypt keyring was destroyed before inodes have been released by the Landlock LSM. This bug was found by syzbot" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref() fscrypt: improve fscrypt_destroy_keyring() documentation fscrypt: destroy keyring after security_sb_delete()
2023-03-21zonefs: Fix error message in zonefs_file_dio_append()Damien Le Moal
Since the expected write location in a sequential file is always at the end of the file (append write), when an invalid write append location is detected in zonefs_file_dio_append(), print the invalid written location instead of the expected write location. Fixes: a608da3bd730 ("zonefs: Detect append writes at invalid locations") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-21zonefs: Prevent uninitialized symbol 'size' warningDamien Le Moal
In zonefs_file_dio_append(), initialize the variable size to 0 to prevent compilation and static code analizers warning such as: New smatch warnings: fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized symbol 'size'. The warning is a false positive as size is never actually used uninitialized. No functional change. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202303191227.GL8Dprbi-lkp@intel.com/ Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-20gpu: host1x: fix uninitialized variable useArnd Bergmann
The error handling for platform_get_irq() failing no longer works after a recent change, clang now points this out with a warning: drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized] if (syncpt_irq < 0) ^~~~~~~~~~ Fix this by removing the variable and checking the correct error status. Fixes: 625d4ffb438c ("gpu: host1x: Rewrite syncpoint interrupt handling") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-20Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: - Fix /proc/PID/io read_bytes accounting - Fix setting NLM file_lock start and end during decoding testargs - Fix timing for setting access cache timestamps * tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Correct timing for assigning access cache timestamp lockd: set file_lock start and end when decoding nlm4 testargs NFS: Fix /proc/PID/io read_bytes for buffered reads
2023-03-20Merge branch 'ps3_gelic_net-fixes'David S. Miller
Geoff Levand says: ==================== net/ps3_gelic_net: DMA related fixes v9: Make rx_skb_size local to gelic_descr_prepare_rx. v8: Add more cpu_to_be32 calls. v7: Remove all cleanups, sync to spider net. v6: Reworked and cleaned up patches. v5: Some additional patch cleanups. v4: More patch cleanups. v3: Cleaned up patches as requested. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net/ps3_gelic_net: Use dma_mapping_errorGeoff Levand
The current Gelic Etherenet driver was checking the return value of its dma_map_single call, and not using the dma_mapping_error() routine. Fixes runtime problems like these: DMA-API: ps3_gelic_driver sb_05: device driver failed to check map error WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:1027 .check_unmap+0x888/0x8dc Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3") Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net/ps3_gelic_net: Fix RX sk_buff lengthGeoff Levand
The Gelic Ethernet device needs to have the RX sk_buffs aligned to GELIC_NET_RXBUF_ALIGN, and also the length of the RX sk_buffs must be a multiple of GELIC_NET_RXBUF_ALIGN. The current Gelic Ethernet driver was not allocating sk_buffs large enough to allow for this alignment. Also, correct the maximum and minimum MTU sizes, and add a new preprocessor macro for the maximum frame size, GELIC_NET_MAX_FRAME. Fixes various randomly occurring runtime network errors. Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3") Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20usb: plusb: remove unused pl_clear_QuickLink_features functionTom Rix
clang with W=1 reports drivers/net/usb/plusb.c:65:1: error: unused function 'pl_clear_QuickLink_features' [-Werror,-Wunused-function] pl_clear_QuickLink_features(struct usbnet *dev, int val) ^ This static function is not used, so remove it. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: usb: lan78xx: Limit packet length to skb->lenSzymon Heidrich
Packet length retrieved from descriptor may be larger than the actual socket buffer length. In such case the cloned skb passed up the network stack will leak kernel memory contents. Additionally prevent integer underflow when size is less than ETH_FCS_LEN. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Szymon Heidrich <szymon.heidrich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: qcom/emac: Fix use after free bug in emac_remove due to race conditionZheng Wang
In emac_probe, &adpt->work_thread is bound with emac_work_thread. Then it will be started by timeout handler emac_tx_timeout or a IRQ handler emac_isr. If we remove the driver which will call emac_remove to make cleanup, there may be a unfinished work. The possible sequence is as follows: Fix it by finishing the work before cleanup in the emac_remove and disable timeout response. CPU0 CPU1 |emac_work_thread emac_remove | free_netdev | kfree(netdev); | |emac_reinit_locked |emac_mac_down |//use netdev Fixes: b9b17debc69d ("net: emac: emac gigabit ethernet controller driver") Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-20net: dsa: report rx_bytes unadjusted for ETH_HLENVladimir Oltean
We collect the software statistics counters for RX bytes (reported to /proc/net/dev and to ethtool -S $dev | grep 'rx_bytes: ") at a time when skb->len has already been adjusted by the eth_type_trans() -> skb_pull_inline(skb, ETH_HLEN) call to exclude the L2 header. This means that when connecting 2 DSA interfaces back to back and sending 1 packet with length 100, the sending interface will report tx_bytes as incrementing by 100, and the receiving interface will report rx_bytes as incrementing by 86. Since accounting for that in scripts is quirky and is something that would be DSA-specific behavior (requiring users to know that they are running on a DSA interface in the first place), the proposal is that we treat it as a bug and fix it. This design bug has always existed in DSA, according to my analysis: commit 91da11f870f0 ("net: Distributed Switch Architecture protocol support") also updates skb->dev->stats.rx_bytes += skb->len after the eth_type_trans() call. Technically, prior to Florian's commit a86d8becc3f0 ("net: dsa: Factor bottom tag receive functions"), each and every vendor-specific tagging protocol driver open-coded the same bug, until the buggy code was consolidated into something resembling what can be seen now. So each and every driver should have its own Fixes: tag, because of their different histories until the convergence point. I'm not going to do that, for the sake of simplicity, but just blame the oldest appearance of buggy code. There are 2 ways to fix the problem. One is the obvious way, and the other is how I ended up doing it. Obvious would have been to move dev_sw_netstats_rx_add() one line above eth_type_trans(), and below skb_push(skb, ETH_HLEN). But DSA processing is not as simple as that. We count the bytes after removing everything DSA-related from the packet, to emulate what the packet's length was, on the wire, when the user port received it. When eth_type_trans() executes, dsa_untag_bridge_pvid() has not run yet, so in case the switch driver requests this behavior - commit 412a1526d067 ("net: dsa: untag the bridge pvid from rx skbs") has the details - the obvious variant of the fix wouldn't have worked, because the positioning there would have also counted the not-yet-stripped VLAN header length, something which is absent from the packet as seen on the wire (there it may be untagged, whereas software will see it as PVID-tagged). Fixes: f613ed665bb3 ("net: dsa: Add support for 64-bit statistics") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-19Linux 6.3-rc3Linus Torvalds
2023-03-19Merge tag 'trace-v6.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix setting affinity of hwlat threads in containers Using sched_set_affinity() has unwanted side effects when being called within a container. Use set_cpus_allowed_ptr() instead - Fix per cpu thread management of the hwlat tracer: - Do not start per_cpu threads if one is already running for the CPU - When starting per_cpu threads, do not clear the kthread variable as it may already be set to running per cpu threads - Fix return value for test_gen_kprobe_cmd() On error the return value was overwritten by being set to the result of the call from kprobe_event_delete(), which would likely succeed, and thus have the function return success - Fix splice() reads from the trace file that was broken by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") - Remove obsolete and confusing comment in ring_buffer.c The original design of the ring buffer used struct page flags for tricks to optimize, which was shortly removed due to them being tricks. But a comment for those tricks remained - Set local functions and variables to static * tag 'trace-v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptr ring-buffer: remove obsolete comment for free_buffer_page() tracing: Make splice_read available again ftrace: Set direct_ops storage-class-specifier to static trace/hwlat: Do not start per-cpu thread if it is already running trace/hwlat: Do not wipe the contents of per-cpu thread data tracing/osnoise: set several trace_osnoise.c variables storage-class-specifier to static tracing: Fix wrong return in kprobe_event_gen_test.c
2023-03-19tracing/hwlat: Replace sched_setaffinity with set_cpus_allowed_ptrCosta Shulyupin
There is a problem with the behavior of hwlat in a container, resulting in incorrect output. A warning message is generated: "cpumask changed while in round-robin mode, switching to mode none", and the tracing_cpumask is ignored. This issue arises because the kernel thread, hwlatd, is not a part of the container, and the function sched_setaffinity is unable to locate it using its PID. Additionally, the task_struct of hwlatd is already known. Ultimately, the function set_cpus_allowed_ptr achieves the same outcome as sched_setaffinity, but employs task_struct instead of PID. Test case: # cd /sys/kernel/tracing # echo 0 > tracing_on # echo round-robin > hwlat_detector/mode # echo hwlat > current_tracer # unshare --fork --pid bash -c 'echo 1 > tracing_on' # dmesg -c Actual behavior: [573502.809060] hwlat_detector: cpumask changed while in round-robin mode, switching to mode none Link: https://lore.kernel.org/linux-trace-kernel/20230316144535.1004952-1-costa.shul@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Fixes: 0330f7aa8ee63 ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs") Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>