linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-08-15	Revert "serial: 8250_omap: Set the console genpd always on if no console ↵	Griffin Kroah-Hartman
	suspend" This reverts commit 68e6939ea9ec3d6579eadeab16060339cdeaf940. Kevin reported that this causes a crash during suspend on platforms that dont use PM domains. Link: https://lore.kernel.org/r/7ha5hgpchq.fsf@baylibre.com Cc: Thomas Richard <thomas.richard@bootlin.com> Fixes: 68e6939ea9ec ("serial: 8250_omap: Set the console genpd always on if no console suspend") Cc: stable <stable@kernel.org> Reported-by: Kevin Hilman <khilman@kernel.org> Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com> Link: https://lore.kernel.org/r/20240814111747.82371-1-griffin@kroah.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-14	Merge tag 'wireless-2024-08-14' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Kalle Valo says: ==================== wireless fixes for v6.11 We have few fixes to drivers. The most important here is a fix for iwlwifi which caused major slowdowns for several users. * tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: iwlwifi: correctly lookup DMA address in SG table wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion wifi: rtlwifi: rtl8192du: Initialise value32 in _rtl92du_init_queue_reserved_page wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850 ==================== Link: https://patch.msgid.link/20240814171606.E14A0C116B1@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	selftest: af_unix: Fix kselftest compilation warnings	Abhinav Jain
	Change expected_buf from (const void ) to (const char ) in function __recvpair(). This change fixes the below warnings during test compilation: ``` In file included from msg_oob.c:14: msg_oob.c: In function ‘__recvpair’: ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char ’,but argument 6 has type ‘const void ’ [-Wformat=] ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’ msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’ ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char ’,but argument 6 has type ‘const void ’ [-Wformat=] ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’ msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’ ``` Fixes: d098d77232c3 ("selftest: af_unix: Add msg_oob.c.") Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20240814080743.1156166-1-jain.abhinav177@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	Merge branch 'uapi-net-sched-cxgb4-fix-wflex-array-member-not-at-end-warning'	Jakub Kicinski
	Gustavo A. R. Silva says: ==================== UAPI: net/sched - cxgb4: Fix -Wflex-array-member-not-at-end warning Small patch series aimed at fixing a -Wflex-array-member-not-at-end warning by creating a new tagged struct within a flexible structure. We then use this new struct type to fix a problematic middle-flex-array declaration in a composite struct. ==================== Link: https://patch.msgid.link/cover.1723586870.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	cxgb4: Avoid -Wflex-array-member-not-at-end warning	Gustavo A. R. Silva
	-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Change the type of the middle struct member currently causing trouble from `struct tc_u32_sel` to `struct tc_u32_sel_hdr`. Fix the following warning: drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h:245:27: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/97388e8a7990975aa56cf0ada211764c735c3432.1723586870.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	UAPI: net/sched: Use __struct_group() in flex struct tc_u32_sel	Gustavo A. R. Silva
	Use the `__struct_group()` helper to create a new tagged `struct tc_u32_sel_hdr`. This structure groups together all the members of the flexible `struct tc_u32_sel` except the flexible array. As a result, the array is effectively separated from the rest of the members without modifying the memory layout of the flexible structure. This new tagged struct will be used to fix problematic declarations of middle-flex-arrays in composite structs[1]. [1] https://git.kernel.org/linus/d88cabfd9abc Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/e59fe833564ddc5b2cc83056a4c504be887d6193.1723586870.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	Merge branch 'bnxt_en-address-string-truncation'	Jakub Kicinski
	Simon Horman says: ==================== bnxt_en: address string truncation This series addresses several string truncation issues that are flagged by gcc-14. I do not have any reason to believe these are bugs, so I am targeting this at net-next and have not provided Fixes tags. v1: https://lore.kernel.org/r/20240705-bnxt-str-v1-0-bafc769ed89e@kernel.org ==================== Link: https://patch.msgid.link/20240813-bnxt-str-v2-0-872050a157e7@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	bnxt_en: avoid truncation of per rx run debugfs filename	Simon Horman
	Although it seems unlikely in practice - there would need to be rx ring indexes greater than 10^10 - it is theoretically possible for the filename of per rx ring debugfs files to be truncated. This is because although a 16 byte buffer is provided, the length of the filename is restricted to 10 bytes. Remove this restriction and allow the entire buffer to be used. Also reduce the buffer to 12 bytes, which is sufficient. Given that the range of rx ring indexes likely much smaller than the maximum range of a 32-bit signed integer, a smaller buffer could be used, with some further changes. But this change seems simple, robust, and has minimal stack overhead. Flagged by gcc-14: .../bnxt_debugfs.c: In function 'bnxt_debug_dev_init': drivers/net/ethernet/broadcom/bnxt/bnxt_debugfs.c:69:30: warning: '%d' directive output may be truncated writing between 1 and 11 bytes into a region of size 10 [-Wformat-truncation=] 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~ In function 'debugfs_dim_ring_init', inlined from 'bnxt_debug_dev_init' at .../bnxt_debugfs.c:87:4: .../bnxt_debugfs.c:69:29: note: directive argument in the range [-2147483643, 2147483646] 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~~~ .../bnxt_debugfs.c:69:9: note: 'snprintf' output between 2 and 12 bytes into a destination of size 10 69 \| snprintf(qname, 10, "%d", ring_idx); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Compile tested only Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240813-bnxt-str-v2-2-872050a157e7@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	bnxt_en: Extend maximum length of version string by 1 byte	Simon Horman
	This corrects an out-by-one error in the maximum length of the package version string. The size argument of snprintf includes space for the trailing '\0' byte, so there is no need to allow extra space for it by reducing the value of the size argument by 1. Found by inspection. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20240813-bnxt-str-v2-1-872050a157e7@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-14	Merge tag 'for-6.11-rc3-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - extend tree-checker verification of directory item type - fix regression in page/folio and extent state tracking in xarray, the dirty status can get out of sync and can cause problems e.g. a hang - in send, detect last extent and allow to clone it instead of sending it as write, reduces amount of data transferred in the stream - fix checking extent references when cleaning deleted subvolumes - fix one more case in the extent map shrinker, let it run only in the kswapd context so it does not cause latency spikes during other operations * tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix invalid mapping of extent xarray state btrfs: send: allow cloning non-aligned extent if it ends at i_size btrfs: only run the extent map shrinker from kswapd tasks btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type btrfs: check delayed refs when we're checking if a ref exists
2024-08-15	i2c: tegra: Do not mark ACPI devices as irq safe	Breno Leitao
	On ACPI machines, the tegra i2c module encounters an issue due to a mutex being called inside a spinlock. This leads to the following bug: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 ... Call trace: __might_sleep __mutex_lock_common mutex_lock_nested acpi_subsys_runtime_resume rpm_resume tegra_i2c_xfer The problem arises because during __pm_runtime_resume(), the spinlock &dev->power.lock is acquired before rpm_resume() is called. Later, rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on mutexes, triggering the error. To address this issue, devices on ACPI are now marked as not IRQ-safe, considering the dependency of acpi_subsys_runtime_resume() on mutexes. Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support") Cc: <stable@vger.kernel.org> # v5.17+ Co-developed-by: Michael van der Westhuizen <rmikey@meta.com> Signed-off-by: Michael van der Westhuizen <rmikey@meta.com> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Andy Shevchenko <andy@kernel.org> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-08-14	netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests	Phil Sutter
	Objects' dump callbacks are not concurrency-safe per-se with reset bit set. If two CPUs perform a reset at the same time, at least counter and quota objects suffer from value underrun. Prevent this by introducing dedicated locking callbacks for nfnetlink and the asynchronous dump handling to serialize access. Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	netfilter: nf_tables: Introduce nf_tables_getobj_single	Phil Sutter
	Outsource the reply skb preparation for non-dump getrule requests into a distinct function. Prep work for object reset locking. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	netfilter: nf_tables: Audit log dump reset after the fact	Phil Sutter
	In theory, dumpreset may fail and invalidate the preceeding log message. Fix this and use the occasion to prepare for object reset locking, which benefits from a few unrelated changes: * Add an early call to nfnetlink_unicast if not resetting which effectively skips the audit logging but also unindents it. * Extract the table's name from the netlink attribute (which is verified via earlier table lookup) to not rely upon validity of the looked up table pointer. * Do not use local variable family, it will vanish. Fixes: 8e6cf365e1d5 ("audit: log nftables configuration change events") Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	selftests: netfilter: add test for br_netfilter+conntrack+queue combination	Florian Westphal
	Trigger cloned skbs leaving softirq protection. This triggers splat without the preceeding change ("netfilter: nf_queue: drop packets with cloned unconfirmed conntracks"): WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm.. because local delivery and forwarding will race for confirmation. Based on a reproducer script from Yi Chen. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	netfilter: nf_queue: drop packets with cloned unconfirmed conntracks	Florian Westphal
	Conntrack assumes an unconfirmed entry (not yet committed to global hash table) has a refcount of 1 and is not visible to other cores. With multicast forwarding this assumption breaks down because such skbs get cloned after being picked up, i.e. ct->use refcount is > 1. Likewise, bridge netfilter will clone broad/mutlicast frames and all frames in case they need to be flood-forwarded during learning phase. For ip multicast forwarding or plain bridge flood-forward this will "work" because packets don't leave softirq and are implicitly serialized. With nfqueue this no longer holds true, the packets get queued and can be reinjected in arbitrary ways. Disable this feature, I see no other solution. After this patch, nfqueue cannot queue packets except the last multicast/broadcast packet. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	netfilter: flowtable: initialise extack before use	Donald Hunter
	Fix missing initialisation of extack in flow offload. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	netfilter: nfnetlink: Initialise extack before use in ACKs	Donald Hunter
	Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END. Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages") Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "s390: - Fix failure to start guests with kvm.use_gisa=0 - Panic if (un)share fails to maintain security. ARM: - Use kvfree() for the kvmalloc'd nested MMUs array - Set of fixes to address warnings in W=1 builds - Make KVM depend on assembler support for ARMv8.4 - Fix for vgic-debug interface for VMs without LPIs - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest - Minor code / comment cleanups for configuring PAuth traps - Take kvm->arch.config_lock to prevent destruction / initialization race for a vCPU's CPUIF which may lead to a UAF x86: - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) - Fix smatch issues - Small cleanups - Make x2APIC ID 100% readonly - Fix typo in uapi constant Generic: - Use synchronize_srcu_expedited() on irqfd shutdown" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) KVM: eventfd: Use synchronize_srcu_expedited() on shutdown KVM: selftests: Add a testcase to verify x2APIC is fully readonly KVM: x86: Make x2APIC ID 100% readonly KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id()) KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page() KVM: SVM: Fix an error code in sev_gmem_post_populate() KVM: SVM: Fix uninitialized variable bug KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list KVM: arm64: Tidying up PAuth code in KVM KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain s390/uv: Panic for set and remove shared access UVC errors KVM: s390: fix validity interception issue when gisa is switched off docs: KVM: Fix register ID of SPSR_FIQ KVM: arm64: vgic: fix unexpected unlock sparse warnings KVM: arm64: fix kdoc warnings in W=1 builds KVM: arm64: fix override-init warnings in W=1 builds ...
2024-08-14	RISC-V: hwprobe: Add SCALAR to misaligned perf defines	Evan Green
	In preparation for misaligned vector performance hwprobe keys, rename the hwprobe key values associated with misaligned scalar accesses to include the term SCALAR. Leave the old defines in place to maintain source compatibility. This change is intended to be a functional no-op. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20240809214444.3257596-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14	RISC-V: hwprobe: Add MISALIGNED_PERF key	Evan Green
	RISCV_HWPROBE_KEY_CPUPERF_0 was mistakenly flagged as a bitmask in hwprobe_key_is_bitmask(), when in reality it was an enum value. This causes problems when used in conjunction with RISCV_HWPROBE_WHICH_CPUS, since SLOW, FAST, and EMULATED have values whose bits overlap with each other. If the caller asked for the set of CPUs that was SLOW or EMULATED, the returned set would also include CPUs that were FAST. Introduce a new hwprobe key, RISCV_HWPROBE_KEY_MISALIGNED_PERF, which returns the same values in response to a direct query (with no flags), but is properly handled as an enumerated value. As a result, SLOW, FAST, and EMULATED are all correctly treated as distinct values under the new key when queried with the WHICH_CPUS flag. Leave the old key in place to avoid disturbing applications which may have already come to rely on the key, with or without its broken behavior with respect to the WHICH_CPUS flag. Fixes: e178bf146e4b ("RISC-V: hwprobe: Introduce which-cpus flag") Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240809214444.3257596-2-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14	RISC-V: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE	Haibo Xu
	Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Fixes: eabd9db64ea8 ("ACPI: RISCV: Add NUMA support based on SRAT and SLIT") Reported-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/0d362a8ae50558b95685da4c821b2ae9e8cf78be.1722828421.git.haibo1.xu@intel.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14	riscv: change XIP's kernel_map.size to be size of the entire kernel	Nam Cao
	With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel. More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c). Change XIP's kernel_map.size to be the size of the entire kernel. Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> # v6.1+ Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14	riscv: entry: always initialize regs->a0 to -ENOSYS	Celeste Liu
	Otherwise when the tracer changes syscall number to -1, the kernel fails to initialize a0 with -ENOSYS and subsequently fails to return the error code of the failed syscall to userspace. For example, it will break strace syscall tampering. Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1") Reported-by: "Dmitry V. Levin" <ldv@strace.io> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Cc: stable@vger.kernel.org Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com> Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-14	netfilter: allow ipv6 fragments to arrive on different devices	Tom Hughes
	Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device for multicast and link-local packets") modified the ipv6 fragment reassembly logic to distinguish frag queues by device for multicast and link-local packets but in fact only the main reassembly code limits the use of the device to those address types and the netfilter reassembly code uses the device for all packets. This means that if fragments of a packet arrive on different interfaces then netfilter will fail to reassemble them and the fragments will be expired without going any further through the filters. Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Tom Hughes <tom@compton.nu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-08-14	i2c: Use IS_REACHABLE() for substituting empty ACPI functions	Richard Fitzgerald
	Replace IS_ENABLED() with IS_REACHABLE() to substitute empty stubs for: i2c_acpi_get_i2c_resource() i2c_acpi_client_count() i2c_acpi_find_bus_speed() i2c_acpi_new_device_by_fwnode() i2c_adapter *i2c_acpi_find_adapter_by_handle() i2c_acpi_waive_d0_probe() commit f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions") partially fixed this conditional to depend on CONFIG_I2C, but used IS_ENABLED(), which is wrong since CONFIG_I2C is tristate. CONFIG_ACPI is boolean but let's also change it to use IS_REACHABLE() to future-proof it against becoming tristate. Somehow despite testing various combinations of CONFIG_I2C and CONFIG_ACPI we missed the combination CONFIG_I2C=m, CONFIG_ACPI=y. Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com> Fixes: f17c06c6608a ("i2c: Fix conditional for substituting empty ACPI functions") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408141333.gYnaitcV-lkp@intel.com/ Reviewed-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
2024-08-14	KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG	Amit Shah
	"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of the UAPI, keep the current definition and add a new one with the fix. Fix-suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Amit Shah <amit.shah@amd.com> Message-ID: <20240814083113.21622-1-amit@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-14	arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE	Haibo Xu
	Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE. Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization") Cc: <stable@vger.kernel.org> # 4.19.x Reported-by: Andrew Jones <ajones@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Haibo Xu <haibo1.xu@intel.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.1722828421.git.haibo1.xu@intel.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-14	arm64: uaccess: correct thinko in __get_mem_asm()	Mark Rutland
	In the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y version of __get_mem_asm(), we incorrectly use _ASM_EXTABLE_##type##ACCESS_ERR() such that upon a fault the extable fixup handler writes -EFAULT into "%w0", which is the register containing 'x' (the result of the load). This was a thinko in commit: 86a6a68febfcf57b ("arm64: start using 'asm goto' for get_user() when available") Prior to that commit _ASM_EXTABLE_##type##ACCESS_ERR_ZERO() was used such that the extable fixup handler wrote -EFAULT into "%w0" (the register containing 'err'), and zero into "%w1" (the register containing 'x'). When the 'err' variable was removed, the extable entry was updated incorrectly. Writing -EFAULT to the value register is unnecessary but benign: * We never want -EFAULT in the value register, and previously this would have been zeroed in the extable fixup handler. * In __get_user_error() the value is overwritten with zero explicitly in the error path. * The asm goto outputs cannot be used when the goto label is taken, as older compilers (e.g. clang < 16.0.0) do not guarantee that asm goto outputs are usable in this path and may use a stale value rather than the value in an output register. Consequently, zeroing in the extable fixup handler is insufficient to ensure callers see zero in the error path. * The expected usage of unsafe_get_user() and get_kernel_nofault() requires that the value is not consumed in the error path. Some versions of GCC would mis-compile asm goto with outputs, and erroneously omit subsequent assignments, breaking the error path handling in __get_user_error(). This was discussed at: https://lore.kernel.org/lkml/ZpfxLrJAOF2YNqCk@J2N7QTR9R3.cambridge.arm.com/ ... and was fixed by removing support for asm goto with outputs on those broken compilers in commit: f2f6a8e887172503 ("init/Kconfig: remove CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND") With that out of the way, we can safely replace the usage of _ASM_EXTABLE_##type##ACCESS_ERR() with _ASM_EXTABLE_##type##ACCESS(), leaving the value register unchanged in the case a fault is taken, as was originally intended. This matches other architectures and matches our __put_mem_asm(). Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20240807103731.2498893-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-08-14	KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)	Sean Christopherson
	Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't directly emulate instructions for ES/SNP, and instead the guest must explicitly request emulation. Unless the guest explicitly requests emulation without accessing memory, ES/SNP relies on KVM creating an MMIO SPTE, with the subsequent #NPF being reflected into the guest as a #VC. But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs, because except for ES/SNP, doing so requires setting reserved bits in the SPTE, i.e. the SPTE can't be readable while also generating a #VC on writes. Because KVM never creates MMIO SPTEs and jumps directly to emulation, the guest never gets a #VC. And since KVM simply resumes the guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU into an infinite #NPF loop if the vCPU attempts to write read-only memory. Disallow read-only memory for all VMs with protected state, i.e. for upcoming TDX VMs as well as ES/SNP VMs. For TDX, it's actually possible to support read-only memory, as TDX uses EPT Violation #VE to reflect the fault into the guest, e.g. KVM could configure read-only SPTEs with RX protections and SUPPRESS_VE=0. But there is no strong use case for supporting read-only memslots on TDX, e.g. the main historical usage is to emulate option ROMs, but TDX disallows executing from shared memory. And if someone comes along with a legitimate, strong use case, the restriction can always be lifted for TDX. Don't bother trying to retroactively apply the restriction to SEV-ES VMs that are created as type KVM_X86_DEFAULT_VM. Read-only memslots can't possibly work for SEV-ES, i.e. disallowing such memslots is really just means reporting an error to userspace instead of silently hanging vCPUs. Trying to deal with the ordering between KVM_SEV_INIT and memslot creation isn't worth the marginal benefit it would provide userspace. Fixes: 26c44aa9e076 ("KVM: SEV: define VM types for SEV and SEV-ES") Fixes: 1dfe571c12cf ("KVM: SEV: Add initial SEV-SNP support") Cc: Peter Gonda <pgonda@google.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Vishal Annapurve <vannapurve@google.com> Cc: Ackerly Tng <ackerleytng@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240809190319.1710470-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-14	Merge tag 'selinux-pr-20240814' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fixes from Paul Moore: - Fix a xperms counting problem where we adding to the xperms count even if we failed to add the xperm. - Propogate errors from avc_add_xperms_decision() back to the caller so that we can trigger the proper cleanup and error handling. - Revert our use of vma_is_initial_heap() in favor of our older logic as vma_is_initial_heap() doesn't correctly handle the no-heap case and it is causing issues with the SELinux process/execheap access control. While the older SELinux logic may not be perfect, it restores the expected user visible behavior. Hopefully we will be able to resolve the problem with the vma_is_initial_heap() macro with the mm folks, but we need to fix this in the meantime. * tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: revert our use of vma_is_initial_heap() selinux: add the processing of the failure of avc_add_xperms_decision() selinux: fix potential counting error in avc_add_xperms_decision()
2024-08-14	Merge tag 'vfs-6.11-rc4.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - Fix the name of file lease slab cache. When file leases were split out of file locks the name of the file lock slab cache was used for the file leases slab cache as well. - Fix a type in take_fd() helper. - Fix infinite directory iteration for stable offsets in tmpfs. - When the icache is pruned all reclaimable inodes are marked with I_FREEING and other processes that try to lookup such inodes will block. But some filesystems like ext4 can trigger lookups in their inode evict callback causing deadlocks. Ext4 does such lookups if the ea_inode feature is used whereby a separate inode may be used to store xattrs. Introduce I_LRU_ISOLATING which pins the inode while its pages are reclaimed. This avoids inode deletion during inode_lru_isolate() avoiding the deadlock and evict is made to wait until I_LRU_ISOLATING is done. netfs: - Fault in smaller chunks for non-large folio mappings for filesystems that haven't been converted to large folios yet. - Fix the CONFIG_NETFS_DEBUG config option. The config option was renamed a short while ago and that introduced two minor issues. First, it depended on CONFIG_NETFS whereas it wants to depend on CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter does. Second, the documentation for the config option wasn't fixed up. - Revert the removal of the PG_private_2 writeback flag as ceph is using it and fix how that flag is handled in netfs. - Fix DIO reads on 9p. A program watching a file on a 9p mount wouldn't see any changes in the size of the file being exported by the server if the file was changed directly in the source filesystem. Fix this by attempting to read the full size specified when a DIO read is requested. - Fix a NULL pointer dereference bug due to a data race where a cachefiles cookies was retired even though it was still in use. Check the cookie's n_accesses counter before discarding it. nsfs: - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as the kernel is writing to userspace. pidfs: - Prevent the creation of pidfds for kthreads until we have a use-case for it and we know the semantics we want. It also confuses userspace why they can get pidfds for kthreads. squashfs: - Fix an unitialized value bug reported by KMSAN caused by a corrupted symbolic link size read from disk. Check that the symbolic link size is not larger than expected" * tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Squashfs: sanity check symbolic link size 9p: Fix DIO read through netfs vfs: Don't evict inode under the inode lru traversing context netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag" file: fix typo in take_fd() comment pidfd: prevent creation of pidfds for kthreads netfs: clean up after renaming FSCACHE_DEBUG config libfs: fix infinite directory reads for offset dir nsfs: fix ioctl declaration fs/netfs/fscache_cookie: add missing "n_accesses" check filelock: fix name of file_lease slab cache netfs: Fault in smaller chunks for non-large folio mappings
2024-08-14	Merge tag 'bpf-6.11-rc4' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Pull bpf fixes from Alexei Starovoitov: - Fix bpftrace regression from Kyle Huey. Tracing bpf prog was called with perf_event input arguments causing bpftrace produce garbage output. - Fix verifier crash in stacksafe() from Yonghong Song. Daniel Hodges reported verifier crash when playing with sched-ext. The stack depth in the known verifier state was larger than stack depth in being explored state causing out-of-bounds access. - Fix update of freplace prog in prog_array from Leon Hwang. freplace prog type wasn't recognized correctly. * tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: perf/bpf: Don't call bpf_overflow_handler() for tracing events selftests/bpf: Add a test to verify previous stacksafe() fix bpf: Fix a kernel verifier crash in stacksafe() bpf: Fix updating attached freplace prog in prog_array map
2024-08-14	xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set	Darrick J. Wong
	If a file has the S_DAX flag (aka fsdax access mode) set, we cannot allow users to change the realtime flag unless the datadev and rtdev both support fsdax access modes. Even if there are no extents allocated to the file, the setattr thread could be racing with another thread that has already started down the write code paths. Fixes: ba23cba9b3bdc ("fs: allow per-device dax status checking for filesystems") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-08-14	xfs: revert AIL TASK_KILLABLE threshold	Darrick J. Wong
	In commit 9adf40249e6c, we changed the behavior of the AIL thread to set its own task state to KILLABLE whenever the timeout value is nonzero. Unfortunately, this missed the fact that xfsaild_push will return 50ms (aka a longish sleep) when we reach the push target or the AIL becomes empty, so xfsaild goes to sleep for a long period of time in uninterruptible D state. This results in artificially high load averages because KILLABLE processes are UNINTERRUPTIBLE, which contributes to load average even though the AIL is asleep waiting for someone to interrupt it. It's not blocked on IOs or anything, but people scrap ps for processes that look like they're stuck in D state, so restore the previous threshold. Fixes: 9adf40249e6c ("xfs: AIL doesn't need manual pushing") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-08-14	xfs: attr forks require attr, not attr2	Darrick J. Wong
	It turns out that I misunderstood the difference between the attr and attr2 feature bits. "attr" means that at some point an attr fork was created somewhere in the filesystem. "attr2" means that inodes have variable-sized forks, but says nothing about whether or not there actually /are/ attr forks in the system. If we have an attr fork, we only need to check that attr is set. Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
2024-08-14	ALSA: hda/tas2781: Use correct endian conversion	Takashi Iwai
	The data conversion is done rather by a wrong function. We convert to BE32, not from BE32. Although the end result must be same, this was complained by the compiler. Fix the code again and align with another similar function tas2563_apply_calib() that does already right. Fixes: 3beddef84d90 ("ALSA: hda/tas2781: fix wrong calibrated data order") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202408141630.DiDUB8Z4-lkp@intel.com/ Link: https://patch.msgid.link/20240814100500.1944-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-08-14	Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"	Niklas Cassel
	This reverts commit 28ab9769117ca944cb6eb537af5599aa436287a4. Sense data can be in either fixed format or descriptor format. SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit: "The SATL shall support this bit as defined in SPC-5 with the following exception: if the D_ SENSE bit is set to zero (i.e., fixed format sense data), then the SATL should return fixed format sense data for ATA PASS-THROUGH commands." The libata SATL has always kept D_SENSE set to zero by default. (It is however possible to change the value using a MODE SELECT SG_IO command.) Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit, however, successful ATA PASS-THROUGH commands incorrectly returned the sense data in descriptor format (regardless of the D_SENSE bit). Commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH commands. However, after commit 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"), there were bug reports that hdparm, hddtemp, and udisks were no longer working as expected. These applications incorrectly assume the returned sense data is in descriptor format, without even looking at the RESPONSE CODE field in the returned sense data (to see which format the returned sense data is in). Considering that there will be broken versions of these applications around roughly forever, we are stuck with being bug compatible with older kernels. Cc: stable@vger.kernel.org # 4.19+ Reported-by: Stephan Eisvogel <eisvogel@seitics.de> Reported-by: Christian Heusel <christian@heusel.eu> Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heusel.eu/ Fixes: 28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error") Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240813131900.1285842-2-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-08-14	ALSA: usb-audio: Support Yamaha P-125 quirk entry	Juan José Arboleda
	This patch adds a USB quirk for the Yamaha P-125 digital piano. Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20240813161053.70256-1-soyjuanarbol@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-08-14	net: ethernet: dlink: replace deprecated macro	Moon Yeounsu
	Macro `SIMPLE_DEV_PM_OPS()` is deprecated. This patch replaces `SIMPLE_DEV_PM_OPS()` with `DEFINE_SIMPLE_DEV_PM_OPS()` currently used. Expanded results are the same since remaining member is initialized as zero (NULL): static SIMPLE_DEV_PM_OPS(rio_pm_ops, rio_suspend, rio_resume); Expanded to: static const struct dev_pm_ops __attribute__((__unused__)) rio_pm_ops = { .suspend = ((1) ? ((rio_suspend)) : ((void )0)), .resume = ((1) ? ((rio_resume)) : ((void )0)), .freeze = ((1) ? ((rio_suspend)) : ((void )0)), .thaw = ((1) ? ((rio_resume)) : ((void )0)), .poweroff = ((1) ? ((rio_suspend)) : ((void )0)), .restore = ((1) ? ((rio_resume)) : ((void )0)), }; static DEFINE_SIMPLE_DEV_PM_OPS(rio_pm_ops, rio_suspend, rio_resume); Expanded to: static const struct dev_pm_ops rio_pm_ops = { .suspend = ((1) ? ((rio_suspend)) : ((void )0)), .resume = ((1) ? ((rio_resume)) : ((void )0)), .freeze = ((1) ? ((rio_suspend)) : ((void )0)), .thaw = ((1) ? ((rio_resume)) : ((void )0)), .poweroff = ((1) ? ((rio_suspend)) : ((void )0)), .restore = ((1) ? ((rio_resume)) : ((void )0)), .runtime_suspend = ((void )0), .runtime_resume = ((void )0), .runtime_idle = ((void *)0), }; Signed-off-by: Moon Yeounsu <yyyynoom@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-14	tcp: Update window clamping condition	Subash Abhinov Kasiviswanathan
	This patch is based on the discussions between Neal Cardwell and Eric Dumazet in the link https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicinc.com/ It was correctly pointed out that tp->window_clamp would not be updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if (copied <= tp->rcvq_space.space). While it is expected for most setups to leave the sysctl enabled, the latter condition may not end up hitting depending on the TCP receive queue size and the pattern of arriving data. The updated check should be hit only on initial MSS update from TCP_MIN_MSS to measured MSS value and subsequently if there was an update to a larger value. Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF") Signed-off-by: Sean Tranchetti <quic_stranche@quicinc.com> Signed-off-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-14	media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices	Hans de Goede
	Commit a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support") broke BYT support because it removed a seemingly unused field from struct sh_css_sp_config and a seemingly unused value from enum ia_css_input_mode. But these are part of the ABI between the kernel and firmware on ISP2400 and this part of the TPG support removal changes broke ISP2400 support. ISP2401 support was not affected because on ISP2401 only a part of struct sh_css_sp_config is used. Restore the removed field and enum value to fix this. Fixes: a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-13	bcachefs: bcachefs_metadata_version_disk_accounting_inum	Kent Overstreet
	This adds another disk accounting counter to track usage per inode number (any snapshot ID). This will be used for a couple things: - It'll give us a way to tell the user how much space a given file ista consuming in all snapshots; i.e. how much extra space it's consuming due to snapshot versioning. - It counts number of extents and total size of extents (both in btree keyspace sectors and actual disk usage), meaning it gives us average extent size: that is, it'll let us cheaply find fragmented files that should be defragmented. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Kill __bch2_accounting_mem_mod()	Kent Overstreet
	The next patch will be adding a disk accounting counter type which is not kept in the in-memory eytzinger tree. As prep, fold __bch2_accounting_mem_mod() into bch2_accounting_mem_mod_locked() so that we can check for that counter type and bail out without calling bpos_to_disk_accounting_pos() twice. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Make bkey_fsck_err() a wrapper around fsck_err()	Kent Overstreet
	bkey_fsck_err() was added as an interface that looks like fsck_err(), but previously all it did was ensure that the appropriate error counter was incremented in the superblock. This is a cleanup and bugfix patch that converts it to a wrapper around fsck_err(). This is needed to fix an issue with the upgrade path to disk_accounting_v3, where the "silent fix" error list now includes bkey_fsck errors; fsck_err() handles this in a unified way, and since we need to change printing of bkey fsck errors from the caller to the inner bkey_fsck_err() calls, this ends up being a pretty big change. Als,, rename .invalid() methods to .validate(), for clarity, while we're changing the function signature anyways (to drop the printbuf argument). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Fix warning in __bch2_fsck_err() for trans not passed in	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Add a time_stat for blocked on key cache flush	Kent Overstreet
	Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Improve trans_blocked_journal_reclaim tracepoint	Kent Overstreet
	include information about the state of the btree key cache Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	bcachefs: Add hysteresis to waiting on btree key cache flush	Kent Overstreet
	This helps ensure key cache reclaim isn't contending with threads waiting for the key cache to be helped, and fixes a severe performance bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-08-13	lib/generic-radix-tree.c: Fix rare race in __genradix_ptr_alloc()	Kent Overstreet
	If we need to increase the tree depth, allocate a new node, and then race with another thread that increased the tree depth before us, we'll still have a preallocated node that might be used later. If we then use that node for a new non-root node, it'll still have a pointer to the old root instead of being zeroed - fix this by zeroing it in the cmpxchg failure path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>