summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-27net: phy: mscc: fix packet loss due to RGMII delaysVladimir Oltean
Two deadly typos break RX and TX traffic on the VSC8502 PHY using RGMII if phy-mode = "rgmii-id" or "rgmii-txid", and no "tx-internal-delay-ps" override exists. The negative error code from phy_get_internal_delay() does not get overridden with the delay deduced from the phy-mode, and later gets committed to hardware. Also, the rx_delay gets overridden by what should have been the tx_delay. Fixes: dbb050d2bfc8 ("phy: mscc: Add support for RGMII delay configuration") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Harini Katakam <harini.katakam@amd.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230627134235.3453358-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27Merge branch 'use-vmalloc_array-and-vcalloc'Jakub Kicinski
Julia Lawall says: ==================== use vmalloc_array and vcalloc The functions vmalloc_array and vcalloc were introduced in commit a8749a35c399 ("mm: vmalloc: introduce array allocation functions") but are not used much yet. This series introduces uses of these functions, to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch. @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) v2: This series uses vmalloc_array and vcalloc instead of array_size. It also leaves a multiplication of a constant by a sizeof as is. Two patches are thus dropped from the series. ==================== Link: https://lore.kernel.org/r/20230627144339.144478-1-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27net: mana: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-23-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27net: enetc: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-19-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27ionic: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-12-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27pds_core: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-10-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27gve: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-5-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27octeon_ep: use vmalloc_array and vcallocJulia Lawall
Use vmalloc_array and vcalloc to protect against multiplication overflows. The changes were done using the following Coccinelle semantic patch: // <smpl> @initialize:ocaml@ @@ let rename alloc = match alloc with "vmalloc" -> "vmalloc_array" | "vzalloc" -> "vcalloc" | _ -> failwith "unknown" @@ size_t e1,e2; constant C1, C2; expression E1, E2, COUNT, x1, x2, x3; typedef u8; typedef __u8; type t = {u8,__u8,char,unsigned char}; identifier alloc = {vmalloc,vzalloc}; fresh identifier realloc = script:ocaml(alloc) { rename alloc }; @@ ( alloc(x1*x2*x3) | alloc(C1 * C2) | alloc((sizeof(t)) * (COUNT), ...) | - alloc((e1) * (e2)) + realloc(e1, e2) | - alloc((e1) * (COUNT)) + realloc(COUNT, e1) | - alloc((E1) * (E2)) + realloc(E1, E2) ) // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Link: https://lore.kernel.org/r/20230627144339.144478-3-Julia.Lawall@inria.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-27ACPI: EC: Fix acpi_ec_dispatch_gpe()Rafael J. Wysocki
Commit 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only") broke suspend-to-idle at least on Dell XPS13 9360 and 9380. The problem is that acpi_ec_dispatch_gpe() must clear the EC GPE, because the EC GPE handler never runs when the system is in the suspend-to-idle state and if the EC GPE is not cleared by the suspend- to-idle loop, it is never cleared at all which leads to a GPE storm. This causes suspend-to-idle to burn energy instead of saving it which is potentially dangerous (the affected machines heat up rather badly when that happens). Addess this by making acpi_ec_dispatch_gpe() clear the EC GPE as it did before. Fixes: 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only") Tested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-27dt-bindings: interrupt-controller: add Ralink SoCs interrupt controllerSergio Paracuellos
Add YAML doc for the interrupt controller which is present on Ralink SoCs. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Link: https://lore.kernel.org/r/20230623035901.1514341-1-sergio.paracuellos@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-27dt-bindings: PCI: dwc: rockchip: Update for RK3588Sebastian Reichel
The PCIe 2.0 controllers on RK3588 need one additional clock, one additional reset line and one for ranges entry. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230616170022.76107-4-sebastian.reichel@collabora.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-27net: usb: qmi_wwan: add u-blox 0x1312 compositionDavide Tronchin
Add RmNet support for LARA-R6 01B. The new LARA-R6 product variant identified by the "01B" string can be configured (by AT interface) in three different USB modes: * Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial interfaces * RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial interfaces and 1 RmNet virtual network interface * CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial interface and 1 CDC-ECM virtual network interface The first 4 interfaces of all the 3 configurations (default, RmNet, ECM) are the same. In RmNet mode LARA-R6 01B exposes the following interfaces: If 0: Diagnostic If 1: AT parser If 2: AT parser If 3: AT parset/alternative functions If 4: RMNET interface Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com> Link: https://lore.kernel.org/r/20230626125336.3127-1-davide.tronchin.94@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27perf trace: fix MSG_SPLICE_PAGES build errorMatthieu Baerts
Our MPTCP CI and Stephen got this error: In file included from builtin-trace.c:907: trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags': trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function) 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ The fix is similar to what was done with MSG_FASTOPEN: the new macro is defined if it is not defined in the system headers. Fixes: b848b26c6672 ("net: Kill MSG_SENDPAGE_NOTLAST") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20230626112847.2ef3d422@canb.auug.org.au/ Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230626090239.899672-1-matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27Merge tag 'nf-23-06-27' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Reset shift on Boyer-Moore string match for each block, from Jeremy Sowden. 2) Fix acccess to non-linear area in DCCP conntrack helper, from Florian Westphal. 3) Fix kernel-doc warnings, by Randy Dunlap. 4) Bail out if expires= does not show in SIP helper message, or make ct_sip_parse_numerical_param() tristate and report error if expires= cannot be parsed. 5) Unbind non-anonymous set in case rule construction fails. 6) Fix underflow in chain reference counter in case set element already exists or it cannot be created. netfilter pull request 23-06-27 ==================== Link: https://lore.kernel.org/r/20230627065304.66394-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27ipvlan: Fix return value of ipvlan_queue_xmit()Cambda Zhu
ipvlan_queue_xmit() should return NET_XMIT_XXX, but ipvlan_xmit_mode_l2/l3() returns rx_handler_result_t or NET_RX_XXX in some cases. ipvlan_rcv_frame() will only return RX_HANDLER_CONSUMED in ipvlan_xmit_mode_l2/l3() because 'local' is true. It's equal to NET_XMIT_SUCCESS. But dev_forward_skb() can return NET_RX_SUCCESS or NET_RX_DROP, and returning NET_RX_DROP(NET_XMIT_DROP) will increase both ipvlan and ipvlan->phy_dev drops counter. The skb to forward can be treated as xmitted successfully. This patch makes ipvlan_queue_xmit() return NET_XMIT_SUCCESS for forward skb. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Link: https://lore.kernel.org/r/20230626093347.7492-1-cambda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27Merge 'irq/loongarch-fixes-6.5' into loongarch-nextHuacai Chen
LoongArch architecture changes for 6.5 depend on the irq changes to work on both ACPI and FDT systems, so merge them to create a base.
2023-06-27powerpc: remove checks for binutils older than 2.25Masahiro Yamada
Commit e4412739472b ("Documentation: raise minimum supported version of binutils to 2.25") allows us to remove the checks for old binutils. There is no more user for ld-ifversion. Remove it as well. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230119082250.151485-1-masahiroy@kernel.org
2023-06-27powerpc: Fail build if using recordmcount with binutils v2.37Naveen N Rao
binutils v2.37 drops unused section symbols, which prevents recordmcount from capturing mcount locations in sections that have no non-weak symbols. This results in a build failure with a message such as: Cannot find symbol for section 12: .text.perf_callchain_kernel. kernel/events/callchain.o: failed The change to binutils was reverted for v2.38, so this behavior is specific to binutils v2.37: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c09c8b42021180eee9495bd50d8b35e683d3901b Objtool is able to cope with such sections, so this issue is specific to recordmcount. Fail the build and print a warning if binutils v2.37 is detected and if we are using recordmcount. Cc: stable@vger.kernel.org Suggested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230530061436.56925-1-naveen@kernel.org
2023-06-26Merge tag 'tag-chrome-platform-for-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Tzung-Bi Shih: "Improvements: - Support Pin Assignment D in getting mux state - Emit an uevent when EC panics so that userland programs get chance to capture EC coredumps (LPC interface only) - Send EC_CMD_HOST_SLEEP_EVENT to EC at the very beginning/end of system suspend/resume so that EC can watch the duration more accurately (LPC interface only) Misc: - Switch back from I2C .probe_new() to .probe() - Use %*ph for printing hexdump of small buffers" * tag 'tag-chrome-platform-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_spi: Use %*ph for printing hexdump of a small buffer platform/chrome: Switch i2c drivers back to use .probe() platform/chrome: cros_ec_lpc: Move host command to prepare/complete platform/chrome: cros_ec: Report EC panic as uevent platform/chrome: cros_typec_switch: Add Pin D support
2023-06-26Merge tag 'thermal-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These extend the int340x thermal driver, add thermal DT bindings for some Qcom platforms, add DT bindings and support for Armada AP807 and MSM8909, allow selecting the bang-bang thermal governor as the default one, address issues in several thermal drivers for ARM platforms and clean up code. Specifics: - Add new IOCTLs to the int340x thermal driver to allow user space to retrieve the Passive v2 thermal table (Srinivas Pandruvada) - Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms (Konrad Dybcio) - Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki) - Add DT bindings for QCom ipq9574 (Praveenkumar I) - Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren) - Allow selecting the bang-bang governor as default (Thierry Reding) - Refactor and prepare the code to set the scene for RCar Gen4 (Wolfram Sang) - Clean up and fix the QCom tsens drivers. Add DT bindings and calibration for the MSM8909 platform (Stephan Gerhold) - Revert a patch introducing a wrong usage of devm_of_iomap() on the Mediatek platform (Ricardo Cañuelo) - Fix the clock vs reset ordering in order to conform to the documentation on the sun8i (Christophe JAILLET) - Prevent setting up undocumented registers, enable the only described sensors and add the version 2.1 on the Qoriq sensor (Peng Fan) - Add DT bindings and support for the Armada AP807 (Alex Leibovich) - Update the mlx5 driver with the recent thermal changes (Daniel Lezcano) - Convert to platform remove callback returning void on STM32 (Uwe Kleine-König) - Add an error information printing for devm_thermal_add_hwmon_sysfs() and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra, Qoriq, Mediateka and QCom (Yangtao Li) - Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai) - Use the dev_err_probe() function in the QCom tsens alarm driver (Luca Weiss)" * tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits) thermal/drivers/qcom/temp-alarm: Use dev_err_probe thermal/drivers/generic-adc: Register thermal zones as hwmon sensors thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start() thermal/drivers/qcom: Remove redundant msg at probe time thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor() thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone() thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel() drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe() thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe() thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe() thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register() thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs() thermal/drivers/stm32: Convert to platform remove callback returning void net/mlx5: Update the driver with the recent thermal changes thermal/drivers/armada: Add support for AP807 thermal data dt-bindings: armada-thermal: Add armada-ap807-thermal compatible thermal/drivers/qoriq: Support version 2.1 thermal/drivers/qoriq: Only enable supported sensors thermal/drivers/qoriq: No need to program site adjustment register thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors ...
2023-06-26Merge tag 'pm-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add Intel TPMI (Topology Aware Register and PM Capsule Interface) support to the power capping subsystem, extend the intel_idle driver to work in VM guests where MWAIT is not available, extend the system-wide power management diagnostics, fix bugs and clean up code. Specifics: - Introduce power capping core support for Intel TPMI (Topology Aware Register and PM Capsule Interface) and a TPMI interface driver for Intel RAPL (Zhang Rui, Dan Carpenter) - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping driver (Zhang Rui) - Fix invalid initialization for pl4_supported field in the Intel RAPL power capping driver (Sumeet Pawnikar) - Clean up the intel_idle driver, make it work with VM guests that cannot use the MWAIT instruction and address the case in which the host may enter a deep idle state when the guest is idle (Arjan van de Ven) - Prevent cpufreq drivers that provide the ->adjust_perf() callback without a ->fast_switch() one which is used as a fallback from the former in some cases (Wyes Karny) - Fix some issues related to the AMD P-state cpufreq driver (Mario Limonciello, Wyes Karny) - Fix the energy_performance_preference attribute handling in the intel_pstate driver in passive mode (Tero Kristo) - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset (Kai-Heng Feng) - Correct spelling mistake in a comment in the hibernation code (Wang Honghui) - Add arch_resume_nosmt() prototype to avoid a "missing prototypes" build warning (Arnd Bergmann) - Restrict pm_pr_dbg() to system-wide power transitions and use it in a few additional places (Mario Limonciello) - Drop verification of in-params from genpd_add_device() and ensure that all of its callers will do it (Ulf Hansson) - Prevent possible integer overflows from occurring in genpd_parse_state() (Nikita Zhandarovich) - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size somewhat (Christophe JAILLET) - Ensure that the Exynos PPMU driver is already loaded before the Exynos Bus driver starts probing so as to avoid a possible freeze loading of the kernel modules (Marek Szyprowski) - Fix variable deferencing before NULL check in the mtk-cci devfreq driver (Sukrut Bellary)" * tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits) intel_idle: Add a "Long HLT" C1 state for the VM guest mode cpufreq: intel_pstate: Fix energy_performance_preference for passive cpufreq: amd-pstate: Add a kernel config option to set default mode cpufreq: amd-pstate: Set a fallback policy based on preferred_profile ACPI: CPPC: Add definition for undefined FADT preferred PM profile value cpufreq: amd-pstate: Set default governor to schedutil PM: domains: Move the verification of in-params from genpd_add_device() cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated cpufreq: amd-pstate: Write CPPC enable bit per-socket intel_idle: Add support for using intel_idle in a VM guest using just hlt cpufreq: Fail driver register if it has adjust_perf without fast_switch intel_idle: clean up the (new) state_update_enter_method function intel_idle: refactor state->enter manipulation into its own function platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages pinctrl: amd: Use pm_pr_dbg to show debugging messages ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume powercap: RAPL: Fix a NULL vs IS_ERR() bug powercap: RAPL: Fix CONFIG_IOSF_MBI dependency powercap: RAPL: fix invalid initialization for pl4_supported field ...
2023-06-26Merge tag 'acpi-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These rework the handling of notifications in ACPI button drivers (to enable future simplifications and cleanups), clean up the ACPI thermal driver, update the ACPI backlight driver, add quirks working around AML bugs on some systems, fix some assorted issues and clean up code. Specifics: - Reduce ACPI device enumeration overhead related to devices with dependencies (Rafael Wysocki) - Fix the handling of Microsoft LPS0 _DSM for suspend-to-idle (Mario Limonciello) - Fix section mismatch warning in the ACPI suspend-to-idle code (Arnd Bergmann) - Drop several ACPI resource management quirks related to IRQ ovverides on AMD "Zen" systems (Mario Limonciello) - Modify the ACPI EC driver to make it only clear the EC GPE status when handling the GPE (Jeremy Compostella) - Add quirks to work around ACPI tables defects on Lenovo Yoga Book yb1-x90f/l and Nextbook Ares 8A (Hans de Goede) - Add ACPi backlight quirks for Dell Studio 1569, Lenovo ThinkPad X131e (3371 AMD version) and Apple iMac11,3 and stop trying to use vendor backlight control on relatively recent systems (Hans de Goede) - Add pwm_lookup_table entry for second PWM on CHT/BSW devices in the ACPI LPSS (Intel SoC) driver (Hans de Goede) - Add nfit_intel_shutdown_status() declaration to a local header to avoid a "missing prototypes" build warning (Arnd Bergmann) - Clean up the ACPI thermal driver and drop some dead or otherwise unneded code from it (Rafael Wysocki) - Rework the handling of notifications in the ACPI button drivers so as to allow the common notification handling code for devices to be simplified (Rafael Wysocki) - Make ghes_get_devices() return NULL to indicate that there are no GHES devices so as to allow vendor-specific EDAC drivers to probe then (Li Yang) - Mark bert_disable() as __initdata and drop an unused function from the APEI GHES code (Miaohe Lin) - Make the ACPI PAD (Processor Aggregator Device) driver realize that Zhaoxin CPUs support nonstop TSC (Tony W Wang-oc) - Drop the certainly unnecessary and likely incorrect inclusion of linux/arm-smccc.h from acpi_ffh.c (Sudeep Holla)" * tag 'acpi-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits) ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569 ACPI: thermal: Drop struct acpi_thermal_flags ACPI: thermal: Drop struct acpi_thermal_state ACPI: bus: Simplify installation and removal of notify callback ACPI: tiny-power-button: Eliminate the driver notify callback ACPI: button: Use different notify handlers for lid and buttons ACPI: button: Eliminate the driver notify callback ACPI: thermal: Eliminate struct acpi_thermal_state_flags ACPI: thermal: Move acpi_thermal_driver definition ACPI: thermal: Move symbol definitions to one place ACPI: thermal: Drop redundant ACPI_TRIPS_REFRESH_DEVICES symbol ACPI: thermal: Use BIT() macro for defining flags APEI: GHES: correctly return NULL for ghes_get_devices() ACPI: FFH: Drop the inclusion of linux/arm-smccc.h ACPI: PAD: mark Zhaoxin CPUs NONSTOP TSC correctly ACPI: APEI: mark bert_disable as __initdata ACPI: EC: Clear GPE on interrupt handling only ACPI: video: Stop trying to use vendor backlight control on laptops from after ~2012 ACPI: x86: s2idle: Adjust Microsoft LPS0 _DSM handling sequence ACPI: resource: Remove "Zen" specific match and quirks ...
2023-06-26Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Notable features are user-space support for the memcpy/memset instructions and the permission indirection extension. - Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays - User-space support for the Armv8.8 memcpy/memset instructions - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups - Removal of superfluous ISBs on context switch (following retrospective architecture tightening) - Decode the ISS2 register during faults for additional information to help with debugging - KPTI clean-up/simplification of the trampoline exit code - Addressing several -Wmissing-prototype warnings - Kselftest improvements for signal handling and ptrace - Fix TPIDR2_EL0 restoring on sigreturn - Clean-up, robustness improvements of the module allocation code - More sysreg conversions to the automatic register/bitfields generation - CPU capabilities handling cleanup - Arm documentation updates: ACPI, ptdump" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits) kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe Documentation/arm64: Add ptdump documentation arm64: hibernate: remove WARN_ON in save_processor_state kselftest/arm64: Log signal code and address for unexpected signals docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst arm64/fpsimd: Exit streaming mode when flushing tasks docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference ...
2023-06-26Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: - lots of build cleanups from Arnd spread throughout the arch/arm tree - replace strlcpy() with the preferred strscpy() - use sign_extend32() in the module linker - drop handle_irq() machine descriptor method * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9315/1: fiq: include asm/mach/irq.h for prototypes ARM: 9314/1: tcm: move tcm_init() prototype to asm/tcm.h ARM: 9313/1: vdso: add missing prototypes ARM: 9312/1: vfp: include asm/neon.h in vfpmodule.c ARM: 9311/1: decompressor: move function prototypes to misc.h ARM: 9310/1: xip-kernel: add __inflate_kernel_data prototype ARM: 9309/1: add missing syscall prototypes ARM: 9308/1: move setup functions to header ARM: 9307/1: nommu: include asm/idmap.h ARM: 9306/1: cacheflush: avoid __flush_anon_page() missing-prototype warning ARM: 9305/1: add clear/copy_user_highpage declarations ARM: 9304/1: add prototype for function called only from asm ARM: 9303/1: kprobes: avoid missing-declaration warnings ARM: 9302/1: traps: hide unused functions on NOMMU ARM: 9301/1: dma-mapping: hide unused dma_contiguous_early_fixup function ARM: 9300/1: Replace all non-returning strlcpy with strscpy ARM: 9299/1: module: use sign_extend32() to extend the signedness ARM: 9298/1: Drop custom mdesc->handle_irq()
2023-06-26Merge tag 'm68k-for-v6.5-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - miscellaneous NuBus fixes and improvements - defconfig updates * tag 'm68k-for-v6.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v6.4-rc1 nubus: Don't list slot resources by default nubus: Remove proc entries before adding them nubus: Partially revert proc_create_single_data() conversion
2023-06-26Merge tag 'x86_cleanups_for_6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Dave Hansen: "As usual, these are all over the map. The biggest cluster is work from Arnd to eliminate -Wmissing-prototype warnings: - Address -Wmissing-prototype warnings - Remove repeated 'the' in comments - Remove unused current_untag_mask() - Document urgent tip branch timing - Clean up MSR kernel-doc notation - Clean up paravirt_ops doc - Update Srivatsa S. Bhat's maintained areas - Remove unused extern declaration acpi_copy_wakeup_routine()" * tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine() Documentation: virt: Clean up paravirt_ops doc x86/mm: Remove unused current_untag_mask() x86/mm: Remove repeated word in comments x86/lib/msr: Clean up kernel-doc notation x86/platform: Avoid missing-prototype warnings for OLPC x86/mm: Add early_memremap_pgprot_adjust() prototype x86/usercopy: Include arch_wb_cache_pmem() declaration x86/vdso: Include vdso/processor.h x86/mce: Add copy_mc_fragile_handle_tail() prototype x86/fbdev: Include asm/fb.h as needed x86/hibernate: Declare global functions in suspend.h x86/entry: Add do_SYSENTER_32() prototype x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled() x86/mm: Include asm/numa.h for set_highmem_pages_init() x86: Avoid missing-prototype warnings for doublefault code x86/fpu: Include asm/fpu/regset.h x86: Add dummy prototype for mk_early_pgtbl_32() x86/pci: Mark local functions as 'static' x86/ftrace: Move prepare_ftrace_return prototype to header ...
2023-06-26ext4: avoid updating the superblock on a r/o mount if not neededTheodore Ts'o
This was noticed by a user who noticied that the mtime of a file backing a loopback device was getting bumped when the loopback device is mounted read/only. Note: This doesn't show up when doing a loopback mount of a file directly, via "mount -o ro /tmp/foo.img /mnt", since the loop device is set read-only when mount automatically creates loop device. However, this is noticeable for a LUKS loop device like this: % cryptsetup luksOpen /tmp/foo.img test % mount -o ro /dev/loop0 /mnt ; umount /mnt or, if LUKS is not in use, if the user manually creates the loop device like this: % losetup /dev/loop0 /tmp/foo.img % mount -o ro /dev/loop0 /mnt ; umount /mnt The modified mtime causes rsync to do a rolling checksum scan of the file on the local and remote side, incrementally increasing the time to rsync the not-modified-but-touched image file. Fixes: eee00237fa5e ("ext4: commit super block if fs record error when journal record without error") Cc: stable@kernel.org Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch Reported-by: Sean Greenslade <sean@seangreenslade.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: skip reading super block if it has been verifiedZhang Yi
We got a NULL pointer dereference issue below while running generic/475 I/O failure pressure test. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190 RIP: 0010:jbd2_journal_set_features+0x13d/0x430 ... Call Trace: <TASK> ? __die+0x23/0x60 ? page_fault_oops+0xa4/0x170 ? exc_page_fault+0x67/0x170 ? asm_exc_page_fault+0x26/0x30 ? jbd2_journal_set_features+0x13d/0x430 jbd2_journal_revoke+0x47/0x1e0 __ext4_forget+0xc3/0x1b0 ext4_free_blocks+0x214/0x2f0 ext4_free_branches+0xeb/0x270 ext4_ind_truncate+0x2bf/0x320 ext4_truncate+0x1e4/0x490 ext4_handle_inode_extension+0x1bd/0x2a0 ? iomap_dio_complete+0xaf/0x1d0 The root cause is the journal super block had been failed to write out due to I/O fault injection, it's uptodate bit was cleared by end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock(). And it raced by journal_get_superblock()->bh_read(), unfortunately, the read IO is also failed, so the error handling in journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer, finally lead to above NULL pointer dereference issue. If the journal super block had been read and verified, there is no need to call bh_read() read it again even if it has been failed to written out. So the fix could be simply move buffer_verified(bh) in front of bh_read(). Also remove a stale comment left in jbd2_journal_check_used_features(). Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: fix to check return value of freeze_bdev() in ext4_shutdown()Chao Yu
freeze_bdev() can fail due to a lot of reasons, it needs to check its reason before later process. Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl") Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: refactoring to use the unified helper ext4_quotas_off()Baokun Li
Rename ext4_quota_off_umount() to ext4_quotas_off(), and add type parameter to replace open code in ext4_enable_quotas(). Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230327141630.156875-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: turn quotas off if mount failed after enabling quotasBaokun Li
Yi found during a review of the patch "ext4: don't BUG on inconsistent journal feature" that when ext4_mark_recovery_complete() returns an error value, the error handling path does not turn off the enabled quotas, which triggers the following kmemleak: ================================================================ unreferenced object 0xffff8cf68678e7c0 (size 64): comm "mount", pid 746, jiffies 4294871231 (age 11.540s) hex dump (first 32 bytes): 00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00 ............A... c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00 ............H... backtrace: [<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880 [<00000000d4e621d7>] kmalloc_trace+0x39/0x140 [<00000000837eee74>] v2_read_file_info+0x18a/0x3a0 [<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770 [<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0 [<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4] [<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4] [<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4] [<000000004a9489c4>] get_tree_bdev+0x1dc/0x370 [<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4] [<00000000c7cb663d>] vfs_get_tree+0x31/0x160 [<00000000320e1bed>] do_new_mount+0x1d5/0x480 [<00000000c074654c>] path_mount+0x22e/0xbe0 [<0000000003e97a8e>] do_mount+0x95/0xc0 [<000000002f3d3736>] __x64_sys_mount+0xc4/0x160 [<0000000027d2140c>] do_syscall_64+0x3f/0x90 ================================================================ To solve this problem, we add a "failed_mount10" tag, and call ext4_quota_off_umount() in this tag to release the enabled qoutas. Fixes: 11215630aada ("ext4: don't BUG on inconsistent journal feature") Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: update doc about journal superblock descriptionZhang Yi
Update the description in doc about the s_head parameter in the journal superblock. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20230322013353.1843306-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: add journal cycled recording supportZhang Yi
Always enable 'JBD2_CYCLE_RECORD' journal option on ext4, letting the jbd2 continue to record new journal transactions from the recovered journal head or the checkpointed transactions in the previous mount. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: continue to record log between each mountZhang Yi
For a newly mounted file system, the journal committing thread always record new transactions from the start of the journal area, no matter whether the journal was clean or just has been recovered. So the logdump code in debugfs cannot dump continuous logs between each mount, it is disadvantageous to analysis corrupted file system image and locate the file system inconsistency bugs. If we get a corrupted file system in the running products and want to find out what has happened, besides lookup the system log, one effective way is to backtrack the journal log. But we may not always run e2fsck before each mount and the default fsck -a mode also cannot always checkout all inconsistencies, so it could left over some inconsistencies into the next mount until we detect it. Finally, transactions in the journal may probably discontinuous and some relatively new transactions has been covered, it becomes hard to analyse. If we could record transactions continuously between each mount, we could acquire more useful info from the journal. Like this: |Previous mount checkpointed/recovered logs|Current mount logs | |{------}{---}{--------} ... {------}| ... |{======}{========}...000000| And yes the journal area is limited and cannot record everything, the problematic transaction may also be covered even if we do this, but this is still useful for fuzzy tests and short-running products. This patch save the head blocknr in the superblock after flushing the journal or unmounting the file system, let the next mount could continue to record new transaction behind it. This change is backward compatible because the old kernel does not care about the head blocknr of the journal. It is also fine if we mount a clean old image without valid head blocknr, we fail back to set it to s_first just like before. Finally, for the case of mount an unclean file system, we could also get the journal head easily after scanning/replaying the journal, it will continue to record new transaction after the recovered transactions. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: remove j_format_versionZhang Yi
journal->j_format_version is no longer used, remove it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: factor out journal initialization from journal_get_superblock()Zhang Yi
Current journal_get_superblock() couple journal superblock checking and partial journal initialization, factor out initialization part from it to make things clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: switch to check format version in superblock directlyZhang Yi
We should only check and set extented features if journal format version is 2, and now we check the in memory copy of the superblock 'journal->j_format_version', which relys on the parameter initialization sequence, switch to use the h_blocktype in superblock cloud be more clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: remove unused feature macrosZhang Yi
JBD2_HAS_[IN|RO_]COMPAT_FEATURE macros are no longer used, just remove them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-4-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'Zhihao Cheng
As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always become true if sbi->s_journal_bdev exists. Filesystem block device and journal block device are both opened with 'FMODE_EXCL' mode, so these two devices can't be same one. Then we can remove the redundant checking 'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists. [1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/ Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: Fix reusing stale buffer heads from last failed mountingZhihao Cheng
Following process makes ext4 load stale buffer heads from last failed mounting in a new mounting operation: mount_bdev ext4_fill_super | ext4_load_and_init_journal | ext4_load_journal | jbd2_journal_load | load_superblock | journal_get_superblock | set_buffer_verified(bh) // buffer head is verified | jbd2_journal_recover // failed caused by EIO | goto failed_mount3a // skip 'sb->s_root' initialization deactivate_locked_super kill_block_super generic_shutdown_super if (sb->s_root) // false, skip ext4_put_super->invalidate_bdev-> // invalidate_mapping_pages->mapping_evict_folio-> // filemap_release_folio->try_to_free_buffers, which // cannot drop buffer head. blkdev_put blkdev_put_whole if (atomic_dec_and_test(&bdev->bd_openers)) // false, systemd-udev happens to open the device. Then // blkdev_flush_mapping->kill_bdev->truncate_inode_pages-> // truncate_inode_folio->truncate_cleanup_folio-> // folio_invalidate->block_invalidate_folio-> // filemap_release_folio->try_to_free_buffers will be skipped, // dropping buffer head is missed again. Second mount: ext4_fill_super ext4_load_and_init_journal ext4_load_journal ext4_get_journal jbd2_journal_init_inode journal_init_common bh = getblk_unmovable bh = __find_get_block // Found stale bh in last failed mounting journal->j_sb_buffer = bh jbd2_journal_load load_superblock journal_get_superblock if (buffer_verified(bh)) // true, skip journal->j_format_version = 2, value is 0 jbd2_journal_recover do_one_pass next_log_block += count_tags(journal, bh) // According to journal_tag_bytes(), 'tag_bytes' calculating is // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3() // returns false because 'j->j_format_version >= 2' is not true, // then we get wrong next_log_block. The do_one_pass may exit // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'. The filesystem is corrupted here, journal is partially replayed, and new journal sequence number actually is already used by last mounting. The invalidate_bdev() can drop all buffer heads even racing with bare reading block device(eg. systemd-udev), so we can fix it by invalidating bdev in error handling path in __ext4_fill_super(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming") Cc: stable@vger.kernel.org # v3.5 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: allow concurrent unaligned dio overwritesBrian Foster
We've had reports of significant performance regression of sub-block (unaligned) direct writes due to the added exclusivity restrictions in ext4. The purpose of the exclusivity requirement for unaligned direct writes is to avoid data corruption caused by unserialized partial block zeroing in the iomap dio layer across overlapping writes. XFS has similar requirements for the same underlying reasons, yet doesn't suffer the extreme performance regression that ext4 does. The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY mode, which allows for optimistic submission of concurrent unaligned I/O and kicks back writes that require partial block zeroing such that they can be submitted in a safe, exclusive context. Since ext4 already performs most of these checks pre-submission, it can support something similar without necessarily relying on the iomap flag and associated retry mechanism. Update the dio write submission path to allow concurrent submission of unaligned direct writes that are purely overwrite and so will not require block zeroing. To improve readability of the various related checks, move the unaligned I/O handling down into ext4_dio_write_checks(), where the dio draining and force wait logic can immediately follow the locking requirement checks. Finally, the IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a precaution should the ext4 overwrite logic ever become inconsistent with the zeroing expectations of iomap dio. The performance improvement of sub-block direct write I/O is shown in the following fio test on a 64xcpu guest vm: Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting --overwrite=1 --thread --size=10G --filename=/mnt/fio --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8 --blocksize=2k --iodepth=256 --allow_file_create=0 v6.2: write: IOPS=4328, BW=8724KiB/s v6.2 (patched): write: IOPS=801k, BW=1565MiB/s Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: clean up mballoc criteria commentsTheodore Ts'o
Line wrap and slightly clarify the comments describing mballoc's cirtiera. Define EXT4_MB_NUM_CRS as part of the enum, so that it will automatically get updated when criteria is added or removed. Also fix a potential unitialized use of 'cr' variable if CONFIG_EXT4_DEBUG is enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_zeroout_es() return voidBaokun Li
After ext4_es_insert_extent() returns void, the return value in ext4_zeroout_es() is also unnecessary, so make it return void too. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-13-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_insert_extent() return voidBaokun Li
Now ext4_es_insert_extent() never return error, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-12-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_insert_delayed_block() return voidBaokun Li
Now it never fails when inserting a delay extent, so the return value in ext4_es_insert_delayed_block is no longer necessary, let it return void. [ Fixed bug which caused system hangs during bigalloc test runs. See https://lore.kernel.org/r/20230612030405.GH1436857@mit.edu for more details. -- TYT ] Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-11-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_remove_extent() return voidBaokun Li
Now ext4_es_remove_extent() never fails, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: using nofail preallocation in ext4_es_insert_extent()Baokun Li
Similar to in ext4_es_insert_delayed_block(), we use preallocations that do not fail to avoid inconsistencies, but we do not care about es that are not must be kept, and we return 0 even if such es memory allocation fails. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: using nofail preallocation in ext4_es_insert_delayed_block()Baokun Li
Similar to in ext4_es_remove_extent(), we use a no-fail preallocation to avoid inconsistencies, except that here we may have to preallocate two extent_status. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: using nofail preallocation in ext4_es_remove_extent()Baokun Li
If __es_remove_extent() returns an error it means that when splitting extent, allocating an extent that must be kept failed, where returning an error directly would cause the extent tree to be inconsistent. So we use GFP_NOFAIL to pre-allocate an extent_status and pass it to __es_remove_extent() to avoid this problem. In addition, since the allocated memory is outside the i_es_lock, the extent_status tree may change and the pre-allocated extent_status is no longer needed, so we release the pre-allocated extent_status when es->es_len is not initialized. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: use pre-allocated es in __es_remove_extent()Baokun Li
When splitting extent, if the second extent can not be dropped, we return -ENOMEM and use GFP_NOFAIL to preallocate an extent_status outside of i_es_lock and pass it to __es_remove_extent() to be used as the second extent. This ensures that __es_remove_extent() is executed successfully, thus ensuring consistency in the extent status tree. If the second extent is not undroppable, we simply drop it and return 0. Then retry is no longer necessary, remove it. Now, __es_remove_extent() will always remove what it should, maybe more. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>