summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-06-27vDPA/ifcvf: a vendor driver should not set _CONFIG_S_FAILEDZhu Lingshan
VIRTIO_CONFIG_S_FAILED indicates the guest driver has given up the device due to fatal errors. So it is the guest decision, the vendor driver should not set this status to the device. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: synchronize irqs in the reset routineZhu Lingshan
This commit synchronize irqs of the virtqueues and config space in the reset routine. Thus ifcvf_stop() and reset() are refactored as well. This commit renames ifcvf_stop_hw() to ifcvf_stop() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: retire ifcvf_start_datapath and ifcvf_add_statusZhu Lingshan
Rather than former lazy-initialization mechanism, now the virtqueue operations and driver_features related ops access the virtio registers directly to take immediate actions. So ifcvf_start_datapath() should retire. ifcvf_add_status() is retierd because we should not change device status by a vendor driver's decision, this driver should only set device status which is from virito drivers upon vdpa_ops.set_status() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: get_driver_features from virtio registersZhu Lingshan
This commit implements a new function ifcvf_get_driver_feature() which read driver_features from virtio registers. To be less ambiguous, ifcvf_set_features() is renamed to ifcvf_set_driver_features(), and ifcvf_get_features() is renamed to ifcvf_get_dev_features() which returns the provisioned vDPA device features. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27vDPA/ifcvf: virt queue ops take immediate actionsZhu Lingshan
In this commit, virtqueue operations including: set_vq_num(), set_vq_address(), set_vq_ready() and get_vq_ready() access PCI registers directly to take immediate actions. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230526145254.39537-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-06-27dt-bindings: interrupt-controller: add Ralink SoCs interrupt controllerSergio Paracuellos
Add YAML doc for the interrupt controller which is present on Ralink SoCs. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com> Link: https://lore.kernel.org/r/20230623035901.1514341-1-sergio.paracuellos@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-27dt-bindings: PCI: dwc: rockchip: Update for RK3588Sebastian Reichel
The PCIe 2.0 controllers on RK3588 need one additional clock, one additional reset line and one for ranges entry. Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20230616170022.76107-4-sebastian.reichel@collabora.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-06-27net: usb: qmi_wwan: add u-blox 0x1312 compositionDavide Tronchin
Add RmNet support for LARA-R6 01B. The new LARA-R6 product variant identified by the "01B" string can be configured (by AT interface) in three different USB modes: * Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial interfaces * RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial interfaces and 1 RmNet virtual network interface * CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial interface and 1 CDC-ECM virtual network interface The first 4 interfaces of all the 3 configurations (default, RmNet, ECM) are the same. In RmNet mode LARA-R6 01B exposes the following interfaces: If 0: Diagnostic If 1: AT parser If 2: AT parser If 3: AT parset/alternative functions If 4: RMNET interface Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com> Link: https://lore.kernel.org/r/20230626125336.3127-1-davide.tronchin.94@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27perf trace: fix MSG_SPLICE_PAGES build errorMatthieu Baerts
Our MPTCP CI and Stephen got this error: In file included from builtin-trace.c:907: trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags': trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function) 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in 28 | if (flags & MSG_##n) { | ^~~~ trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG' 50 | P_MSG_FLAG(SPLICE_PAGES); | ^~~~~~~~~~ The fix is similar to what was done with MSG_FASTOPEN: the new macro is defined if it is not defined in the system headers. Fixes: b848b26c6672 ("net: Kill MSG_SENDPAGE_NOTLAST") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/r/20230626112847.2ef3d422@canb.auug.org.au/ Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20230626090239.899672-1-matthieu.baerts@tessares.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27Merge tag 'nf-23-06-27' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Reset shift on Boyer-Moore string match for each block, from Jeremy Sowden. 2) Fix acccess to non-linear area in DCCP conntrack helper, from Florian Westphal. 3) Fix kernel-doc warnings, by Randy Dunlap. 4) Bail out if expires= does not show in SIP helper message, or make ct_sip_parse_numerical_param() tristate and report error if expires= cannot be parsed. 5) Unbind non-anonymous set in case rule construction fails. 6) Fix underflow in chain reference counter in case set element already exists or it cannot be created. netfilter pull request 23-06-27 ==================== Link: https://lore.kernel.org/r/20230627065304.66394-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27efi/libstub: Disable PCI DMA before grabbing the EFI memory mapArd Biesheuvel
Currently, the EFI stub will disable PCI DMA as the very last thing it does before calling ExitBootServices(), to avoid interfering with the firmware's normal operation as much as possible. However, the stub will invoke DisconnectController() on all endpoints downstream of the PCI bridges it disables, and this may affect the layout of the EFI memory map, making it substantially more likely that ExitBootServices() will fail the first time around, and that the EFI memory map needs to be reloaded. This, in turn, increases the likelihood that the slack space we allocated is insufficient (and we can no longer allocate memory via boot services after having called ExitBootServices() once), causing the second call to GetMemoryMap (and therefore the boot) to fail. This makes the PCI DMA disable feature a bit more fragile than it already is, so let's make it more robust, by allocating the space for the EFI memory map after disabling PCI DMA. Fixes: 4444f8541dad16fe ("efi: Allow disabling PCI busmastering on bridges during boot") Reported-by: Glenn Washburn <development@efficientek.com> Acked-by: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-06-27crypto: akcipher - Do not copy dst if it is NULLHerbert Xu
As signature verification has a NULL destination buffer, the pointer needs to be checked before the memcpy is done. Fixes: addde1f2c966 ("crypto: akcipher - Add sync interface without SG lists") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-06-27ipvlan: Fix return value of ipvlan_queue_xmit()Cambda Zhu
ipvlan_queue_xmit() should return NET_XMIT_XXX, but ipvlan_xmit_mode_l2/l3() returns rx_handler_result_t or NET_RX_XXX in some cases. ipvlan_rcv_frame() will only return RX_HANDLER_CONSUMED in ipvlan_xmit_mode_l2/l3() because 'local' is true. It's equal to NET_XMIT_SUCCESS. But dev_forward_skb() can return NET_RX_SUCCESS or NET_RX_DROP, and returning NET_RX_DROP(NET_XMIT_DROP) will increase both ipvlan and ipvlan->phy_dev drops counter. The skb to forward can be treated as xmitted successfully. This patch makes ipvlan_queue_xmit() return NET_XMIT_SUCCESS for forward skb. Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Link: https://lore.kernel.org/r/20230626093347.7492-1-cambda@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-27crypto: sig - Fix verify callHerbert Xu
The dst SG list needs to be set to NULL for verify calls. Do this as otherwise the underlying algorithm may fail. Furthermore the digest needs to be copied just like the source. Fixes: 6cb8815f41a9 ("crypto: sig - Add interface for sign/verify") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-06-27crypto: akcipher - Set request tfm on sync pathHerbert Xu
The request tfm needs to be set. Fixes: addde1f2c966 ("crypto: akcipher - Add sync interface without SG lists") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202306261421.2ac744fa-oliver.sang@intel.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-06-27Merge 'irq/loongarch-fixes-6.5' into loongarch-nextHuacai Chen
LoongArch architecture changes for 6.5 depend on the irq changes to work on both ACPI and FDT systems, so merge them to create a base.
2023-06-27powerpc: remove checks for binutils older than 2.25Masahiro Yamada
Commit e4412739472b ("Documentation: raise minimum supported version of binutils to 2.25") allows us to remove the checks for old binutils. There is no more user for ld-ifversion. Remove it as well. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230119082250.151485-1-masahiroy@kernel.org
2023-06-27powerpc: Fail build if using recordmcount with binutils v2.37Naveen N Rao
binutils v2.37 drops unused section symbols, which prevents recordmcount from capturing mcount locations in sections that have no non-weak symbols. This results in a build failure with a message such as: Cannot find symbol for section 12: .text.perf_callchain_kernel. kernel/events/callchain.o: failed The change to binutils was reverted for v2.38, so this behavior is specific to binutils v2.37: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c09c8b42021180eee9495bd50d8b35e683d3901b Objtool is able to cope with such sections, so this issue is specific to recordmcount. Fail the build and print a warning if binutils v2.37 is detected and if we are using recordmcount. Cc: stable@vger.kernel.org Suggested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230530061436.56925-1-naveen@kernel.org
2023-06-26Merge tag 'tag-chrome-platform-for-v6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome platform updates from Tzung-Bi Shih: "Improvements: - Support Pin Assignment D in getting mux state - Emit an uevent when EC panics so that userland programs get chance to capture EC coredumps (LPC interface only) - Send EC_CMD_HOST_SLEEP_EVENT to EC at the very beginning/end of system suspend/resume so that EC can watch the duration more accurately (LPC interface only) Misc: - Switch back from I2C .probe_new() to .probe() - Use %*ph for printing hexdump of small buffers" * tag 'tag-chrome-platform-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_spi: Use %*ph for printing hexdump of a small buffer platform/chrome: Switch i2c drivers back to use .probe() platform/chrome: cros_ec_lpc: Move host command to prepare/complete platform/chrome: cros_ec: Report EC panic as uevent platform/chrome: cros_typec_switch: Add Pin D support
2023-06-26Merge tag 'thermal-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control updates from Rafael Wysocki: "These extend the int340x thermal driver, add thermal DT bindings for some Qcom platforms, add DT bindings and support for Armada AP807 and MSM8909, allow selecting the bang-bang thermal governor as the default one, address issues in several thermal drivers for ARM platforms and clean up code. Specifics: - Add new IOCTLs to the int340x thermal driver to allow user space to retrieve the Passive v2 thermal table (Srinivas Pandruvada) - Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms (Konrad Dybcio) - Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki) - Add DT bindings for QCom ipq9574 (Praveenkumar I) - Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren) - Allow selecting the bang-bang governor as default (Thierry Reding) - Refactor and prepare the code to set the scene for RCar Gen4 (Wolfram Sang) - Clean up and fix the QCom tsens drivers. Add DT bindings and calibration for the MSM8909 platform (Stephan Gerhold) - Revert a patch introducing a wrong usage of devm_of_iomap() on the Mediatek platform (Ricardo Cañuelo) - Fix the clock vs reset ordering in order to conform to the documentation on the sun8i (Christophe JAILLET) - Prevent setting up undocumented registers, enable the only described sensors and add the version 2.1 on the Qoriq sensor (Peng Fan) - Add DT bindings and support for the Armada AP807 (Alex Leibovich) - Update the mlx5 driver with the recent thermal changes (Daniel Lezcano) - Convert to platform remove callback returning void on STM32 (Uwe Kleine-König) - Add an error information printing for devm_thermal_add_hwmon_sysfs() and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra, Qoriq, Mediateka and QCom (Yangtao Li) - Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai) - Use the dev_err_probe() function in the QCom tsens alarm driver (Luca Weiss)" * tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits) thermal/drivers/qcom/temp-alarm: Use dev_err_probe thermal/drivers/generic-adc: Register thermal zones as hwmon sensors thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start() thermal/drivers/qcom: Remove redundant msg at probe time thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor() thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone() thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel() drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe() thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe() thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe() thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register() thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs() thermal/drivers/stm32: Convert to platform remove callback returning void net/mlx5: Update the driver with the recent thermal changes thermal/drivers/armada: Add support for AP807 thermal data dt-bindings: armada-thermal: Add armada-ap807-thermal compatible thermal/drivers/qoriq: Support version 2.1 thermal/drivers/qoriq: Only enable supported sensors thermal/drivers/qoriq: No need to program site adjustment register thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors ...
2023-06-26Merge tag 'pm-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add Intel TPMI (Topology Aware Register and PM Capsule Interface) support to the power capping subsystem, extend the intel_idle driver to work in VM guests where MWAIT is not available, extend the system-wide power management diagnostics, fix bugs and clean up code. Specifics: - Introduce power capping core support for Intel TPMI (Topology Aware Register and PM Capsule Interface) and a TPMI interface driver for Intel RAPL (Zhang Rui, Dan Carpenter) - Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping driver (Zhang Rui) - Fix invalid initialization for pl4_supported field in the Intel RAPL power capping driver (Sumeet Pawnikar) - Clean up the intel_idle driver, make it work with VM guests that cannot use the MWAIT instruction and address the case in which the host may enter a deep idle state when the guest is idle (Arjan van de Ven) - Prevent cpufreq drivers that provide the ->adjust_perf() callback without a ->fast_switch() one which is used as a fallback from the former in some cases (Wyes Karny) - Fix some issues related to the AMD P-state cpufreq driver (Mario Limonciello, Wyes Karny) - Fix the energy_performance_preference attribute handling in the intel_pstate driver in passive mode (Tero Kristo) - Fix the handling of pm_suspend_target_state when CONFIG_PM is unset (Kai-Heng Feng) - Correct spelling mistake in a comment in the hibernation code (Wang Honghui) - Add arch_resume_nosmt() prototype to avoid a "missing prototypes" build warning (Arnd Bergmann) - Restrict pm_pr_dbg() to system-wide power transitions and use it in a few additional places (Mario Limonciello) - Drop verification of in-params from genpd_add_device() and ensure that all of its callers will do it (Ulf Hansson) - Prevent possible integer overflows from occurring in genpd_parse_state() (Nikita Zhandarovich) - Reorder fieldls in 'struct devfreq_dev_status' to reduce its size somewhat (Christophe JAILLET) - Ensure that the Exynos PPMU driver is already loaded before the Exynos Bus driver starts probing so as to avoid a possible freeze loading of the kernel modules (Marek Szyprowski) - Fix variable deferencing before NULL check in the mtk-cci devfreq driver (Sukrut Bellary)" * tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits) intel_idle: Add a "Long HLT" C1 state for the VM guest mode cpufreq: intel_pstate: Fix energy_performance_preference for passive cpufreq: amd-pstate: Add a kernel config option to set default mode cpufreq: amd-pstate: Set a fallback policy based on preferred_profile ACPI: CPPC: Add definition for undefined FADT preferred PM profile value cpufreq: amd-pstate: Set default governor to schedutil PM: domains: Move the verification of in-params from genpd_add_device() cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated cpufreq: amd-pstate: Write CPPC enable bit per-socket intel_idle: Add support for using intel_idle in a VM guest using just hlt cpufreq: Fail driver register if it has adjust_perf without fast_switch intel_idle: clean up the (new) state_update_enter_method function intel_idle: refactor state->enter manipulation into its own function platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages pinctrl: amd: Use pm_pr_dbg to show debugging messages ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume powercap: RAPL: Fix a NULL vs IS_ERR() bug powercap: RAPL: Fix CONFIG_IOSF_MBI dependency powercap: RAPL: fix invalid initialization for pl4_supported field ...
2023-06-26Merge tag 'acpi-6.5-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These rework the handling of notifications in ACPI button drivers (to enable future simplifications and cleanups), clean up the ACPI thermal driver, update the ACPI backlight driver, add quirks working around AML bugs on some systems, fix some assorted issues and clean up code. Specifics: - Reduce ACPI device enumeration overhead related to devices with dependencies (Rafael Wysocki) - Fix the handling of Microsoft LPS0 _DSM for suspend-to-idle (Mario Limonciello) - Fix section mismatch warning in the ACPI suspend-to-idle code (Arnd Bergmann) - Drop several ACPI resource management quirks related to IRQ ovverides on AMD "Zen" systems (Mario Limonciello) - Modify the ACPI EC driver to make it only clear the EC GPE status when handling the GPE (Jeremy Compostella) - Add quirks to work around ACPI tables defects on Lenovo Yoga Book yb1-x90f/l and Nextbook Ares 8A (Hans de Goede) - Add ACPi backlight quirks for Dell Studio 1569, Lenovo ThinkPad X131e (3371 AMD version) and Apple iMac11,3 and stop trying to use vendor backlight control on relatively recent systems (Hans de Goede) - Add pwm_lookup_table entry for second PWM on CHT/BSW devices in the ACPI LPSS (Intel SoC) driver (Hans de Goede) - Add nfit_intel_shutdown_status() declaration to a local header to avoid a "missing prototypes" build warning (Arnd Bergmann) - Clean up the ACPI thermal driver and drop some dead or otherwise unneded code from it (Rafael Wysocki) - Rework the handling of notifications in the ACPI button drivers so as to allow the common notification handling code for devices to be simplified (Rafael Wysocki) - Make ghes_get_devices() return NULL to indicate that there are no GHES devices so as to allow vendor-specific EDAC drivers to probe then (Li Yang) - Mark bert_disable() as __initdata and drop an unused function from the APEI GHES code (Miaohe Lin) - Make the ACPI PAD (Processor Aggregator Device) driver realize that Zhaoxin CPUs support nonstop TSC (Tony W Wang-oc) - Drop the certainly unnecessary and likely incorrect inclusion of linux/arm-smccc.h from acpi_ffh.c (Sudeep Holla)" * tag 'acpi-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits) ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569 ACPI: thermal: Drop struct acpi_thermal_flags ACPI: thermal: Drop struct acpi_thermal_state ACPI: bus: Simplify installation and removal of notify callback ACPI: tiny-power-button: Eliminate the driver notify callback ACPI: button: Use different notify handlers for lid and buttons ACPI: button: Eliminate the driver notify callback ACPI: thermal: Eliminate struct acpi_thermal_state_flags ACPI: thermal: Move acpi_thermal_driver definition ACPI: thermal: Move symbol definitions to one place ACPI: thermal: Drop redundant ACPI_TRIPS_REFRESH_DEVICES symbol ACPI: thermal: Use BIT() macro for defining flags APEI: GHES: correctly return NULL for ghes_get_devices() ACPI: FFH: Drop the inclusion of linux/arm-smccc.h ACPI: PAD: mark Zhaoxin CPUs NONSTOP TSC correctly ACPI: APEI: mark bert_disable as __initdata ACPI: EC: Clear GPE on interrupt handling only ACPI: video: Stop trying to use vendor backlight control on laptops from after ~2012 ACPI: x86: s2idle: Adjust Microsoft LPS0 _DSM handling sequence ACPI: resource: Remove "Zen" specific match and quirks ...
2023-06-27OPP: Properly propagate error along when failing to get icc_pathAndrew Halaney
fa155f4f8348 ("OPP: Use dev_err_probe() when failing to get icc_path") failed to actually use the error it was trying to log: smatch warnings: drivers/opp/of.c:516 dev_pm_opp_of_find_icc_paths() warn: passing zero to 'dev_err_probe' Make sure to use the right error and pass it along. Fixes: fa155f4f8348 ("OPP: Use dev_err_probe() when failing to get icc_path") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202306262008.guNLgjt6-lkp@intel.com/ Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2023-06-26Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Notable features are user-space support for the memcpy/memset instructions and the permission indirection extension. - Support for the Armv8.9 Permission Indirection Extensions. While this feature doesn't add new functionality, it enables future support for Guarded Control Stacks (GCS) and Permission Overlays - User-space support for the Armv8.8 memcpy/memset instructions - arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and cleanups - Removal of superfluous ISBs on context switch (following retrospective architecture tightening) - Decode the ISS2 register during faults for additional information to help with debugging - KPTI clean-up/simplification of the trampoline exit code - Addressing several -Wmissing-prototype warnings - Kselftest improvements for signal handling and ptrace - Fix TPIDR2_EL0 restoring on sigreturn - Clean-up, robustness improvements of the module allocation code - More sysreg conversions to the automatic register/bitfields generation - CPU capabilities handling cleanup - Arm documentation updates: ACPI, ptdump" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits) kselftest/arm64: Add a test case for TPIDR2 restore arm64/signal: Restore TPIDR2 register rather than memory state arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe Documentation/arm64: Add ptdump documentation arm64: hibernate: remove WARN_ON in save_processor_state kselftest/arm64: Log signal code and address for unexpected signals docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst arm64/fpsimd: Exit streaming mode when flushing tasks docs: perf: Add new description for HiSilicon UC PMU drivers/perf: hisi: Add support for HiSilicon UC PMU driver drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE perf/arm-cmn: Add sysfs identifier perf/arm-cmn: Revamp model detection perf/arm_dmc620: Add cpumask arm64: mm: fix VA-range sanity check arm64/mm: remove now-superfluous ISBs from TTBR writes Documentation/arm64: Update ACPI tables from BBR Documentation/arm64: Update references in arm-acpi Documentation/arm64: Update ARM and arch reference ...
2023-06-26Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM updates from Russell King: - lots of build cleanups from Arnd spread throughout the arch/arm tree - replace strlcpy() with the preferred strscpy() - use sign_extend32() in the module linker - drop handle_irq() machine descriptor method * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9315/1: fiq: include asm/mach/irq.h for prototypes ARM: 9314/1: tcm: move tcm_init() prototype to asm/tcm.h ARM: 9313/1: vdso: add missing prototypes ARM: 9312/1: vfp: include asm/neon.h in vfpmodule.c ARM: 9311/1: decompressor: move function prototypes to misc.h ARM: 9310/1: xip-kernel: add __inflate_kernel_data prototype ARM: 9309/1: add missing syscall prototypes ARM: 9308/1: move setup functions to header ARM: 9307/1: nommu: include asm/idmap.h ARM: 9306/1: cacheflush: avoid __flush_anon_page() missing-prototype warning ARM: 9305/1: add clear/copy_user_highpage declarations ARM: 9304/1: add prototype for function called only from asm ARM: 9303/1: kprobes: avoid missing-declaration warnings ARM: 9302/1: traps: hide unused functions on NOMMU ARM: 9301/1: dma-mapping: hide unused dma_contiguous_early_fixup function ARM: 9300/1: Replace all non-returning strlcpy with strscpy ARM: 9299/1: module: use sign_extend32() to extend the signedness ARM: 9298/1: Drop custom mdesc->handle_irq()
2023-06-26Merge tag 'm68k-for-v6.5-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - miscellaneous NuBus fixes and improvements - defconfig updates * tag 'm68k-for-v6.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: defconfig: Update defconfigs for v6.4-rc1 nubus: Don't list slot resources by default nubus: Remove proc entries before adding them nubus: Partially revert proc_create_single_data() conversion
2023-06-26Merge tag 'x86_cleanups_for_6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Dave Hansen: "As usual, these are all over the map. The biggest cluster is work from Arnd to eliminate -Wmissing-prototype warnings: - Address -Wmissing-prototype warnings - Remove repeated 'the' in comments - Remove unused current_untag_mask() - Document urgent tip branch timing - Clean up MSR kernel-doc notation - Clean up paravirt_ops doc - Update Srivatsa S. Bhat's maintained areas - Remove unused extern declaration acpi_copy_wakeup_routine()" * tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine() Documentation: virt: Clean up paravirt_ops doc x86/mm: Remove unused current_untag_mask() x86/mm: Remove repeated word in comments x86/lib/msr: Clean up kernel-doc notation x86/platform: Avoid missing-prototype warnings for OLPC x86/mm: Add early_memremap_pgprot_adjust() prototype x86/usercopy: Include arch_wb_cache_pmem() declaration x86/vdso: Include vdso/processor.h x86/mce: Add copy_mc_fragile_handle_tail() prototype x86/fbdev: Include asm/fb.h as needed x86/hibernate: Declare global functions in suspend.h x86/entry: Add do_SYSENTER_32() prototype x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled() x86/mm: Include asm/numa.h for set_highmem_pages_init() x86: Avoid missing-prototype warnings for doublefault code x86/fpu: Include asm/fpu/regset.h x86: Add dummy prototype for mk_early_pgtbl_32() x86/pci: Mark local functions as 'static' x86/ftrace: Move prepare_ftrace_return prototype to header ...
2023-06-26ext4: avoid updating the superblock on a r/o mount if not neededTheodore Ts'o
This was noticed by a user who noticied that the mtime of a file backing a loopback device was getting bumped when the loopback device is mounted read/only. Note: This doesn't show up when doing a loopback mount of a file directly, via "mount -o ro /tmp/foo.img /mnt", since the loop device is set read-only when mount automatically creates loop device. However, this is noticeable for a LUKS loop device like this: % cryptsetup luksOpen /tmp/foo.img test % mount -o ro /dev/loop0 /mnt ; umount /mnt or, if LUKS is not in use, if the user manually creates the loop device like this: % losetup /dev/loop0 /tmp/foo.img % mount -o ro /dev/loop0 /mnt ; umount /mnt The modified mtime causes rsync to do a rolling checksum scan of the file on the local and remote side, incrementally increasing the time to rsync the not-modified-but-touched image file. Fixes: eee00237fa5e ("ext4: commit super block if fs record error when journal record without error") Cc: stable@kernel.org Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch Reported-by: Sean Greenslade <sean@seangreenslade.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: skip reading super block if it has been verifiedZhang Yi
We got a NULL pointer dereference issue below while running generic/475 I/O failure pressure test. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190 RIP: 0010:jbd2_journal_set_features+0x13d/0x430 ... Call Trace: <TASK> ? __die+0x23/0x60 ? page_fault_oops+0xa4/0x170 ? exc_page_fault+0x67/0x170 ? asm_exc_page_fault+0x26/0x30 ? jbd2_journal_set_features+0x13d/0x430 jbd2_journal_revoke+0x47/0x1e0 __ext4_forget+0xc3/0x1b0 ext4_free_blocks+0x214/0x2f0 ext4_free_branches+0xeb/0x270 ext4_ind_truncate+0x2bf/0x320 ext4_truncate+0x1e4/0x490 ext4_handle_inode_extension+0x1bd/0x2a0 ? iomap_dio_complete+0xaf/0x1d0 The root cause is the journal super block had been failed to write out due to I/O fault injection, it's uptodate bit was cleared by end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock(). And it raced by journal_get_superblock()->bh_read(), unfortunately, the read IO is also failed, so the error handling in journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer, finally lead to above NULL pointer dereference issue. If the journal super block had been read and verified, there is no need to call bh_read() read it again even if it has been failed to written out. So the fix could be simply move buffer_verified(bh) in front of bh_read(). Also remove a stale comment left in jbd2_journal_check_used_features(). Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()") Reported-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: fix to check return value of freeze_bdev() in ext4_shutdown()Chao Yu
freeze_bdev() can fail due to a lot of reasons, it needs to check its reason before later process. Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl") Cc: stable@kernel.org Signed-off-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: refactoring to use the unified helper ext4_quotas_off()Baokun Li
Rename ext4_quota_off_umount() to ext4_quotas_off(), and add type parameter to replace open code in ext4_enable_quotas(). Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230327141630.156875-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: turn quotas off if mount failed after enabling quotasBaokun Li
Yi found during a review of the patch "ext4: don't BUG on inconsistent journal feature" that when ext4_mark_recovery_complete() returns an error value, the error handling path does not turn off the enabled quotas, which triggers the following kmemleak: ================================================================ unreferenced object 0xffff8cf68678e7c0 (size 64): comm "mount", pid 746, jiffies 4294871231 (age 11.540s) hex dump (first 32 bytes): 00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00 ............A... c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00 ............H... backtrace: [<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880 [<00000000d4e621d7>] kmalloc_trace+0x39/0x140 [<00000000837eee74>] v2_read_file_info+0x18a/0x3a0 [<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770 [<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0 [<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4] [<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4] [<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4] [<000000004a9489c4>] get_tree_bdev+0x1dc/0x370 [<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4] [<00000000c7cb663d>] vfs_get_tree+0x31/0x160 [<00000000320e1bed>] do_new_mount+0x1d5/0x480 [<00000000c074654c>] path_mount+0x22e/0xbe0 [<0000000003e97a8e>] do_mount+0x95/0xc0 [<000000002f3d3736>] __x64_sys_mount+0xc4/0x160 [<0000000027d2140c>] do_syscall_64+0x3f/0x90 ================================================================ To solve this problem, we add a "failed_mount10" tag, and call ext4_quota_off_umount() in this tag to release the enabled qoutas. Fixes: 11215630aada ("ext4: don't BUG on inconsistent journal feature") Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26dt-bindings: mfd: ti,j721e-system-controller: Remove syscon from exampleAndrew Davis
The binding for ti,am654-ehrpwm-tbclk was updated to remove the syscon compatible hint. Remove the same from the example in this binding. Signed-off-by: Andrew Davis <afd@ti.com> Link: https://lore.kernel.org/r/20230623201519.194269-1-afd@ti.com Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-06-26Merge branches 'clk-qcom' and 'clk-microchip' into clk-nextStephen Boyd
* clk-qcom: (63 commits) clk: qcom: gcc-sc8280xp: Add runtime PM clk: qcom: gpucc-sc8280xp: Add runtime PM clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags clk: qcom: gpucc-sm6375: Enable runtime pm dt-bindings: clock: sm6375-gpucc: Add VDD_GX clk: qcom: gcc-sm6115: Add missing PLL config properties clk: qcom: clk-alpha-pll: Add a way to update some bits of test_ctl(_hi) clk: qcom: gcc-ipq6018: remove duplicate initializers clk: qcom: gcc-ipq9574: Enable crypto clocks dt-bindings: clock: Add crypto clock and reset definitions clk: qcom: Add lpass audio clock controller driver for SC8280XP clk: qcom: Add lpass clock controller driver for SC8280XP dt-bindings: clock: Add LPASS AUDIOCC and reset controller for SC8280XP dt-bindings: clock: Add LPASSCC and reset controller for SC8280XP dt-bindings: clock: qcom,mmcc: define clocks/clock-names for MSM8226 clk: qcom: gpucc-sm8550: Add support for graphics clock controller clk: qcom: Add support for SM8450 GPUCC clk: qcom: gcc-sm8450: Enable hw_clk_ctrl clk: qcom: rcg2: Make hw_clk_ctrl toggleable dt-bindings: clock: qcom: Add SM8550 graphics clock controller ... * clk-microchip: clk: at91: sama7g5: s/ep_chg_chg_id/ep_chg_id clk: at91: sama7g5: switch to parent_hw and parent_data clk: at91: sckc: switch to parent_data/parent_hw clk: at91: clk-sam9x60-pll: add support for parent_hw clk: at91: clk-utmi: add support for parent_hw clk: at91: clk-system: add support for parent_hw clk: at91: clk-programmable: add support for parent_hw clk: at91: clk-peripheral: add support for parent_hw clk: at91: clk-master: add support for parent_hw clk: at91: clk-generated: add support for parent_hw clk: at91: clk-main: add support for parent_data/parent_hw
2023-06-27scripts/mksysmap: Ignore prefixed KCFI symbolsPierre-Clément Tosi
The (relatively) new KCFI feature in LLVM/Clang encodes type information for C functions by generating symbols named __kcfi_typeid_<fname>, which can then be referenced from assembly. However, some custom build rules (e.g. nVHE or early PIE on arm64) use objcopy to add a prefix to all the symbols in their object files, making mksysmap's ignore filter miss those KCFI symbols. Therefore, explicitly list those twice-prefixed KCFI symbols as ignored. Alternatively, this could also be achieved in a less verbose way by ignoring any symbol containing the string "__kcfi_typeid_". However, listing the combined prefixes explicitly saves us from running the small risk of ignoring symbols that should be kept. Signed-off-by: Pierre-Clément Tosi <ptosi@google.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-06-26ext4: update doc about journal superblock descriptionZhang Yi
Update the description in doc about the s_head parameter in the journal superblock. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20230322013353.1843306-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: add journal cycled recording supportZhang Yi
Always enable 'JBD2_CYCLE_RECORD' journal option on ext4, letting the jbd2 continue to record new journal transactions from the recovered journal head or the checkpointed transactions in the previous mount. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-3-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: continue to record log between each mountZhang Yi
For a newly mounted file system, the journal committing thread always record new transactions from the start of the journal area, no matter whether the journal was clean or just has been recovered. So the logdump code in debugfs cannot dump continuous logs between each mount, it is disadvantageous to analysis corrupted file system image and locate the file system inconsistency bugs. If we get a corrupted file system in the running products and want to find out what has happened, besides lookup the system log, one effective way is to backtrack the journal log. But we may not always run e2fsck before each mount and the default fsck -a mode also cannot always checkout all inconsistencies, so it could left over some inconsistencies into the next mount until we detect it. Finally, transactions in the journal may probably discontinuous and some relatively new transactions has been covered, it becomes hard to analyse. If we could record transactions continuously between each mount, we could acquire more useful info from the journal. Like this: |Previous mount checkpointed/recovered logs|Current mount logs | |{------}{---}{--------} ... {------}| ... |{======}{========}...000000| And yes the journal area is limited and cannot record everything, the problematic transaction may also be covered even if we do this, but this is still useful for fuzzy tests and short-running products. This patch save the head blocknr in the superblock after flushing the journal or unmounting the file system, let the next mount could continue to record new transaction behind it. This change is backward compatible because the old kernel does not care about the head blocknr of the journal. It is also fine if we mount a clean old image without valid head blocknr, we fail back to set it to s_first just like before. Finally, for the case of mount an unclean file system, we could also get the journal head easily after scanning/replaying the journal, it will continue to record new transaction after the recovered transactions. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: remove j_format_versionZhang Yi
journal->j_format_version is no longer used, remove it. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: factor out journal initialization from journal_get_superblock()Zhang Yi
Current journal_get_superblock() couple journal superblock checking and partial journal initialization, factor out initialization part from it to make things clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: switch to check format version in superblock directlyZhang Yi
We should only check and set extented features if journal format version is 2, and now we check the in memory copy of the superblock 'journal->j_format_version', which relys on the parameter initialization sequence, switch to use the h_blocktype in superblock cloud be more clear. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26jbd2: remove unused feature macrosZhang Yi
JBD2_HAS_[IN|RO_]COMPAT_FEATURE macros are no longer used, just remove them. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-4-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev'Zhihao Cheng
As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always become true if sbi->s_journal_bdev exists. Filesystem block device and journal block device are both opened with 'FMODE_EXCL' mode, so these two devices can't be same one. Then we can remove the redundant checking 'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists. [1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/ Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: Fix reusing stale buffer heads from last failed mountingZhihao Cheng
Following process makes ext4 load stale buffer heads from last failed mounting in a new mounting operation: mount_bdev ext4_fill_super | ext4_load_and_init_journal | ext4_load_journal | jbd2_journal_load | load_superblock | journal_get_superblock | set_buffer_verified(bh) // buffer head is verified | jbd2_journal_recover // failed caused by EIO | goto failed_mount3a // skip 'sb->s_root' initialization deactivate_locked_super kill_block_super generic_shutdown_super if (sb->s_root) // false, skip ext4_put_super->invalidate_bdev-> // invalidate_mapping_pages->mapping_evict_folio-> // filemap_release_folio->try_to_free_buffers, which // cannot drop buffer head. blkdev_put blkdev_put_whole if (atomic_dec_and_test(&bdev->bd_openers)) // false, systemd-udev happens to open the device. Then // blkdev_flush_mapping->kill_bdev->truncate_inode_pages-> // truncate_inode_folio->truncate_cleanup_folio-> // folio_invalidate->block_invalidate_folio-> // filemap_release_folio->try_to_free_buffers will be skipped, // dropping buffer head is missed again. Second mount: ext4_fill_super ext4_load_and_init_journal ext4_load_journal ext4_get_journal jbd2_journal_init_inode journal_init_common bh = getblk_unmovable bh = __find_get_block // Found stale bh in last failed mounting journal->j_sb_buffer = bh jbd2_journal_load load_superblock journal_get_superblock if (buffer_verified(bh)) // true, skip journal->j_format_version = 2, value is 0 jbd2_journal_recover do_one_pass next_log_block += count_tags(journal, bh) // According to journal_tag_bytes(), 'tag_bytes' calculating is // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3() // returns false because 'j->j_format_version >= 2' is not true, // then we get wrong next_log_block. The do_one_pass may exit // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'. The filesystem is corrupted here, journal is partially replayed, and new journal sequence number actually is already used by last mounting. The invalidate_bdev() can drop all buffer heads even racing with bare reading block device(eg. systemd-udev), so we can fix it by invalidating bdev in error handling path in __ext4_fill_super(). Fetch a reproducer in [Link]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171 Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming") Cc: stable@vger.kernel.org # v3.5 Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: allow concurrent unaligned dio overwritesBrian Foster
We've had reports of significant performance regression of sub-block (unaligned) direct writes due to the added exclusivity restrictions in ext4. The purpose of the exclusivity requirement for unaligned direct writes is to avoid data corruption caused by unserialized partial block zeroing in the iomap dio layer across overlapping writes. XFS has similar requirements for the same underlying reasons, yet doesn't suffer the extreme performance regression that ext4 does. The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY mode, which allows for optimistic submission of concurrent unaligned I/O and kicks back writes that require partial block zeroing such that they can be submitted in a safe, exclusive context. Since ext4 already performs most of these checks pre-submission, it can support something similar without necessarily relying on the iomap flag and associated retry mechanism. Update the dio write submission path to allow concurrent submission of unaligned direct writes that are purely overwrite and so will not require block zeroing. To improve readability of the various related checks, move the unaligned I/O handling down into ext4_dio_write_checks(), where the dio draining and force wait logic can immediately follow the locking requirement checks. Finally, the IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a precaution should the ext4 overwrite logic ever become inconsistent with the zeroing expectations of iomap dio. The performance improvement of sub-block direct write I/O is shown in the following fio test on a 64xcpu guest vm: Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting --overwrite=1 --thread --size=10G --filename=/mnt/fio --readwrite=write --ramp_time=10s --runtime=60s --numjobs=8 --blocksize=2k --iodepth=256 --allow_file_create=0 v6.2: write: IOPS=4328, BW=8724KiB/s v6.2 (patched): write: IOPS=801k, BW=1565MiB/s Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: clean up mballoc criteria commentsTheodore Ts'o
Line wrap and slightly clarify the comments describing mballoc's cirtiera. Define EXT4_MB_NUM_CRS as part of the enum, so that it will automatically get updated when criteria is added or removed. Also fix a potential unitialized use of 'cr' variable if CONFIG_EXT4_DEBUG is enabled. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_zeroout_es() return voidBaokun Li
After ext4_es_insert_extent() returns void, the return value in ext4_zeroout_es() is also unnecessary, so make it return void too. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-13-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_insert_extent() return voidBaokun Li
Now ext4_es_insert_extent() never return error, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-12-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_insert_delayed_block() return voidBaokun Li
Now it never fails when inserting a delay extent, so the return value in ext4_es_insert_delayed_block is no longer necessary, let it return void. [ Fixed bug which caused system hangs during bigalloc test runs. See https://lore.kernel.org/r/20230612030405.GH1436857@mit.edu for more details. -- TYT ] Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-11-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-06-26ext4: make ext4_es_remove_extent() return voidBaokun Li
Now ext4_es_remove_extent() never fails, so make it return void. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230424033846.4732-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>