Age | Commit message (Collapse) | Author |
|
Two deadly typos break RX and TX traffic on the VSC8502 PHY using RGMII
if phy-mode = "rgmii-id" or "rgmii-txid", and no "tx-internal-delay-ps"
override exists. The negative error code from phy_get_internal_delay()
does not get overridden with the delay deduced from the phy-mode, and
later gets committed to hardware. Also, the rx_delay gets overridden by
what should have been the tx_delay.
Fixes: dbb050d2bfc8 ("phy: mscc: Add support for RGMII delay configuration")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Harini Katakam <harini.katakam@amd.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230627134235.3453358-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Julia Lawall says:
====================
use vmalloc_array and vcalloc
The functions vmalloc_array and vcalloc were introduced in
commit a8749a35c399 ("mm: vmalloc: introduce array allocation functions")
but are not used much yet. This series introduces uses of
these functions, to protect against multiplication overflows.
The changes were done using the following Coccinelle semantic
patch.
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
v2: This series uses vmalloc_array and vcalloc instead of
array_size. It also leaves a multiplication of a constant by a
sizeof as is. Two patches are thus dropped from the series.
====================
Link: https://lore.kernel.org/r/20230627144339.144478-1-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-23-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-19-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-12-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-10-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-5-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Use vmalloc_array and vcalloc to protect against
multiplication overflows.
The changes were done using the following Coccinelle
semantic patch:
// <smpl>
@initialize:ocaml@
@@
let rename alloc =
match alloc with
"vmalloc" -> "vmalloc_array"
| "vzalloc" -> "vcalloc"
| _ -> failwith "unknown"
@@
size_t e1,e2;
constant C1, C2;
expression E1, E2, COUNT, x1, x2, x3;
typedef u8;
typedef __u8;
type t = {u8,__u8,char,unsigned char};
identifier alloc = {vmalloc,vzalloc};
fresh identifier realloc = script:ocaml(alloc) { rename alloc };
@@
(
alloc(x1*x2*x3)
|
alloc(C1 * C2)
|
alloc((sizeof(t)) * (COUNT), ...)
|
- alloc((e1) * (e2))
+ realloc(e1, e2)
|
- alloc((e1) * (COUNT))
+ realloc(COUNT, e1)
|
- alloc((E1) * (E2))
+ realloc(E1, E2)
)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20230627144339.144478-3-Julia.Lawall@inria.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only")
broke suspend-to-idle at least on Dell XPS13 9360 and 9380.
The problem is that acpi_ec_dispatch_gpe() must clear the EC GPE,
because the EC GPE handler never runs when the system is in the
suspend-to-idle state and if the EC GPE is not cleared by the suspend-
to-idle loop, it is never cleared at all which leads to a GPE storm.
This causes suspend-to-idle to burn energy instead of saving it which
is potentially dangerous (the affected machines heat up rather badly
when that happens).
Addess this by making acpi_ec_dispatch_gpe() clear the EC GPE as it did
before.
Fixes: 896e97bf99ec ("ACPI: EC: Clear GPE on interrupt handling only")
Tested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add YAML doc for the interrupt controller which is present on Ralink SoCs.
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Link: https://lore.kernel.org/r/20230623035901.1514341-1-sergio.paracuellos@gmail.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
The PCIe 2.0 controllers on RK3588 need one additional clock,
one additional reset line and one for ranges entry.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230616170022.76107-4-sebastian.reichel@collabora.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Add RmNet support for LARA-R6 01B.
The new LARA-R6 product variant identified by the "01B" string can be
configured (by AT interface) in three different USB modes:
* Default mode (Vendor ID: 0x1546 Product ID: 0x1311) with 4 serial
interfaces
* RmNet mode (Vendor ID: 0x1546 Product ID: 0x1312) with 4 serial
interfaces and 1 RmNet virtual network interface
* CDC-ECM mode (Vendor ID: 0x1546 Product ID: 0x1313) with 4 serial
interface and 1 CDC-ECM virtual network interface
The first 4 interfaces of all the 3 configurations (default, RmNet, ECM)
are the same.
In RmNet mode LARA-R6 01B exposes the following interfaces:
If 0: Diagnostic
If 1: AT parser
If 2: AT parser
If 3: AT parset/alternative functions
If 4: RMNET interface
Signed-off-by: Davide Tronchin <davide.tronchin.94@gmail.com>
Link: https://lore.kernel.org/r/20230626125336.3127-1-davide.tronchin.94@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Our MPTCP CI and Stephen got this error:
In file included from builtin-trace.c:907:
trace/beauty/msg_flags.c: In function 'syscall_arg__scnprintf_msg_flags':
trace/beauty/msg_flags.c:28:21: error: 'MSG_SPLICE_PAGES' undeclared (first use in this function)
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
trace/beauty/msg_flags.c:28:21: note: each undeclared identifier is reported only once for each function it appears in
28 | if (flags & MSG_##n) { | ^~~~
trace/beauty/msg_flags.c:50:9: note: in expansion of macro 'P_MSG_FLAG'
50 | P_MSG_FLAG(SPLICE_PAGES);
| ^~~~~~~~~~
The fix is similar to what was done with MSG_FASTOPEN: the new macro is
defined if it is not defined in the system headers.
Fixes: b848b26c6672 ("net: Kill MSG_SENDPAGE_NOTLAST")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/r/20230626112847.2ef3d422@canb.auug.org.au/
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230626090239.899672-1-matthieu.baerts@tessares.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Reset shift on Boyer-Moore string match for each block,
from Jeremy Sowden.
2) Fix acccess to non-linear area in DCCP conntrack helper,
from Florian Westphal.
3) Fix kernel-doc warnings, by Randy Dunlap.
4) Bail out if expires= does not show in SIP helper message,
or make ct_sip_parse_numerical_param() tristate and report
error if expires= cannot be parsed.
5) Unbind non-anonymous set in case rule construction fails.
6) Fix underflow in chain reference counter in case set element
already exists or it cannot be created.
netfilter pull request 23-06-27
====================
Link: https://lore.kernel.org/r/20230627065304.66394-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ipvlan_queue_xmit() should return NET_XMIT_XXX, but
ipvlan_xmit_mode_l2/l3() returns rx_handler_result_t or NET_RX_XXX
in some cases. ipvlan_rcv_frame() will only return RX_HANDLER_CONSUMED
in ipvlan_xmit_mode_l2/l3() because 'local' is true. It's equal to
NET_XMIT_SUCCESS. But dev_forward_skb() can return NET_RX_SUCCESS or
NET_RX_DROP, and returning NET_RX_DROP(NET_XMIT_DROP) will increase
both ipvlan and ipvlan->phy_dev drops counter.
The skb to forward can be treated as xmitted successfully. This patch
makes ipvlan_queue_xmit() return NET_XMIT_SUCCESS for forward skb.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230626093347.7492-1-cambda@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
LoongArch architecture changes for 6.5 depend on the irq changes
to work on both ACPI and FDT systems, so merge them to create a base.
|
|
Commit e4412739472b ("Documentation: raise minimum supported version of
binutils to 2.25") allows us to remove the checks for old binutils.
There is no more user for ld-ifversion. Remove it as well.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230119082250.151485-1-masahiroy@kernel.org
|
|
binutils v2.37 drops unused section symbols, which prevents recordmcount
from capturing mcount locations in sections that have no non-weak
symbols. This results in a build failure with a message such as:
Cannot find symbol for section 12: .text.perf_callchain_kernel.
kernel/events/callchain.o: failed
The change to binutils was reverted for v2.38, so this behavior is
specific to binutils v2.37:
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=c09c8b42021180eee9495bd50d8b35e683d3901b
Objtool is able to cope with such sections, so this issue is specific to
recordmcount.
Fail the build and print a warning if binutils v2.37 is detected and if
we are using recordmcount.
Cc: stable@vger.kernel.org
Suggested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230530061436.56925-1-naveen@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Tzung-Bi Shih:
"Improvements:
- Support Pin Assignment D in getting mux state
- Emit an uevent when EC panics so that userland programs get chance
to capture EC coredumps (LPC interface only)
- Send EC_CMD_HOST_SLEEP_EVENT to EC at the very beginning/end of
system suspend/resume so that EC can watch the duration more
accurately (LPC interface only)
Misc:
- Switch back from I2C .probe_new() to .probe()
- Use %*ph for printing hexdump of small buffers"
* tag 'tag-chrome-platform-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
platform/chrome: cros_ec_spi: Use %*ph for printing hexdump of a small buffer
platform/chrome: Switch i2c drivers back to use .probe()
platform/chrome: cros_ec_lpc: Move host command to prepare/complete
platform/chrome: cros_ec: Report EC panic as uevent
platform/chrome: cros_typec_switch: Add Pin D support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These extend the int340x thermal driver, add thermal DT bindings for
some Qcom platforms, add DT bindings and support for Armada AP807 and
MSM8909, allow selecting the bang-bang thermal governor as the default
one, address issues in several thermal drivers for ARM platforms and
clean up code.
Specifics:
- Add new IOCTLs to the int340x thermal driver to allow user space to
retrieve the Passive v2 thermal table (Srinivas Pandruvada)
- Add DT bindings for SM6375, MSM8226 and QCM2290 Qcom platforms
(Konrad Dybcio)
- Add DT bindings and support for QCom MSM8226 (Matti Lehtimäki)
- Add DT bindings for QCom ipq9574 (Praveenkumar I)
- Convert bcm2835 DT bindings to the yaml schema (Stefan Wahren)
- Allow selecting the bang-bang governor as default (Thierry Reding)
- Refactor and prepare the code to set the scene for RCar Gen4
(Wolfram Sang)
- Clean up and fix the QCom tsens drivers. Add DT bindings and
calibration for the MSM8909 platform (Stephan Gerhold)
- Revert a patch introducing a wrong usage of devm_of_iomap() on the
Mediatek platform (Ricardo Cañuelo)
- Fix the clock vs reset ordering in order to conform to the
documentation on the sun8i (Christophe JAILLET)
- Prevent setting up undocumented registers, enable the only
described sensors and add the version 2.1 on the Qoriq sensor (Peng
Fan)
- Add DT bindings and support for the Armada AP807 (Alex Leibovich)
- Update the mlx5 driver with the recent thermal changes (Daniel
Lezcano)
- Convert to platform remove callback returning void on STM32 (Uwe
Kleine-König)
- Add an error information printing for devm_thermal_add_hwmon_sysfs()
and remove the error from the Sun8i, Amlogic, i.MX, TI, K3, Tegra,
Qoriq, Mediateka and QCom (Yangtao Li)
- Register as hwmon sensor for the Generic ADC (Chen-Yu Tsai)
- Use the dev_err_probe() function in the QCom tsens alarm driver
(Luca Weiss)"
* tag 'thermal-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits)
thermal/drivers/qcom/temp-alarm: Use dev_err_probe
thermal/drivers/generic-adc: Register thermal zones as hwmon sensors
thermal/drivers/mediatek/lvts_thermal: Remove redundant msg in lvts_ctrl_start()
thermal/drivers/qcom: Remove redundant msg at probe time
thermal/drivers/ti-soc: Remove redundant msg in ti_thermal_expose_sensor()
thermal/drivers/qoriq: Remove redundant msg in qoriq_tmu_register_tmu_zone()
thermal/drivers/tegra: Remove redundant msg in tegra_tsensor_register_channel()
drivers/thermal/k3: Remove redundant msg in k3_bandgap_probe()
thermal/drivers/imx: Remove redundant msg in imx8mm_tmu_probe() and imx_sc_thermal_probe()
thermal/drivers/amlogic: Remove redundant msg in amlogic_thermal_probe()
thermal/drivers/sun8i: Remove redundant msg in sun8i_ths_register()
thermal/hwmon: Add error information printing for devm_thermal_add_hwmon_sysfs()
thermal/drivers/stm32: Convert to platform remove callback returning void
net/mlx5: Update the driver with the recent thermal changes
thermal/drivers/armada: Add support for AP807 thermal data
dt-bindings: armada-thermal: Add armada-ap807-thermal compatible
thermal/drivers/qoriq: Support version 2.1
thermal/drivers/qoriq: Only enable supported sensors
thermal/drivers/qoriq: No need to program site adjustment register
thermal/drivers/mediatek/lvts_thermal: Register thermal zones as hwmon sensors
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add Intel TPMI (Topology Aware Register and PM Capsule
Interface) support to the power capping subsystem, extend the
intel_idle driver to work in VM guests where MWAIT is not available,
extend the system-wide power management diagnostics, fix bugs and
clean up code.
Specifics:
- Introduce power capping core support for Intel TPMI (Topology Aware
Register and PM Capsule Interface) and a TPMI interface driver for
Intel RAPL (Zhang Rui, Dan Carpenter)
- Fix CONFIG_IOSF_MBI dependency in the Intel RAPL power capping
driver (Zhang Rui)
- Fix invalid initialization for pl4_supported field in the Intel
RAPL power capping driver (Sumeet Pawnikar)
- Clean up the intel_idle driver, make it work with VM guests that
cannot use the MWAIT instruction and address the case in which the
host may enter a deep idle state when the guest is idle (Arjan van
de Ven)
- Prevent cpufreq drivers that provide the ->adjust_perf() callback
without a ->fast_switch() one which is used as a fallback from the
former in some cases (Wyes Karny)
- Fix some issues related to the AMD P-state cpufreq driver (Mario
Limonciello, Wyes Karny)
- Fix the energy_performance_preference attribute handling in the
intel_pstate driver in passive mode (Tero Kristo)
- Fix the handling of pm_suspend_target_state when CONFIG_PM is unset
(Kai-Heng Feng)
- Correct spelling mistake in a comment in the hibernation code (Wang
Honghui)
- Add arch_resume_nosmt() prototype to avoid a "missing prototypes"
build warning (Arnd Bergmann)
- Restrict pm_pr_dbg() to system-wide power transitions and use it in
a few additional places (Mario Limonciello)
- Drop verification of in-params from genpd_add_device() and ensure
that all of its callers will do it (Ulf Hansson)
- Prevent possible integer overflows from occurring in
genpd_parse_state() (Nikita Zhandarovich)
- Reorder fieldls in 'struct devfreq_dev_status' to reduce its size
somewhat (Christophe JAILLET)
- Ensure that the Exynos PPMU driver is already loaded before the
Exynos Bus driver starts probing so as to avoid a possible freeze
loading of the kernel modules (Marek Szyprowski)
- Fix variable deferencing before NULL check in the mtk-cci devfreq
driver (Sukrut Bellary)"
* tag 'pm-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (42 commits)
intel_idle: Add a "Long HLT" C1 state for the VM guest mode
cpufreq: intel_pstate: Fix energy_performance_preference for passive
cpufreq: amd-pstate: Add a kernel config option to set default mode
cpufreq: amd-pstate: Set a fallback policy based on preferred_profile
ACPI: CPPC: Add definition for undefined FADT preferred PM profile value
cpufreq: amd-pstate: Set default governor to schedutil
PM: domains: Move the verification of in-params from genpd_add_device()
cpufreq: amd-pstate: Make amd-pstate EPP driver name hyphenated
cpufreq: amd-pstate: Write CPPC enable bit per-socket
intel_idle: Add support for using intel_idle in a VM guest using just hlt
cpufreq: Fail driver register if it has adjust_perf without fast_switch
intel_idle: clean up the (new) state_update_enter_method function
intel_idle: refactor state->enter manipulation into its own function
platform/x86/amd: pmc: Use pm_pr_dbg() for suspend related messages
pinctrl: amd: Use pm_pr_dbg to show debugging messages
ACPI: x86: Add pm_debug_messages for LPS0 _DSM state tracking
include/linux/suspend.h: Only show pm_pr_dbg messages at suspend/resume
powercap: RAPL: Fix a NULL vs IS_ERR() bug
powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
powercap: RAPL: fix invalid initialization for pl4_supported field
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These rework the handling of notifications in ACPI button drivers (to
enable future simplifications and cleanups), clean up the ACPI thermal
driver, update the ACPI backlight driver, add quirks working around
AML bugs on some systems, fix some assorted issues and clean up code.
Specifics:
- Reduce ACPI device enumeration overhead related to devices with
dependencies (Rafael Wysocki)
- Fix the handling of Microsoft LPS0 _DSM for suspend-to-idle (Mario
Limonciello)
- Fix section mismatch warning in the ACPI suspend-to-idle code (Arnd
Bergmann)
- Drop several ACPI resource management quirks related to IRQ
ovverides on AMD "Zen" systems (Mario Limonciello)
- Modify the ACPI EC driver to make it only clear the EC GPE status
when handling the GPE (Jeremy Compostella)
- Add quirks to work around ACPI tables defects on Lenovo Yoga Book
yb1-x90f/l and Nextbook Ares 8A (Hans de Goede)
- Add ACPi backlight quirks for Dell Studio 1569, Lenovo ThinkPad
X131e (3371 AMD version) and Apple iMac11,3 and stop trying to use
vendor backlight control on relatively recent systems (Hans de
Goede)
- Add pwm_lookup_table entry for second PWM on CHT/BSW devices in the
ACPI LPSS (Intel SoC) driver (Hans de Goede)
- Add nfit_intel_shutdown_status() declaration to a local header to
avoid a "missing prototypes" build warning (Arnd Bergmann)
- Clean up the ACPI thermal driver and drop some dead or otherwise
unneded code from it (Rafael Wysocki)
- Rework the handling of notifications in the ACPI button drivers so
as to allow the common notification handling code for devices to be
simplified (Rafael Wysocki)
- Make ghes_get_devices() return NULL to indicate that there are no
GHES devices so as to allow vendor-specific EDAC drivers to probe
then (Li Yang)
- Mark bert_disable() as __initdata and drop an unused function from
the APEI GHES code (Miaohe Lin)
- Make the ACPI PAD (Processor Aggregator Device) driver realize that
Zhaoxin CPUs support nonstop TSC (Tony W Wang-oc)
- Drop the certainly unnecessary and likely incorrect inclusion of
linux/arm-smccc.h from acpi_ffh.c (Sudeep Holla)"
* tag 'acpi-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (30 commits)
ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569
ACPI: thermal: Drop struct acpi_thermal_flags
ACPI: thermal: Drop struct acpi_thermal_state
ACPI: bus: Simplify installation and removal of notify callback
ACPI: tiny-power-button: Eliminate the driver notify callback
ACPI: button: Use different notify handlers for lid and buttons
ACPI: button: Eliminate the driver notify callback
ACPI: thermal: Eliminate struct acpi_thermal_state_flags
ACPI: thermal: Move acpi_thermal_driver definition
ACPI: thermal: Move symbol definitions to one place
ACPI: thermal: Drop redundant ACPI_TRIPS_REFRESH_DEVICES symbol
ACPI: thermal: Use BIT() macro for defining flags
APEI: GHES: correctly return NULL for ghes_get_devices()
ACPI: FFH: Drop the inclusion of linux/arm-smccc.h
ACPI: PAD: mark Zhaoxin CPUs NONSTOP TSC correctly
ACPI: APEI: mark bert_disable as __initdata
ACPI: EC: Clear GPE on interrupt handling only
ACPI: video: Stop trying to use vendor backlight control on laptops from after ~2012
ACPI: x86: s2idle: Adjust Microsoft LPS0 _DSM handling sequence
ACPI: resource: Remove "Zen" specific match and quirks
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Notable features are user-space support for the memcpy/memset
instructions and the permission indirection extension.
- Support for the Armv8.9 Permission Indirection Extensions. While
this feature doesn't add new functionality, it enables future
support for Guarded Control Stacks (GCS) and Permission Overlays
- User-space support for the Armv8.8 memcpy/memset instructions
- arm64 perf: support the HiSilicon SoC uncore PMU, Arm CMN sysfs
identifier, support for the NXP i.MX9 SoC DDRC PMU, fixes and
cleanups
- Removal of superfluous ISBs on context switch (following
retrospective architecture tightening)
- Decode the ISS2 register during faults for additional information
to help with debugging
- KPTI clean-up/simplification of the trampoline exit code
- Addressing several -Wmissing-prototype warnings
- Kselftest improvements for signal handling and ptrace
- Fix TPIDR2_EL0 restoring on sigreturn
- Clean-up, robustness improvements of the module allocation code
- More sysreg conversions to the automatic register/bitfields
generation
- CPU capabilities handling cleanup
- Arm documentation updates: ACPI, ptdump"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (124 commits)
kselftest/arm64: Add a test case for TPIDR2 restore
arm64/signal: Restore TPIDR2 register rather than memory state
arm64: alternatives: make clean_dcache_range_nopatch() noinstr-safe
Documentation/arm64: Add ptdump documentation
arm64: hibernate: remove WARN_ON in save_processor_state
kselftest/arm64: Log signal code and address for unexpected signals
docs: perf: Fix warning from 'make htmldocs' in hisi-pmu.rst
arm64/fpsimd: Exit streaming mode when flushing tasks
docs: perf: Add new description for HiSilicon UC PMU
drivers/perf: hisi: Add support for HiSilicon UC PMU driver
drivers/perf: hisi: Add support for HiSilicon H60PA and PAv3 PMU driver
perf: arm_cspmu: Add missing MODULE_DEVICE_TABLE
perf/arm-cmn: Add sysfs identifier
perf/arm-cmn: Revamp model detection
perf/arm_dmc620: Add cpumask
arm64: mm: fix VA-range sanity check
arm64/mm: remove now-superfluous ISBs from TTBR writes
Documentation/arm64: Update ACPI tables from BBR
Documentation/arm64: Update references in arm-acpi
Documentation/arm64: Update ARM and arch reference
...
|
|
Pull ARM updates from Russell King:
- lots of build cleanups from Arnd spread throughout the arch/arm tree
- replace strlcpy() with the preferred strscpy()
- use sign_extend32() in the module linker
- drop handle_irq() machine descriptor method
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9315/1: fiq: include asm/mach/irq.h for prototypes
ARM: 9314/1: tcm: move tcm_init() prototype to asm/tcm.h
ARM: 9313/1: vdso: add missing prototypes
ARM: 9312/1: vfp: include asm/neon.h in vfpmodule.c
ARM: 9311/1: decompressor: move function prototypes to misc.h
ARM: 9310/1: xip-kernel: add __inflate_kernel_data prototype
ARM: 9309/1: add missing syscall prototypes
ARM: 9308/1: move setup functions to header
ARM: 9307/1: nommu: include asm/idmap.h
ARM: 9306/1: cacheflush: avoid __flush_anon_page() missing-prototype warning
ARM: 9305/1: add clear/copy_user_highpage declarations
ARM: 9304/1: add prototype for function called only from asm
ARM: 9303/1: kprobes: avoid missing-declaration warnings
ARM: 9302/1: traps: hide unused functions on NOMMU
ARM: 9301/1: dma-mapping: hide unused dma_contiguous_early_fixup function
ARM: 9300/1: Replace all non-returning strlcpy with strscpy
ARM: 9299/1: module: use sign_extend32() to extend the signedness
ARM: 9298/1: Drop custom mdesc->handle_irq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- miscellaneous NuBus fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: defconfig: Update defconfigs for v6.4-rc1
nubus: Don't list slot resources by default
nubus: Remove proc entries before adding them
nubus: Partially revert proc_create_single_data() conversion
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Dave Hansen:
"As usual, these are all over the map. The biggest cluster is work from
Arnd to eliminate -Wmissing-prototype warnings:
- Address -Wmissing-prototype warnings
- Remove repeated 'the' in comments
- Remove unused current_untag_mask()
- Document urgent tip branch timing
- Clean up MSR kernel-doc notation
- Clean up paravirt_ops doc
- Update Srivatsa S. Bhat's maintained areas
- Remove unused extern declaration acpi_copy_wakeup_routine()"
* tag 'x86_cleanups_for_6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
x86/acpi: Remove unused extern declaration acpi_copy_wakeup_routine()
Documentation: virt: Clean up paravirt_ops doc
x86/mm: Remove unused current_untag_mask()
x86/mm: Remove repeated word in comments
x86/lib/msr: Clean up kernel-doc notation
x86/platform: Avoid missing-prototype warnings for OLPC
x86/mm: Add early_memremap_pgprot_adjust() prototype
x86/usercopy: Include arch_wb_cache_pmem() declaration
x86/vdso: Include vdso/processor.h
x86/mce: Add copy_mc_fragile_handle_tail() prototype
x86/fbdev: Include asm/fb.h as needed
x86/hibernate: Declare global functions in suspend.h
x86/entry: Add do_SYSENTER_32() prototype
x86/quirks: Include linux/pnp.h for arch_pnpbios_disabled()
x86/mm: Include asm/numa.h for set_highmem_pages_init()
x86: Avoid missing-prototype warnings for doublefault code
x86/fpu: Include asm/fpu/regset.h
x86: Add dummy prototype for mk_early_pgtbl_32()
x86/pci: Mark local functions as 'static'
x86/ftrace: Move prepare_ftrace_return prototype to header
...
|
|
This was noticed by a user who noticied that the mtime of a file
backing a loopback device was getting bumped when the loopback device
is mounted read/only. Note: This doesn't show up when doing a
loopback mount of a file directly, via "mount -o ro /tmp/foo.img
/mnt", since the loop device is set read-only when mount automatically
creates loop device. However, this is noticeable for a LUKS loop
device like this:
% cryptsetup luksOpen /tmp/foo.img test
% mount -o ro /dev/loop0 /mnt ; umount /mnt
or, if LUKS is not in use, if the user manually creates the loop
device like this:
% losetup /dev/loop0 /tmp/foo.img
% mount -o ro /dev/loop0 /mnt ; umount /mnt
The modified mtime causes rsync to do a rolling checksum scan of the
file on the local and remote side, incrementally increasing the time
to rsync the not-modified-but-touched image file.
Fixes: eee00237fa5e ("ext4: commit super block if fs record error when journal record without error")
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/ZIauBR7YiV3rVAHL@glitch
Reported-by: Sean Greenslade <sean@seangreenslade.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We got a NULL pointer dereference issue below while running generic/475
I/O failure pressure test.
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP PTI
CPU: 1 PID: 15600 Comm: fsstress Not tainted 6.4.0-rc5-xfstests-00055-gd3ab1bca26b4 #190
RIP: 0010:jbd2_journal_set_features+0x13d/0x430
...
Call Trace:
<TASK>
? __die+0x23/0x60
? page_fault_oops+0xa4/0x170
? exc_page_fault+0x67/0x170
? asm_exc_page_fault+0x26/0x30
? jbd2_journal_set_features+0x13d/0x430
jbd2_journal_revoke+0x47/0x1e0
__ext4_forget+0xc3/0x1b0
ext4_free_blocks+0x214/0x2f0
ext4_free_branches+0xeb/0x270
ext4_ind_truncate+0x2bf/0x320
ext4_truncate+0x1e4/0x490
ext4_handle_inode_extension+0x1bd/0x2a0
? iomap_dio_complete+0xaf/0x1d0
The root cause is the journal super block had been failed to write out
due to I/O fault injection, it's uptodate bit was cleared by
end_buffer_write_sync() and didn't reset yet in jbd2_write_superblock().
And it raced by journal_get_superblock()->bh_read(), unfortunately, the
read IO is also failed, so the error handling in
journal_fail_superblock() unexpectedly clear the journal->j_sb_buffer,
finally lead to above NULL pointer dereference issue.
If the journal super block had been read and verified, there is no need
to call bh_read() read it again even if it has been failed to written
out. So the fix could be simply move buffer_verified(bh) in front of
bh_read(). Also remove a stale comment left in
jbd2_journal_check_used_features().
Fixes: 51bacdba23d8 ("jbd2: factor out journal initialization from journal_get_superblock()")
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230616015547.3155195-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Rename ext4_quota_off_umount() to ext4_quotas_off(), and add type
parameter to replace open code in ext4_enable_quotas().
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Yi found during a review of the patch "ext4: don't BUG on inconsistent
journal feature" that when ext4_mark_recovery_complete() returns an error
value, the error handling path does not turn off the enabled quotas,
which triggers the following kmemleak:
================================================================
unreferenced object 0xffff8cf68678e7c0 (size 64):
comm "mount", pid 746, jiffies 4294871231 (age 11.540s)
hex dump (first 32 bytes):
00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00 ............A...
c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00 ............H...
backtrace:
[<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880
[<00000000d4e621d7>] kmalloc_trace+0x39/0x140
[<00000000837eee74>] v2_read_file_info+0x18a/0x3a0
[<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770
[<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0
[<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4]
[<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4]
[<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4]
[<000000004a9489c4>] get_tree_bdev+0x1dc/0x370
[<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4]
[<00000000c7cb663d>] vfs_get_tree+0x31/0x160
[<00000000320e1bed>] do_new_mount+0x1d5/0x480
[<00000000c074654c>] path_mount+0x22e/0xbe0
[<0000000003e97a8e>] do_mount+0x95/0xc0
[<000000002f3d3736>] __x64_sys_mount+0xc4/0x160
[<0000000027d2140c>] do_syscall_64+0x3f/0x90
================================================================
To solve this problem, we add a "failed_mount10" tag, and call
ext4_quota_off_umount() in this tag to release the enabled qoutas.
Fixes: 11215630aada ("ext4: don't BUG on inconsistent journal feature")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Update the description in doc about the s_head parameter in the journal
superblock.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20230322013353.1843306-4-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Always enable 'JBD2_CYCLE_RECORD' journal option on ext4, letting the
jbd2 continue to record new journal transactions from the recovered
journal head or the checkpointed transactions in the previous mount.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230322013353.1843306-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
For a newly mounted file system, the journal committing thread always
record new transactions from the start of the journal area, no matter
whether the journal was clean or just has been recovered. So the logdump
code in debugfs cannot dump continuous logs between each mount, it is
disadvantageous to analysis corrupted file system image and locate the
file system inconsistency bugs.
If we get a corrupted file system in the running products and want to
find out what has happened, besides lookup the system log, one effective
way is to backtrack the journal log. But we may not always run e2fsck
before each mount and the default fsck -a mode also cannot always
checkout all inconsistencies, so it could left over some inconsistencies
into the next mount until we detect it. Finally, transactions in the
journal may probably discontinuous and some relatively new transactions
has been covered, it becomes hard to analyse. If we could record
transactions continuously between each mount, we could acquire more
useful info from the journal. Like this:
|Previous mount checkpointed/recovered logs|Current mount logs |
|{------}{---}{--------} ... {------}| ... |{======}{========}...000000|
And yes the journal area is limited and cannot record everything, the
problematic transaction may also be covered even if we do this, but
this is still useful for fuzzy tests and short-running products.
This patch save the head blocknr in the superblock after flushing the
journal or unmounting the file system, let the next mount could continue
to record new transaction behind it. This change is backward compatible
because the old kernel does not care about the head blocknr of the
journal. It is also fine if we mount a clean old image without valid
head blocknr, we fail back to set it to s_first just like before.
Finally, for the case of mount an unclean file system, we could also get
the journal head easily after scanning/replaying the journal, it will
continue to record new transaction after the recovered transactions.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230322013353.1843306-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
journal->j_format_version is no longer used, remove it.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-7-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Current journal_get_superblock() couple journal superblock checking and
partial journal initialization, factor out initialization part from it
to make things clear.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-6-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We should only check and set extented features if journal format version
is 2, and now we check the in memory copy of the superblock
'journal->j_format_version', which relys on the parameter initialization
sequence, switch to use the h_blocktype in superblock cloud be more
clear.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-5-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
JBD2_HAS_[IN|RO_]COMPAT_FEATURE macros are no longer used, just remove
them.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-4-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always
become true if sbi->s_journal_bdev exists. Filesystem block device and
journal block device are both opened with 'FMODE_EXCL' mode, so these
two devices can't be same one. Then we can remove the redundant checking
'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists.
[1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-3-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
ext4_fill_super
| ext4_load_and_init_journal
| ext4_load_journal
| jbd2_journal_load
| load_superblock
| journal_get_superblock
| set_buffer_verified(bh) // buffer head is verified
| jbd2_journal_recover // failed caused by EIO
| goto failed_mount3a // skip 'sb->s_root' initialization
deactivate_locked_super
kill_block_super
generic_shutdown_super
if (sb->s_root)
// false, skip ext4_put_super->invalidate_bdev->
// invalidate_mapping_pages->mapping_evict_folio->
// filemap_release_folio->try_to_free_buffers, which
// cannot drop buffer head.
blkdev_put
blkdev_put_whole
if (atomic_dec_and_test(&bdev->bd_openers))
// false, systemd-udev happens to open the device. Then
// blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
// truncate_inode_folio->truncate_cleanup_folio->
// folio_invalidate->block_invalidate_folio->
// filemap_release_folio->try_to_free_buffers will be skipped,
// dropping buffer head is missed again.
Second mount:
ext4_fill_super
ext4_load_and_init_journal
ext4_load_journal
ext4_get_journal
jbd2_journal_init_inode
journal_init_common
bh = getblk_unmovable
bh = __find_get_block // Found stale bh in last failed mounting
journal->j_sb_buffer = bh
jbd2_journal_load
load_superblock
journal_get_superblock
if (buffer_verified(bh))
// true, skip journal->j_format_version = 2, value is 0
jbd2_journal_recover
do_one_pass
next_log_block += count_tags(journal, bh)
// According to journal_tag_bytes(), 'tag_bytes' calculating is
// affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
// returns false because 'j->j_format_version >= 2' is not true,
// then we get wrong next_log_block. The do_one_pass may exit
// early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.
The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.
The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes: 25ed6e8a54df ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230315013128.3911115-2-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
We've had reports of significant performance regression of sub-block
(unaligned) direct writes due to the added exclusivity restrictions
in ext4. The purpose of the exclusivity requirement for unaligned
direct writes is to avoid data corruption caused by unserialized
partial block zeroing in the iomap dio layer across overlapping
writes.
XFS has similar requirements for the same underlying reasons, yet
doesn't suffer the extreme performance regression that ext4 does.
The reason for this is that XFS utilizes IOMAP_DIO_OVERWRITE_ONLY
mode, which allows for optimistic submission of concurrent unaligned
I/O and kicks back writes that require partial block zeroing such
that they can be submitted in a safe, exclusive context. Since ext4
already performs most of these checks pre-submission, it can support
something similar without necessarily relying on the iomap flag and
associated retry mechanism.
Update the dio write submission path to allow concurrent submission
of unaligned direct writes that are purely overwrite and so will not
require block zeroing. To improve readability of the various related
checks, move the unaligned I/O handling down into
ext4_dio_write_checks(), where the dio draining and force wait logic
can immediately follow the locking requirement checks. Finally, the
IOMAP_DIO_OVERWRITE_ONLY flag is set to enable a warning check as a
precaution should the ext4 overwrite logic ever become inconsistent
with the zeroing expectations of iomap dio.
The performance improvement of sub-block direct write I/O is shown
in the following fio test on a 64xcpu guest vm:
Test: fio --name=test --ioengine=libaio --direct=1 --group_reporting
--overwrite=1 --thread --size=10G --filename=/mnt/fio
--readwrite=write --ramp_time=10s --runtime=60s --numjobs=8
--blocksize=2k --iodepth=256 --allow_file_create=0
v6.2: write: IOPS=4328, BW=8724KiB/s
v6.2 (patched): write: IOPS=801k, BW=1565MiB/s
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230314130759.642710-1-bfoster@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Line wrap and slightly clarify the comments describing mballoc's
cirtiera.
Define EXT4_MB_NUM_CRS as part of the enum, so that it will
automatically get updated when criteria is added or removed.
Also fix a potential unitialized use of 'cr' variable if
CONFIG_EXT4_DEBUG is enabled.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
After ext4_es_insert_extent() returns void, the return value in
ext4_zeroout_es() is also unnecessary, so make it return void too.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-13-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now ext4_es_insert_extent() never return error, so make it return void.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-12-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now it never fails when inserting a delay extent, so the return value in
ext4_es_insert_delayed_block is no longer necessary, let it return void.
[ Fixed bug which caused system hangs during bigalloc test runs. See
https://lore.kernel.org/r/20230612030405.GH1436857@mit.edu for more
details. -- TYT ]
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-11-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Now ext4_es_remove_extent() never fails, so make it return void.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-10-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Similar to in ext4_es_insert_delayed_block(), we use preallocations that
do not fail to avoid inconsistencies, but we do not care about es that are
not must be kept, and we return 0 even if such es memory allocation fails.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-9-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
Similar to in ext4_es_remove_extent(), we use a no-fail preallocation
to avoid inconsistencies, except that here we may have to preallocate
two extent_status.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-8-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
If __es_remove_extent() returns an error it means that when splitting
extent, allocating an extent that must be kept failed, where returning
an error directly would cause the extent tree to be inconsistent. So we
use GFP_NOFAIL to pre-allocate an extent_status and pass it to
__es_remove_extent() to avoid this problem.
In addition, since the allocated memory is outside the i_es_lock, the
extent_status tree may change and the pre-allocated extent_status is
no longer needed, so we release the pre-allocated extent_status when
es->es_len is not initialized.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-7-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
When splitting extent, if the second extent can not be dropped, we return
-ENOMEM and use GFP_NOFAIL to preallocate an extent_status outside of
i_es_lock and pass it to __es_remove_extent() to be used as the second
extent. This ensures that __es_remove_extent() is executed successfully,
thus ensuring consistency in the extent status tree. If the second extent
is not undroppable, we simply drop it and return 0. Then retry is no longer
necessary, remove it.
Now, __es_remove_extent() will always remove what it should, maybe more.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-6-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|