| Age | Commit message (Collapse) | Author |
|
In commit 73b3294b1152 ("mm: simplify folio_page() and folio_page_idx()")
we converted folio_page() into a static inline function. However briefly
afterwards in commit a847b17009ec ("mm: constify highmem related functions
for improved const-correctness") we had to add some nasty const-away
casting to make the compiler happy when checking const correctness.
So let's just convert it back to a simple macro so the compiler can check
const correctness properly. There is the alternative of using a
_Generic() similar to page_folio(), but there is not a lot of benefit
compared to just using a simple macro.
Link: https://lkml.kernel.org/r/20250923140058.2020023-1-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
KCSAN reports a data race on mm_cluster.hiwater_rss, which can be accessed
concurrently from various paths like page migration and memory unmapping
without synchronization.
Since hiwater_rss is a statistical field for accounting purposes, this
data race is benign. Annotate both the read and write accesses with
data_race() to make KCSAN happy.
Link: https://lkml.kernel.org/r/20250926092426.43312-1-lance.yang@linux.dev
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Reported-by: syzbot+60192c8877d0bc92a92b@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/68d6364e.050a0220.3390a8.000d.GAE@google.com
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Marco Elver <elver@google.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/ksm: Fix incorrect accounting of KSM counters during
fork", v3.
The first patch in this series fixes the incorrect accounting of KSM
counters such as ksm_merging_pages, ksm_rmap_items, and the global
ksm_zero_pages during fork.
The following patch add a selftest to verify the ksm_merging_pages counter
was updated correctly during fork.
Test Results
============
Without the first patch
-----------------------
# [RUN] test_fork_ksm_merging_page_count
not ok 10 ksm_merging_page in child: 32
With the first patch
--------------------
# [RUN] test_fork_ksm_merging_page_count
ok 10 ksm_merging_pages is not inherited after fork
This patch (of 2):
Currently, the KSM-related counters in `mm_struct`, such as
`ksm_merging_pages`, `ksm_rmap_items`, and `ksm_zero_pages`, are inherited
by the child process during fork. This results in inconsistent
accounting.
When a process uses KSM, identical pages are merged and an rmap item is
created for each merged page. The `ksm_merging_pages` and
`ksm_rmap_items` counters are updated accordingly. However, after a fork,
these counters are copied to the child while the corresponding rmap items
are not. As a result, when the child later triggers an unmerge, there are
no rmap items present in the child, so the counters remain stale, leading
to incorrect accounting.
A similar issue exists with `ksm_zero_pages`, which maintains both a
global counter and a per-process counter. During fork, the per-process
counter is inherited by the child, but the global counter is not
incremented. Since the child also references zero pages, the global
counter should be updated as well. Otherwise, during zero-page unmerge,
both the global and per-process counters are decremented, causing the
global counter to become inconsistent.
To fix this, ksm_merging_pages and ksm_rmap_items are reset to 0 during
fork, and the global ksm_zero_pages counter is updated with the
per-process ksm_zero_pages value inherited by the child. This ensures
that KSM statistics remain accurate and reflect the activity of each
process correctly.
Link: https://lkml.kernel.org/r/cover.1758648700.git.donettom@linux.ibm.com
Link: https://lkml.kernel.org/r/7b9870eb67ccc0d79593940d9dbd4a0b39b5d396.1758648700.git.donettom@linux.ibm.com
Fixes: 7609385337a4 ("ksm: count ksm merging pages for each process")
Fixes: cb4df4cae4f2 ("ksm: count allocated ksm rmap_items for each process")
Fixes: e2942062e01d ("ksm: count all zero pages placed by KSM")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Aboorva Devarajan <aboorvad@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: xu xin <xu.xin16@zte.com.cn>
Cc: <stable@vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: Improve mlock tracking for large folios", v3.
The patchset includes several fixes and improvements related to mlock
tracking of large folios.
The main objective is to reduce the undercount of Mlocked memory in
/proc/meminfo and improve the accuracy of the statistics.
Patches 1-2:
These patches address a minor race condition in folio_referenced_one()
related to mlock_vma_folio().
Currently, mlock_vma_folio() is called on large folio without the page
table lock, which can result in a race condition with unmap (i.e.
MADV_DONTNEED). This can lead to partially mapped folios on the
unevictable LRU list.
While not a significant issue, I do not believe backporting is necessary.
Patch 3:
This patch adds mlocking logic similar to folio_referenced_one() to
try_to_unmap_one(), allowing for mlocking of large folios where possible.
Patch 4-5:
These patches modifies finish_fault() and faultaround to map in the entire
folio when possible, enabling efficient mlocking upon addition to the
rmap.
Patch 6:
This patch makes rmap mlock large folios if they are fully mapped,
addressing the primary source of mlock undercount for large folios.
This patch (of 6):
Add a PVMW_PGTABLE_CROSSSED flag that page_vma_mapped_walk() will set if
the page is mapped across page table boundary. Unlike other PVMW_* flags,
this one is result of page_vma_mapped_walk() and not set by the caller.
folio_referenced_one() will use it to detect if it safe to mlock the
folio.
[akpm@linux-foundation.org: s/CROSSSED/CROSSED/]
Link: https://lkml.kernel.org/r/20250923110711.690639-1-kirill@shutemov.name
Link: https://lkml.kernel.org/r/20250923110711.690639-2-kirill@shutemov.name
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"7 hotfixes. 4 are cc:stable and the remainder address post-6.16 issues
or aren't considered necessary for -stable kernels. 6 of these fixes
are for MM.
All singletons, please see the changelogs for details"
* tag 'mm-hotfixes-stable-2025-09-27-22-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
include/linux/pgtable.h: convert arch_enter_lazy_mmu_mode() and friends to static inlines
mm/damon/sysfs: do not ignore callback's return value in damon_sysfs_damon_call()
mailmap: add entry for Bence Csókás
fs/proc/task_mmu: check p->vec_buf for NULL
kmsan: fix out-of-bounds access to shadow memory
mm/hugetlb: fix copy_hugetlb_page_range() to use ->pt_share_count
mm/hugetlb: fix folio is still mapped when deleted
|
|
Add interface definitions for load balance ID and LAG per multiplane group
functionality. This patch introduces the hardware capability bits needed
to support balance ID in multiplane LAG configurations.
The new fields include:
- load_balance_id: 4-bit field for balance identifier.
- lag_per_mp_group: capability bit for LAG per multiplane group support.
These interface additions are prerequisites for implementing balance ID
support in the MLX5 driver.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758521191-814350-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Before this cap, firmware requested a certain creation order between TIR
objects and SQs of the same transport domain to properly support the
self loopback prevention feature. If order is not preserved, explicit
modify_tir operations are necessary after the opening of the SQs.
When set, this cap bit indicates that this firmware requirement /
limitation no longer holds.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1758521191-814350-2-git-send-email-tariqt@nvidia.com
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
There are very few places that have cause to do that - all in core
VFS now, and all done to files that are not yet opened (or visible
to anybody else, for that matter).
Let's turn f_path into a union of struct path __f_path and const
struct path f_path. It's C, not C++ - 6.5.2.3[4] in C99 and
later explicitly allows that kind of type-punning.
That way any attempts to bypass these checks will be either very
easy to catch, or (if the bastards get sufficiently creative to
make it hard to spot with grep alone) very clearly malicious -
and still catchable with a bit of instrumentation for sparse.
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow
i2c-host for v6.18
- Add support for MediaTek MT6878 I2C
- Drop support for S3C2410
|
|
Yinhao et al. recently reported:
Our fuzzer tool discovered an uninitialized pointer issue in the
bpf_prog_test_run_xdp() function within the Linux kernel's BPF subsystem.
This leads to a NULL pointer dereference when a BPF program attempts to
deference the txq member of struct xdp_buff object.
The test initializes two programs of BPF_PROG_TYPE_XDP: progA acts as the
entry point for bpf_prog_test_run_xdp() and its expected_attach_type can
neither be of be BPF_XDP_DEVMAP nor BPF_XDP_CPUMAP. progA calls into a slot
of a tailcall map it owns. progB's expected_attach_type must be BPF_XDP_DEVMAP
to pass xdp_is_valid_access() validation. The program returns struct xdp_md's
egress_ifindex, and the latter is only allowed to be accessed under mentioned
expected_attach_type. progB is then inserted into the tailcall which progA
calls.
The underlying issue goes beyond XDP though. Another example are programs
of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. sock_addr_is_valid_access() as well
as sock_addr_func_proto() have different logic depending on the programs'
expected_attach_type. Similarly, a program attached to BPF_CGROUP_INET4_GETPEERNAME
should not be allowed doing a tailcall into a program which calls bpf_bind()
out of BPF which is only enabled for BPF_CGROUP_INET4_CONNECT.
In short, specifying expected_attach_type allows to open up additional
functionality or restrictions beyond what the basic bpf_prog_type enables.
The use of tailcalls must not violate these constraints. Fix it by enforcing
expected_attach_type in __bpf_prog_map_compatible().
Note that we only enforce this for tailcall maps, but not for BPF devmaps or
cpumaps: There, the programs are invoked through dev_map_bpf_prog_run*() and
cpu_map_bpf_prog_run*() which set up a new environment / context and therefore
these situations are not prone to this issue.
Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20250926171201.188490-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
APIs based on __pm_runtime_idle() (pm_runtime_idle(), pm_request_idle())
do not return 1 when already suspended. They return -EAGAIN. This is
already covered in the docs, so the entry for "1" is redundant and
conflicting.
(pm_runtime_put() and pm_runtime_put_sync() were previously incorrect,
but that's fixed in "PM: runtime: pm_runtime_put{,_sync}() returns 1
when already suspended", to ensure consistency with APIs like
pm_runtime_put_autosuspend().)
RPM_GET_PUT APIs based on __pm_runtime_suspend() do return 1 when
already suspended, but the language is a little unclear -- it's not
really an "error", so it seems better to list as a clarification before
the 0/success case. Additionally, they only actually return 1 when the
refcount makes it to 0; if the usage counter is still non-zero, we
return 0.
pm_runtime_put(), etc., also don't appear at first like they can ever
see "-EAGAIN: Runtime PM usage_count non-zero", because in non-racy
conditions, pm_runtime_put() would drop its reference count, see it's
non-zero, and return early (in __pm_runtime_idle()). However, it's
possible to race with another actor that increments the usage_count
afterward, since rpm_idle() is protected by a separate lock; in such a
case, we may see -EAGAIN.
Because this case is only seen in the presence of concurrent actors, it
makes sense to clarify that this is when "usage_count **became**
non-zero", by way of some racing actor.
Lastly, pm_runtime_put_sync_suspend() duplicated some -EAGAIN language.
Fix that.
Fixes: 271ff96d6066 ("PM: runtime: Document return values of suspend-related API functions")
Link: https://lore.kernel.org/linux-pm/aJ5pkEJuixTaybV4@google.com/
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
./include/linux/bpf.h: crypto/sha2.h is included more than once.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=25501
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20250926095240.3397539-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce a NPU callback to initialize flow stats and remove NPU stats
initialization from airoha_npu_get routine. Add num_stats_entries to
airoha_npu_ppe_stats_setup routine.
This patch makes the code more readable since NPU statistic are now
initialized on demand by the NPU consumer (at the moment NPU statistic
are configured just by the airoha_eth driver).
Moreover this patch allows the NPU consumer (PPE module) to explicitly
enable/disable NPU flow stats.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250924-airoha-npu-init-stats-callback-v1-1-88bdf3c941b2@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
IEEE 802.3ck-2022 defines counters for FEC bins and 802.3df-2024
clarifies it a bit further. Implement reporting interface through as
addition to FEC stats available in ethtool. Drivers can leave bin
counter uninitialized if per-lane values are provided. In this case the
core will recalculate summ for the bin.
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Link: https://patch.msgid.link/20250924124037.1508846-2-vadim.fedorenko@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We have some rather subtle code around zeroing tail entries, minimizing
cache bouncing. Let's put it all in one place.
Doing this also reduces the text size slightly, e.g. for
drivers/vhost/net.o
Before: text: 15,114 bytes
After: text: 15,082 bytes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://patch.msgid.link/adb9d941de4a2b619ddb2be271a9939849e70687.1758690291.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2025-09-25
this is a pull request of 48 patches for net-next/main, which
supersedes tags/linux-can-next-for-6.18-20250923.
The 1st patch is by Xichao Zhao and converts ns_to_ktime() to
us_to_ktime() in the m_can driver.
Vincent Mailhol contributes 2 patches: Updating the MAINTAINERS and
mailmap files to Vincent's new email address and sorting the includes
in the CAN helper library alphabeticaly.
Stéphane Grosjean's patch modifies all peak CAN drivers and the
mailmap to reflect Stéphane's new email address.
4 patches by Biju Das update the CAN-FD handling in the rcar_canfd
driver.
Followed by 11 patches by Geert Uytterhoeven updating and improving
the rcar_can driver.
Stefan Mätje contributes 2 patches for the esd_usb driver updating the
error messages.
The next 3 patch series are all by Vincent Mailhol: 3 patches to
optimize the size of struct raw_sock and struct uniqframe. 4 patches
which rework the CAN MTU logic as preparation for CAN-XL interfaces.
And finally 20 patches that prepare and refactor the CAN netlink code
for the upcoming CAN-XL support.
* tag 'linux-can-next-for-6.18-20250924' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (48 commits)
can: netlink: add userland error messages
can: dev: add can_get_ctrlmode_str()
can: calc_bittiming: make can_calc_tdco() FD agnostic
can: netlink: make can_tdc_fill_info() FD agnostic
can: netlink: add can_bitrate_const_fill_info()
can: netlink: add can_bittiming_const_fill_info()
can: netlink: add can_bittiming_fill_info()
can: netlink: add can_data_bittiming_get_size()
can: netlink: make can_tdc_get_size() FD agnostic
can: netlink: add can_ctrlmode_changelink()
can: netlink: add can_dtb_changelink()
can: netlink: make can_tdc_changelink() FD agnostic
can: netlink: remove useless check in can_tdc_changelink()
can: netlink: refactor CAN_CTRLMODE_TDC_{AUTO,MANUAL} flag reset logic
can: netlink: add can_validate_databittiming()
can: netlink: add can_validate_tdc()
can: netlink: refactor can_validate_bittiming()
can: netlink: document which symbols are FD specific
can: dev: make can_get_relative_tdco() FD agnostic and move it to bittiming.h
can: dev: move struct data_bittiming_params to linux/can/bittiming.h
...
====================
Link: https://patch.msgid.link/20250925121332.848157-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Quite a bit more things, including pull requests from drivers:
- mt76: MLO support, HW restart improvements
- rtw88/89: small features, prep for RTL8922DE support
- ath10k: GTK rekey fixes
- cfg80211/mac80211:
- additions for more NAN support
- S1G channel representation cleanup
* tag 'wireless-next-2025-09-25' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (167 commits)
wifi: libertas: add WQ_UNBOUND to alloc_workqueue users
Revert "wifi: libertas: WQ_PERCPU added to alloc_workqueue users"
wifi: libertas: WQ_PERCPU added to alloc_workqueue users
wifi: cfg80211: fix width unit in cfg80211_radio_chandef_valid()
wifi: ath11k: HAL SRNG: don't deinitialize and re-initialize again
wifi: ath12k: enforce CPU endian format for all QMI data
wifi: ath12k: Use 1KB Cache Flush Command for QoS TID Descriptors
wifi: ath12k: Fix flush cache failure during RX queue update
wifi: ath12k: Add Retry Mechanism for REO RX Queue Update Failures
wifi: ath12k: Refactor REO command to use ath12k_dp_rx_tid_rxq
wifi: ath12k: Refactor RX TID buffer cleanup into helper function
wifi: ath12k: Refactor RX TID deletion handling into helper function
wifi: ath12k: Increase DP_REO_CMD_RING_SIZE to 256
wifi: cfg80211: remove IEEE80211_CHAN_{1,2,4,8,16}MHZ flags
wifi: rtw89: avoid circular locking dependency in ser_state_run()
wifi: rtw89: fix leak in rtw89_core_send_nullfunc()
wifi: rtw89: avoid possible TX wait initialization race
wifi: rtw89: fix use-after-free in rtw89_core_tx_kick_off_and_wait()
wifi: ath12k: Fix peer lookup in ath12k_dp_mon_rx_deliver_msdu()
wifi: mac80211: fix Rx packet handling when pubsta information is not available
...
====================
Link: https://patch.msgid.link/20250925232341.4544-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
1fcc67e3a354 ("of: base: Add for_each_child_of_node_with_prefix()") added
of_get_next_child_with_prefix() but did not add a stub for the !CONFIG_OF
case.
Add a of_get_next_child_with_prefix() stub so users of
for_each_child_of_node_with_prefix() can be built for compile testing even
when !CONFIG_OF.
Fixes: 1fcc67e3a354 ("of: base: Add for_each_child_of_node_with_prefix()")
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix two dl_server regressions: a race that can end up leaving the
dl_server stuck, and a dl_server throttling bug causing lag to fair
tasks"
* tag 'sched-urgent-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Fix dl_server behaviour
sched/deadline: Fix dl_server getting stuck
|
|
Commit 495c8d35035e ("PM: hibernate: Add pm_hibernation_mode_is_suspend()")
that introduced pm_hibernation_mode_is_suspend() did not define it in
the case when CONFIG_HIBERNATION is unset, but CONFIG_SUSPEND is set.
Subsequent commit 0a6e9e098fcc ("drm/amd: Fix hybrid sleep") made the
amdgpu driver use that function which led to kernel build breakage in
the case mentioned above [1].
Address this by using appropriate #ifdeffery around the definition of
pm_hibernation_mode_is_suspend().
Fixes: 0a6e9e098fcc ("drm/amd: Fix hybrid sleep")
Reported-by: KernelCI bot <bot@kernelci.org>
Closes: https://groups.io/g/kernelci-results/topic/regression_pm_testing/115439919 [1]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
|
|
Specifying a non-zero value for a new struct kmem_cache_args field
sheaf_capacity will setup a caching layer of percpu arrays called
sheaves of given capacity for the created cache.
Allocations from the cache will allocate via the percpu sheaves (main or
spare) as long as they have no NUMA node preference. Frees will also
put the object back into one of the sheaves.
When both percpu sheaves are found empty during an allocation, an empty
sheaf may be replaced with a full one from the per-node barn. If none
are available and the allocation is allowed to block, an empty sheaf is
refilled from slab(s) by an internal bulk alloc operation. When both
percpu sheaves are full during freeing, the barn can replace a full one
with an empty one, unless over a full sheaves limit. In that case a
sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
sheaves and barns is also wired to the existing cpu flushing and cache
shrinking operations.
The sheaves do not distinguish NUMA locality of the cached objects. If
an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
the sheaves are bypassed.
The bulk operations exposed to slab users also try to utilize the
sheaves as long as the necessary (full or empty) sheaves are available
on the cpu or in the barn. Once depleted, they will fallback to bulk
alloc/free to slabs directly to avoid double copying.
The sheaf_capacity value is exported in sysfs for observability.
Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
count objects allocated or freed using the sheaves (and thus not
counting towards the other alloc/free path counters). Counters
sheaf_refill and sheaf_flush count objects filled or flushed from or to
slab pages, and can be used to assess how effective the caching is. The
refill and flush operations will also count towards the usual
alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
the backing slabs. For barn operations, barn_get and barn_put count how
many full sheaves were get from or put to the barn, the _fail variants
count how many such requests could not be satisfied mainly because the
barn was either empty or full. While the barn also holds empty sheaves
to make some operations easier, these are not as critical to mandate own
counters. Finally, there are sheaf_alloc/sheaf_free counters.
Access to the percpu sheaves is protected by local_trylock() when
potential callers include irq context, and local_lock() otherwise (such
as when we already know the gfp flags allow blocking). The trylock
failures should be rare and we can easily fallback. Each per-NUMA-node
barn has a spin_lock.
When slub_debug is enabled for a cache with sheaf_capacity also
specified, the latter is ignored so that allocations and frees reach the
slow path where debugging hooks are processed. Similarly, we ignore it
with CONFIG_SLUB_TINY which prefers low memory usage to performance.
[boot failure: https://lore.kernel.org/all/583eacf5-c971-451a-9f76-fed0e341b815@linux.ibm.com/ ]
Reported-and-tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
lockdep_is_held() macro assumes that "struct lockdep_map dep_map;"
is a top level field of any lock that participates in LOCKDEP.
Make it so for local_trylock_t.
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
'amd/amd-vi' into next
|
|
The stacktrace map can be easily full, which will lead to failure in
obtaining the stack. In addition to increasing the size of the map,
another solution is to delete the stack_id after looking it up from
the user, so extend the existing bpf_map_lookup_and_delete_elem()
functionality to stacktrace map types.
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250925175030.1615837-1-chen.dylane@linux.dev
|
|
static inlines
commit c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards") introduced
the use of arch_enter_lazy_mmu_mode(), which results in the compiler
complaining about "statement has no effect", when
__HAVE_ARCH_LAZY_MMU_MODE is not defined in include/linux/pgtable.h
The exact warning/error is:
In file included from ./include/linux/kasan.h:37,
from mm/kasan/shadow.c:14:
mm/kasan/shadow.c: In function kasan_populate_vmalloc_pte:
./include/linux/pgtable.h:247:41: error: statement with no effect [-Werror=unused-value]
247 | #define arch_enter_lazy_mmu_mode() (LAZY_MMU_DEFAULT)
| ^
mm/kasan/shadow.c:322:9: note: in expansion of macro arch_enter_lazy_mmu_mode> 322 | arch_enter_lazy_mmu_mode();
| ^~~~~~~~~~~~~~~~~~~~~~~~
switching these "functions" to static inlines fixes this up.
Fixes: c519c3c0a113 ("mm/kasan: avoid lazy MMU mode hazards")
Reported-by: Balbir Singh <balbirs@nvidia.com>
Closes: https://lkml.kernel.org/r/20250912235515.367061-1-balbirs@nvidia.com
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
commit 59d9094df3d79 ("mm: hugetlb: independent PMD page table shared
count") introduced ->pt_share_count dedicated to hugetlb PMD share count
tracking, but omitted fixing copy_hugetlb_page_range(), leaving the
function relying on page_count() for tracking that no longer works.
When lazy page table copy for hugetlb is disabled, that is, revert commit
bcd51a3c679d ("hugetlb: lazy page table copies in fork()") fork()'ing with
hugetlb PMD sharing quickly lockup -
[ 239.446559] watchdog: BUG: soft lockup - CPU#75 stuck for 27s!
[ 239.446611] RIP: 0010:native_queued_spin_lock_slowpath+0x7e/0x2e0
[ 239.446631] Call Trace:
[ 239.446633] <TASK>
[ 239.446636] _raw_spin_lock+0x3f/0x60
[ 239.446639] copy_hugetlb_page_range+0x258/0xb50
[ 239.446645] copy_page_range+0x22b/0x2c0
[ 239.446651] dup_mmap+0x3e2/0x770
[ 239.446654] dup_mm.constprop.0+0x5e/0x230
[ 239.446657] copy_process+0xd17/0x1760
[ 239.446660] kernel_clone+0xc0/0x3e0
[ 239.446661] __do_sys_clone+0x65/0xa0
[ 239.446664] do_syscall_64+0x82/0x930
[ 239.446668] ? count_memcg_events+0xd2/0x190
[ 239.446671] ? syscall_trace_enter+0x14e/0x1f0
[ 239.446676] ? syscall_exit_work+0x118/0x150
[ 239.446677] ? arch_exit_to_user_mode_prepare.constprop.0+0x9/0xb0
[ 239.446681] ? clear_bhb_loop+0x30/0x80
[ 239.446684] ? clear_bhb_loop+0x30/0x80
[ 239.446686] entry_SYSCALL_64_after_hwframe+0x76/0x7e
There are two options to resolve the potential latent issue:
1. warn against PMD sharing in copy_hugetlb_page_range(),
2. fix it.
This patch opts for the second option.
While at it, simplify the comment, the details are not actually relevant
anymore.
Link: https://lkml.kernel.org/r/20250916004520.1604530-1-jane.chu@oracle.com
Fixes: 59d9094df3d7 ("mm: hugetlb: independent PMD page table shared count")
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Muchun Song <muchun.song@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Annotate two places in boardinfo code:
- __i2c_first_dynamic_bus_num is set in init phase. Annotate it as
__ro_after_init to prevent later changes.
- i2c_register_board_info() is used in init phase only, so annotate it
as __init, allowing to free the memory after init phase.
This is safe, see comment: "done in board-specific init code near
arch_initcall() time"
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
|
|
The RPMI specification defines a system MSI service group which
allows application processors to receive MSIs upon system events
such as graceful shutdown/reboot request, CPU hotplug event, memory
hotplug event, etc.
Add an irqchip driver for the RISC-V RPMI system MSI service group
to directly receive system MSIs in Linux kernel.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-14-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Some drivers have different flows for hibernation and suspend. If
the driver opportunistically will skip thaw() then it needs a hint
to know what is happening after the hibernate.
Introduce a new symbol pm_hibernation_mode_is_suspend() that drivers
can call to determine if suspending the system for this purpose.
Tested-by: Ionut Nechita <ionut_n2001@yahoo.com>
Tested-by: Kenneth Crudup <kenny@panix.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The RPMI specification defines a clock service group which can be
accessed via SBI MPXY extension or dedicated S-mode RPMI transport.
Add mailbox client based clock driver for the RISC-V RPMI clock
service group.
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Co-developed-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Signed-off-by: Rahul Pathak <rpathak@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-11-apatel@ventanamicro.com
[pjw@kernel.org: converted rpmi_clkrate_u64 macro to a function; replaced bare constant with a macro]
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Cross-merge networking fixes after downstream PR (net-6.17-rc8).
Conflicts:
drivers/net/can/spi/hi311x.c
6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled")
27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users")
https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org
net/smc/smc_loopback.c
drivers/dibs/dibs_loopback.c
a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()")
cc21191b584c ("dibs: Move data path to dibs layer")
https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org
Adjacent changes:
drivers/net/dsa/lantiq/lantiq_gswip.c
c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()")
7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-next
Mika writes:
thunderbolt: Changes for v6.18 merge window
This includes following USB4/Thunderbolt changes for the v6.18 merge
window:
- HMAC hashing improvements
- Switch to use Linux Foundation IDs for XDomain discovery
- Use is_pciehp instead of is_hotplug_bridge
- Fixes for various kernel-doc issues
- Fix use-after-free in DP tunneling error path.
I'm sending the UAF fix with this pull request because it came quite
late and I would like to give it some exposure before it lands the
mainline.
All these except the UAF fix have been in linux-next with no reported
issues.
* tag 'thunderbolt-for-v6.18-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (33 commits)
thunderbolt: Fix use-after-free in tb_dp_dprx_work
thunderbolt: Update thunderbolt.h header file
thunderbolt: Update xdomain.c function documentation
thunderbolt: Update usb4_port.c function documentation
thunderbolt: Update usb4.c function documentation
thunderbolt: Update tunnel.h function documentation
thunderbolt: Update tunnel.c function documentation
thunderbolt: Update tmu.c function documentation
thunderbolt: Add missing documentation in tb.h
thunderbolt: Update tb.h function documentation
thunderbolt: Update tb.c function documentation
thunderbolt: Update switch.c function documentation
thunderbolt: Update retimer.c function documentation
thunderbolt: Update property.c function documentation
thunderbolt: Update path.c function documentation
thunderbolt: Update nvm.c function documentation
thunderbolt: Add missing documentation in nhi_regs.h ring_desc structure
thunderbolt: Update nhi.c function documentation
thunderbolt: Update lc.c function documentation
thunderbolt: Update eeprom.c function documentation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth, IPsec and CAN.
No known regressions at this point.
Current release - regressions:
- xfrm: xfrm_alloc_spi shouldn't use 0 as SPI
Previous releases - regressions:
- xfrm: fix offloading of cross-family tunnels
- bluetooth: fix several races leading to UaFs
- dsa: lantiq_gswip: fix FDB entries creation for the CPU port
- eth:
- tun: update napi->skb after XDP process
- mlx: fix UAF in flow counter release
Previous releases - always broken:
- core: forbid FDB status change while nexthop is in a group
- smc: fix warning in smc_rx_splice() when calling get_page()
- can: provide missing ndo_change_mtu(), to prevent buffer overflow.
- eth:
- i40e: fix VF config validation
- broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl"
* tag 'net-6.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
octeontx2-pf: Fix potential use after free in otx2_tc_add_flow()
net: dsa: lantiq_gswip: suppress -EINVAL errors for bridge FDB entries added to the CPU port
net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()
libie: fix string names for AQ error codes
net/mlx5e: Fix missing FEC RS stats for RS_544_514_INTERLEAVED_QUAD
net/mlx5: HWS, ignore flow level for multi-dest table
net/mlx5: fs, fix UAF in flow counter release
selftests: fib_nexthops: Add test cases for FDB status change
selftests: fib_nexthops: Fix creation of non-FDB nexthops
nexthop: Forbid FDB status change while nexthop is in a group
net: allow alloc_skb_with_frags() to use MAX_SKB_FRAGS
bnxt_en: correct offset handling for IPv6 destination address
ptp: document behavior of PTP_STRICT_FLAGS
broadcom: fix support for PTP_EXTTS_REQUEST2 ioctl
broadcom: fix support for PTP_PEROUT_DUTY_CYCLE
Bluetooth: MGMT: Fix possible UAFs
Bluetooth: hci_event: Fix UAF in hci_acl_create_conn_sync
Bluetooth: hci_event: Fix UAF in hci_conn_tx_dequeue
Bluetooth: hci_sync: Fix hci_resume_advertising_sync
Bluetooth: Fix build after header cleanup
...
|
|
Update cros_ec_commands.h to include definitions for getting PWM fan
duty, getting and setting the fan control mode.
Signed-off-by: Sung-Chi Li <lschyi@chromium.org>
Acked-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20250911-cros_ec_fan-v6-1-a1446cc098af@google.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Pull virtio fixes from Michael Tsirkin:
"virtio,vhost: last minute fixes
More small fixes. Most notably this fixes crashes and hangs in
vhost-net"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
MAINTAINERS, mailmap: Update address for Peter Hilber
virtio_config: clarify output parameters
uapi: vduse: fix typo in comment
vhost: Take a reference on the task in struct vhost_task.
vhost-net: flush batched before enabling notifications
Revert "vhost/net: Defer TX queue re-enable until after sendmsg"
vhost-net: unbreak busy polling
vhost-scsi: fix argument order in tport allocation error message
|
|
Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
be mangled. Outer IDs can always be mangled.
Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
both inner and outer IDs to be mangled.
This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> # for sfc
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250923085908.4687-4-richardbgobert@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
There are some typos in the comments of migrate in
include/linux/preempt.h:
elegible -> eligible
it's -> its
migirate_disable -> migrate_disable
abritrary -> arbitrary
Just fix them.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
For now, migrate_enable and migrate_disable are global, which makes them
become hotspots in some case. Take BPF for example, the function calling
to migrate_enable and migrate_disable in BPF trampoline can introduce
significant overhead, and following is the 'perf top' of FENTRY's
benchmark (./tools/testing/selftests/bpf/bench trig-fentry):
54.63% bpf_prog_2dcccf652aac1793_bench_trigger_fentry [k]
bpf_prog_2dcccf652aac1793_bench_trigger_fentry
10.43% [kernel] [k] migrate_enable
10.07% bpf_trampoline_6442517037 [k] bpf_trampoline_6442517037
8.06% [kernel] [k] __bpf_prog_exit_recur
4.11% libc.so.6 [.] syscall
2.15% [kernel] [k] entry_SYSCALL_64
1.48% [kernel] [k] memchr_inv
1.32% [kernel] [k] fput
1.16% [kernel] [k] _copy_to_user
0.73% [kernel] [k] bpf_prog_test_run_raw_tp
So in this commit, we make migrate_enable/migrate_disable inline to obtain
better performance. The struct rq is defined internally in
kernel/sched/sched.h, and the field "nr_pinned" is accessed in
migrate_enable/migrate_disable, which makes it hard to make them inline.
Alexei Starovoitov suggests to generate the offset of "nr_pinned" in [1],
so we can define the migrate_enable/migrate_disable in
include/linux/sched.h and access "this_rq()->nr_pinned" with
"(void *)this_rq() + RQ_nr_pinned".
The offset of "nr_pinned" is generated in include/generated/rq-offsets.h
by kernel/sched/rq-offsets.c.
Generally speaking, we move the definition of migrate_enable and
migrate_disable to include/linux/sched.h from kernel/sched/core.c. The
calling to __set_cpus_allowed_ptr() is leaved in ___migrate_enable().
The "struct rq" is not available in include/linux/sched.h, so we can't
access the "runqueues" with this_cpu_ptr(), as the compilation will fail
in this_cpu_ptr() -> raw_cpu_ptr() -> __verify_pcpu_ptr():
typeof((ptr) + 0)
So we introduce the this_rq_raw() and access the runqueues with
arch_raw_cpu_ptr/PERCPU_PTR directly.
The variable "runqueues" is not visible in the kernel modules, and export
it is not a good idea. As Peter Zijlstra advised in [2], we define and
export migrate_enable/migrate_disable in kernel/sched/core.c too, and use
them for the modules.
Before this patch, the performance of BPF FENTRY is:
fentry : 113.030 ± 0.149M/s
fentry : 112.501 ± 0.187M/s
fentry : 112.828 ± 0.267M/s
fentry : 115.287 ± 0.241M/s
After this patch, the performance of BPF FENTRY increases to:
fentry : 143.644 ± 0.670M/s
fentry : 149.764 ± 0.362M/s
fentry : 149.642 ± 0.156M/s
fentry : 145.263 ± 0.221M/s
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/bpf/CAADnVQ+5sEDKHdsJY5ZsfGDO_1SEhhQWHrt2SMBG5SYyQ+jt7w@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/20250819123214.GH4067720@noisy.programming.kicks-ass.net/ [2]
|
|
In the next commit, we will move the definition of migrate_enable() and
migrate_disable() to linux/sched.h. However,
migrate_enable/migrate_disable will be used in commit
1b93c03fb319 ("rcu: add rcu_read_lock_dont_migrate()") in bpf-next tree.
In order to fix potential compiling error, replace linux/preempt.h with
linux/sched.h in include/linux/rcupdate.h.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
John reported undesirable behaviour with the dl_server since commit:
cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling").
When starving fair tasks on purpose (starting spinning FIFO tasks),
his fair workload, which often goes (briefly) idle, would delay fair
invocations for a second, running one invocation per second was both
unexpected and terribly slow.
The reason this happens is that when dl_se->server_pick_task() returns
NULL, indicating no runnable tasks, it would yield, pushing any later
jobs out a whole period (1 second).
Instead simply stop the server. This should restore behaviour in that
a later wakeup (which restarts the server) will be able to continue
running (subject to the CBS wakeup rules).
Notably, this does not re-introduce the behaviour cccb45d7c4295 set
out to solve, any start/stop cycle is naturally throttled by the timer
period (no active cancel).
Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>
|
|
John found it was easy to hit lockup warnings when running locktorture
on a 2 CPU VM, which he bisected down to: commit cccb45d7c429
("sched/deadline: Less agressive dl_server handling").
While debugging it seems there is a chance where we end up with the
dl_server dequeued, with dl_se->dl_server_active. This causes
dl_server_start() to return without enqueueing the dl_server, thus it
fails to run when RT tasks starve the cpu.
When this happens, dl_server_timer() catches the
'!dl_se->server_has_tasks(dl_se)' case, which then calls
replenish_dl_entity() and dl_server_stopped() and finally return
HRTIMER_NO_RESTART.
This ends in no new timer and also no enqueue, leaving the dl_server
'dead', allowing starvation.
What should have happened is for the bandwidth timer to start the
zero-laxity timer, which in turn would enqueue the dl_server and cause
dl_se->server_pick_task() to be called -- which will stop the
dl_server if no fair tasks are observed for a whole period.
IOW, it is totally irrelevant if there are fair tasks at the moment of
bandwidth refresh.
This removes all dl_se->server_has_tasks() users, so remove the whole
thing.
Fixes: cccb45d7c4295 ("sched/deadline: Less agressive dl_server handling")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: John Stultz <jstultz@google.com>
|
|
It's misplaced in struct proc_ns_operations and ns->ops might be NULL if
the namespace is compiled out but we still want to know the type of the
namespace for the initial namespace struct.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Don't expose it directly. There's no need to do that.
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new passthrough type HL_GET_P_STATE to the cpucp generic ioctl
to allow userspace to read the device performance state via firmware.
Signed-off-by: Ariel Aviad <ariel.aviad@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
Add a new CPUCP generic message type to retrieve HBM, SRAM and critical
error counters from the device.
Signed-off-by: Vitaly Margolin <vitaly.margolin@intel.com>
Reviewed-by: Koby Elbaz <koby.elbaz@intel.com>
Signed-off-by: Koby Elbaz <koby.elbaz@intel.com>
|
|
The commit c2c60ea37e5b ("once: use __section(".data.once")") moved
DO_ONCE's ___done variable to .data.once section, which conflicts with
DO_ONCE_LITE() that also uses the same section.
This creates a race condition when clear_warn_once is used:
Thread 1 (DO_ONCE) Thread 2 (DO_ONCE)
__do_once_start
read ___done (false)
acquire once_lock
execute func
__do_once_done
write ___done (true) __do_once_start
release once_lock // Thread 3 clear_warn_once reset ___done
read ___done (false)
acquire once_lock
execute func
schedule once_work __do_once_done
once_deferred: OK write ___done (true)
static_branch_disable release once_lock
schedule once_work
once_deferred:
BUG_ON(!static_key_enabled)
DO_ONCE_LITE() in once_lite.h is used by WARN_ON_ONCE() and other warning
macros. Keep its ___done flag in the .data..once section and allow resetting
by clear_warn_once, as originally intended.
In contrast, DO_ONCE() is used for functions like get_random_once() and
relies on its ___done flag for internal synchronization. We should not reset
DO_ONCE() by clear_warn_once.
Fix it by isolating DO_ONCE's ___done into a separate .data..do_once section,
shielding it from clear_warn_once.
Fixes: c2c60ea37e5b ("once: use __section(".data.once")")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Qi Xi <xiqi2@huawei.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Add common memcpy APIs for copying u32 array to/from __le32 array.
Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Link: https://lore.kernel.org/r/20250818040920.272664-7-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Introduce optional fw_node() callback which allows a mailbox controller
driver to provide controller specific mapping using fwnode.
The Linux OF framework already implements fwnode operations for the
Linux DD framework so the fw_xlate() callback works fine with device
tree as well.
Acked-by: Jassi Brar <jassisinghbrar@gmail.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Link: https://lore.kernel.org/r/20250818040920.272664-6-apatel@ventanamicro.com
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|
|
Currently, HFS/HFS+ has very obsolete and inconvenient
debug output subsystem. Also, the code is duplicated
in HFS and HFS+ driver. This patch introduces
linux/hfs_common.h for gathering common declarations,
inline functions, and common short methods. Currently,
this file contains only hfs_dbg() function that
employs pr_debug() with the goal to print a debug-level
messages conditionally.
So, now, it is possible to enable the debug output
by means of:
echo 'file extent.c +p' > /proc/dynamic_debug/control
echo 'func hfsplus_evict_inode +p' > /proc/dynamic_debug/control
And debug output looks like this:
hfs: pid 5831:fs/hfs/catalog.c:228 hfs_cat_delete(): delete_cat: 00,48
hfs: pid 5831:fs/hfs/extent.c:484 hfs_file_truncate(): truncate: 48, 409600 -> 0
hfs: pid 5831:fs/hfs/extent.c:212 hfs_dump_extent():
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 78:4
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 0:0
hfs: pid 5831:fs/hfs/extent.c:214 hfs_dump_extent(): 0:0
v4
Debug messages have been reworked and information about
new HFS/HFS+ shared declarations file has been added
to MAINTAINERS file.
v5
Yangtao Li suggested to clean up debug output and
fix several typos.
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
cc: Yangtao Li <frank.li@vivo.com>
cc: linux-fsdevel@vger.kernel.org
cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
|