Age | Commit message (Collapse) | Author |
|
Currently, zsmalloc, zswap's and zram's backend memory allocator, does not
enforce any policy for the allocation of memory for the compressed data,
instead just adopting the memory policy of the task entering reclaim, or
the default policy (prefer local node) if no such policy is specified.
This can lead to several pathological behaviors in multi-node NUMA
systems:
1. Systems with CXL-based memory tiering can encounter the following
inversion with zswap/zram: the coldest pages demoted to the CXL tier
can return to the high tier when they are reclaimed to compressed swap,
creating memory pressure on the high tier.
2. Consider a direct reclaimer scanning nodes in order of allocation
preference. If it ventures into remote nodes, the memory it compresses
there should stay there. Trying to shift those contents over to the
reclaiming thread's preferred node further *increases* its local
pressure, and provoking more spills. The remote node is also the most
likely to refault this data again. This undesirable behavior was
pointed out by Johannes Weiner in [1].
3. For zswap writeback, the zswap entries are organized in
node-specific LRUs, based on the node placement of the original pages,
allowing for targeted zswap writeback for specific nodes.
However, the compressed data of a zswap entry can be placed on a
different node from the LRU it is placed on. This means that reclaim
targeted at one node might not free up memory used for zswap entries in
that node, but instead reclaiming memory in a different node.
All of these issues will be resolved if the compressed data go to the same
node as the original page. This patch encourages this behavior by having
zswap and zram pass the node of the original page to zsmalloc, and have
zsmalloc prefer the specified node if we need to allocate new (zs)pages
for the compressed data.
Note that we are not strictly binding the allocation to the preferred
node. We still allow the allocation to fall back to other nodes when the
preferred node is full, or if we have zspages with slots available on a
different node. This is OK, and still a strict improvement over the
status quo:
1. On a system with demotion enabled, we will generally prefer
demotions over compressed swapping, and only swap when pages have
already gone to the lowest tier. This patch should achieve the desired
effect for the most part.
2. If the preferred node is out of memory, letting the compressed data
going to other nodes can be better than the alternative (OOMs, keeping
cold memory unreclaimed, disk swapping, etc.).
3. If the allocation go to a separate node because we have a zspage
with slots available, at least we're not creating extra immediate
memory pressure (since the space is already allocated).
3. While there can be mixings, we generally reclaim pages in same-node
batches, which encourage zspage grouping that is more likely to go to
the right node.
4. A strict binding would require partitioning zsmalloc by node, which
is more complicated, and more prone to regression, since it reduces the
storage density of zsmalloc. We need to evaluate the tradeoff and
benchmark carefully before adopting such an involved solution.
[1]: https://lore.kernel.org/linux-mm/20250331165306.GC2110528@cmpxchg.org/
[senozhatsky@chromium.org: coding-style fixes]
Link: https://lkml.kernel.org/r/mnvexa7kseswglcqbhlot4zg3b3la2ypv2rimdl5mh5glbmhvz@wi6bgqn47hge
Link: https://lkml.kernel.org/r/20250402204416.3435994-1-nphamcs@gmail.com
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
Suggested-by: Gregory Price <gourry@gourry.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Sergey Senozhatsky <senozhatsky@chromium.org> [zram, zsmalloc]
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Yosry Ahmed <yosry.ahmed@linux.dev> [zswap/zsmalloc]
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All callers now use folio_nr_pages(). Delete this wrapper.
Link: https://lkml.kernel.org/r/20250402210612.2444135-9-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This function has no more callers; delete it.
Link: https://lkml.kernel.org/r/20250402210612.2444135-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extract folios from i_mapping, not pages. Removes a hidden call to
compound_head(), a use of thp_nr_pages() and an unnecessary assertion that
we didn't find a tail page in the page cache.
Link: https://lkml.kernel.org/r/20250402210612.2444135-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All users of this function now call folio_file_page() instead. Delete it.
Link: https://lkml.kernel.org/r/20250402210612.2444135-6-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All callers have been converted to call offset_in_folio().
Link: https://lkml.kernel.org/r/20250402210612.2444135-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Misc folio patches for 6.16".
Remove a few APIs that we've converted everybody from using. I also found
a few places that extract a page pointer from i_pages, which will be an
invalid thing to do when we separate pages from folios.
This patch (of 8):
All filesystems have now been converted to call readahead_folio() so we
can delete this wrapper.
Link: https://lkml.kernel.org/r/20250402210612.2444135-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20250402210612.2444135-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There are now no callers of mk_huge_pmd() and mk_pmd(). Remove them.
Link: https://lkml.kernel.org/r/20250402181709.2386022-12-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Removes five conversions from folio to page. Also removes both callers of
mk_pmd() that aren't part of mk_huge_pmd(), getting us a step closer to
removing the confusion between mk_pmd(), mk_huge_pmd() and pmd_mkhuge().
Link: https://lkml.kernel.org/r/20250402181709.2386022-11-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove a cast from folio to page in four callers of mk_pte().
Link: https://lkml.kernel.org/r/20250402181709.2386022-8-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All architectures now use the common mk_pte() definition, so we can remove
the condition.
Link: https://lkml.kernel.org/r/20250402181709.2386022-7-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Most architectures simply call pfn_pte(). Centralise that as the normal
definition and remove the definition of mk_pte() from the architectures
which have either that exact definition or something similar.
Link: https://lkml.kernel.org/r/20250402181709.2386022-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> # s390
Cc: Zi Yan <ziy@nvidia.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Richard Weinberger <richard@nod.at>
Cc: <x86@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
checks") introduces a possible use-after-free scenario, when page
is non-compound, page[0] could be released by other thread right
after put_page_testzero failed in current thread, pgalloc_tag_sub_pages
afterwards would manipulate an invalid page for accounting remaining
pages:
[timeline] [thread1] [thread2]
| alloc_page non-compound
V
| get_page, rf counter inc
V
| in ___free_pages
| put_page_testzero fails
V
| put_page, page released
V
| in ___free_pages,
| pgalloc_tag_sub_pages
| manipulate an invalid page
V
Restore __free_pages() to its state before, retrieve alloc tag
beforehand.
Link: https://lkml.kernel.org/r/20250505193034.91682-1-00107082@163.com
Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")
Signed-off-by: David Wang <00107082@163.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 ITS mitigation from Dave Hansen:
"Mitigate Indirect Target Selection (ITS) issue.
I'd describe this one as a good old CPU bug where the behavior is
_obviously_ wrong, but since it just results in bad predictions it
wasn't wrong enough to notice. Well, the researchers noticed and also
realized that thus bug undermined a bunch of existing indirect branch
mitigations.
Thus the unusually wide impact on this one. Details:
ITS is a bug in some Intel CPUs that affects indirect branches
including RETs in the first half of a cacheline. Due to ITS such
branches may get wrongly predicted to a target of (direct or indirect)
branch that is located in the second half of a cacheline. Researchers
at VUSec found this behavior and reported to Intel.
Affected processors:
- Cascade Lake, Cooper Lake, Whiskey Lake V, Coffee Lake R, Comet
Lake, Ice Lake, Tiger Lake and Rocket Lake.
Scope of impact:
- Guest/host isolation:
When eIBRS is used for guest/host isolation, the indirect branches
in the VMM may still be predicted with targets corresponding to
direct branches in the guest.
- Intra-mode using cBPF:
cBPF can be used to poison the branch history to exploit ITS.
Realigning the indirect branches and RETs mitigates this attack
vector.
- User/kernel:
With eIBRS enabled user/kernel isolation is *not* impacted by ITS.
- Indirect Branch Prediction Barrier (IBPB):
Due to this bug indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB.
This will be fixed in the microcode.
Mitigation:
As indirect branches in the first half of cacheline are affected, the
mitigation is to replace those indirect branches with a call to thunk that
is aligned to the second half of the cacheline.
RETs that take prediction from RSB are not affected, but they may be
affected by RSB-underflow condition. So, RETs in the first half of
cacheline are also patched to a return thunk that executes the RET aligned
to second half of cacheline"
* tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest/x86/bugs: Add selftests for ITS
x86/its: FineIBT-paranoid vs ITS
x86/its: Use dynamic thunks for indirect branches
x86/ibt: Keep IBT disabled during alternative patching
mm/execmem: Unify early execmem_cache behaviour
x86/its: Align RETs in BHB clear sequence to avoid thunking
x86/its: Add support for RSB stuffing mitigation
x86/its: Add "vmexit" option to skip mitigation on some CPUs
x86/its: Enable Indirect Target Selection mitigation
x86/its: Add support for ITS-safe return thunk
x86/its: Add support for ITS-safe indirect thunk
x86/its: Enumerate Indirect Target Selection (ITS) bug
Documentation: x86/bugs/its: Add ITS documentation
|
|
I've been looking at a problem where we see increased RPC timeouts in
clients when the nfs_layout_flexfiles dataserver_timeo value is tuned
very low (6s). This is necessary to ensure quick failover to a different
mirror if a server goes down, but it causes a lot more major RPC timeouts.
Ultimately, the problem is server-side however. It's sometimes doesn't
respond to connection attempts. My theory is that the interrupt handler
runs when a connection comes in, the xprt ends up being enqueued, but it
takes a significant amount of time for the nfsd thread to pick it up.
Currently, the svc_xprt_dequeue tracepoint displays "wakeup-us". This is
the time between the wake_up() call, and the thread dequeueing the xprt.
If no thread was woken, or the thread ended up picking up a different
xprt than intended, then this value won't tell us how long the xprt was
waiting.
Add a new xpt_qtime field to struct svc_xprt and set it in
svc_xprt_enqueue(). When the dequeue tracepoint fires, also store the
time that the xprt sat on the queue in total. Display it as "qtime-us".
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Resolve conflicts in dell/alienware-wmi-wmax and asus-wmi, and enable
applying a few amd/hsmp patches that depend on changes in the fixes
branch.
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.16-2025-05-09:
amdgpu:
- IPS fixes
- DSC cleanup
- DC Scaling updates
- DC FP fixes
- Fused I2C-over-AUX updates
- SubVP fixes
- Freesync fix
- DMUB AUX fixes
- VCN fix
- Hibernation fixes
- HDP fixes
- DCN 2.1 fixes
- DPIA fixes
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- Misc code cleanups
- SR-IOV updates
- RAS updates
- PSP 12 cleanups
amdkfd:
- Update error messages for SDMA
- Userptr updates
drm:
- Add drm_file_err function
dma-buf:
- Add a helper to sort and deduplicate dma_fence arrays
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc timers fixes from Ingo Molnar:
- Fix time keeping bugs in CLOCK_MONOTONIC_COARSE clocks
- Work around absolute relocations into vDSO code that GCC erroneously
emits in certain arm64 build environments
- Fix a false positive lockdep warning in the i8253 clocksource driver
* tag 'timers-urgent-2025-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
arm64: vdso: Work around invalid absolute relocations from GCC
timekeeping: Prevent coarse clocks going backwards
|
|
The acpi_handle should be just a handle and not a pointer in
sdw_intel_acpi_scan() parameter list.
It is called with 'acpi_handle handle' as parameter and it is passing it to
acpi_walk_namespace, which also expects acpi_handle and not acpi_handle*
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://patch.msgid.link/20250508181207.22113-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
futex_mm_init()
The following commit added an rcu_assign_pointer() assignment to
futex_mm_init() in <linux/futex.h>:
bd54df5ea7ca ("futex: Allow to resize the private local hash")
Which breaks the build on older compilers (gcc-9, x86-64 defconfig):
CC io_uring/futex.o
In file included from ./arch/x86/include/generated/asm/rwonce.h:1,
from ./include/linux/compiler.h:390,
from ./include/linux/array_size.h:5,
from ./include/linux/kernel.h:16,
from io_uring/futex.c:2:
./include/linux/futex.h: In function 'futex_mm_init':
./include/linux/rcupdate.h:555:36: error: dereferencing pointer to incomplete type 'struct futex_private_hash'
The problem is that this variant of rcu_assign_pointer() wants to
know the full type of 'struct futex_private_hash', which type
is local to futex.c:
kernel/futex/core.c:struct futex_private_hash {
There are a couple of mechanical solutions for this bug:
- we can uninline futex_mm_init() and move it into futex/core.c
- or we can share the structure definition with kernel/fork.c.
But both of these solutions have disadvantages: the first one adds
runtime overhead, while the second one dis-encapsulates private
futex types.
A third solution, implemented by this patch, is to just initialize
mm->futex_phash with NULL like the patch below, it's not like this
new MM's ->futex_phash can be observed externally until the task
is inserted into the task list, which guarantees full store ordering.
The relaxation of this initialization might also give a tiny speedup
on certain platforms.
Fixes: bd54df5ea7ca ("futex: Allow to resize the private local hash")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/aB8SI00EHBri23lB@gmail.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes. 13 are cc:stable and the remainder address post-6.14
issues or aren't considered necessary for -stable kernels.
About half are for MM. Five OCFS2 fixes and a few MAINTAINERS updates"
* tag 'mm-hotfixes-stable-2025-05-10-14-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
mm: fix folio_pte_batch() on XEN PV
nilfs2: fix deadlock warnings caused by lock dependency in init_nilfs()
mm/hugetlb: copy the CMA flag when demoting
mm, swap: fix false warning for large allocation with !THP_SWAP
selftests/mm: fix a build failure on powerpc
selftests/mm: fix build break when compiling pkey_util.c
mm: vmalloc: support more granular vrealloc() sizing
tools/testing/selftests: fix guard region test tmpfs assumption
ocfs2: stop quota recovery before disabling quotas
ocfs2: implement handshaking with ocfs2 recovery thread
ocfs2: switch osb->disable_recovery to enum
mailmap: map Uwe's BayLibre addresses to a single one
MAINTAINERS: add mm THP section
mm/userfaultfd: fix uninitialized output field for -EAGAIN race
selftests/mm: compaction_test: support platform with huge mount of memory
MAINTAINERS: add core mm section
ocfs2: fix panic in failed foilio allocation
mm/huge_memory: fix dereferencing invalid pmd migration entry
MAINTAINERS: add reverse mapping section
x86: disable image size check for test builds
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver fixes from Greg KH:
"Here are a bunch of small driver fixes (mostly all IIO) for 6.15-rc6.
Included in here are:
- loads of tiny IIO driver fixes for reported issues
- hyperv driver fix for a much-reported and worked on sysfs ring
buffer creation bug
All of these have been in linux-next for over a week (the IIO ones for
many weeks now), with no reported issues"
* tag 'char-misc-6.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (30 commits)
Drivers: hv: Make the sysfs node size for the ring buffer dynamic
uio_hv_generic: Fix sysfs creation path for ring buffer
iio: adis16201: Correct inclinometer channel resolution
iio: adc: ad7606: fix serial register access
iio: pressure: mprls0025pa: use aligned_s64 for timestamp
iio: imu: adis16550: align buffers for timestamp
staging: iio: adc: ad7816: Correct conditional logic for store mode
iio: adc: ad7266: Fix potential timestamp alignment issue.
iio: adc: ad7768-1: Fix insufficient alignment of timestamp.
iio: adc: dln2: Use aligned_s64 for timestamp
iio: accel: adxl355: Make timestamp 64-bit aligned using aligned_s64
iio: temp: maxim-thermocouple: Fix potential lack of DMA safe buffer.
iio: chemical: pms7003: use aligned_s64 for timestamp
iio: chemical: sps30: use aligned_s64 for timestamp
iio: imu: inv_mpu6050: align buffer for timestamp
iio: imu: st_lsm6dsx: Fix wakeup source leaks on device unbind
iio: adc: qcom-spmi-iadc: Fix wakeup source leaks on device unbind
iio: accel: fxls8962af: Fix wakeup source leaks on device unbind
iio: adc: ad7380: fix event threshold shift
iio: hid-sensor-prox: Fix incorrect OFFSET calculation
...
|
|
It's no longer used and can be removed, also remove the field
'gendisk->sync_io'.
Link: https://lore.kernel.org/linux-raid/20250506124903.2540268-10-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
|
|
- rename part_in_{flight, flight_rw} to bdev_count_{inflight, inflight_rw}
- export bdev_count_inflight, to fix a problem in mdraid that foreground
IO can be starved by background sync IO in later patches
Link: https://lore.kernel.org/linux-raid/20250506124903.2540268-6-yukuai1@huaweicloud.com
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into soc/drivers
Memory controller drivers for v6.16
1. Mediatek: Add support for MT6893 MTK SMI.
2. STM32: Add new driver for STM32 Octo Memory Manager (OMM), which
manages muxing between two OSPI busses.
3. Several cleanups and minor improvements (OMAP GPMC, Kconfig entries,
BT1 L2).
* tag 'memory-controller-drv-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl:
MAINTAINERS: add entry for STM32 OCTO MEMORY MANAGER driver
memory: Add STM32 Octo Memory Manager driver
dt-bindings: memory-controllers: Add STM32 Octo Memory Manager controller
bus: firewall: Fix missing static inline annotations for stubs
memory: bt1-l2-ctl: replace scnprintf() with sysfs_emit()
memory: mtk-smi: Add support for Dimensity 1200 MT6893 SMI
dt-bindings: memory: mtk-smi: Add support for MT6893
memory: tegra: Do not enable by default during compile testing
memory: Simplify 'default' choice in Kconfig
memory: omap-gpmc: remove GPIO set() and direction_output() callbacks
memory: omap-gpmc: use the dedicated define for GPIO direction
Link: https://lore.kernel.org/r/20250508093451.55755-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers
Arm SCMI updates for v6.16
1. Quirk framework to handle buggy firmware
With SCMI gaining broader adoption across arm64 platforms, it's
increasingly important to address how we consistently manage out-of-spec
SCMI firmware already deployed in the field. This change introduces a
lightweight quirk framework built around static_keys, enabling developers to:
- Define quirks and their match criteria, which can include:
o A list of compatibles ({ comp, comp2, NULL })
o Vendor ID / Sub-Vendor ID
o Firmware implementation version ranges ([Min_Vers, Max_Vers])
Matching proceeds from the most specific (longest match) to the least
specific. NULL entries are treated as wildcards (i.e., match any value).
This flexibility allows matching very specific combinations or just a
general compatible string.
The quirk code blocks/snippets implementing the workaround are placed near
their intended usage and guarded by a static_key that's tied to the quirk.
Once the SCMI core stack is initialized and retrieves platform info via the
base protocol, any matching quirks will have their associated static_keys
enabled.
2. Quirk for Qualcomm X1E platforms
On some Qualcomm X1E platforms, such as the Lenovo ThinkPad T14s, the
SCMI firmware fails to set the FastChannel support bit for PERF_LEVEL_GET,
yet it crashes when the driver attempts to fall back to standard messaging
which is clearly out-of-spec behavior.
To work around this, the new SCMI quirk framework is used to
unconditionally enable FC initialization for this firmware version.
In the future, once the fixed firmware version is identified, an upper
version bound can be added to the quirk match criteria. Alternatively,
matching can be further restricted using a SoC-specific compatible string
if always enabling FC proves problematic elsewhere.
3. Support for NXP i.MX LMM/CPU vendor protocol extensions
The i.MX95 System Manager (SM) implements Logical Machine Management (LMM)
and a CPU protocol to manage Logical Machines (LM) and CPUs (e.g., M7).
These changes integrate the vendor-specific protocol extensions
implementing the LMM and CPU protocols for the i.MX95, facilitating
standardized communication between the operating system and the platform's
firmware, which will be used by remoteproc drivers. The changes also
include the necessary device tree bindings.
4. Miscellaneous cleanups/changes
These mainly include polling support in SCMI raw mode. The cleanups
centralize error logging for SCMI device creation into a single helper
function, consolidate the device matching logic into a single function, and
ensure that devices must have a name for registration—removing support for
unnamed devices when matching drivers and devices for probing. Transport
devices are now excluded from bus matching, and the correct assignment of
the parent device for the arm-scmi platform device is ensured in the
transport drivers.
* tag 'scmi-updates-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
firmware: arm_scmi: quirk: Force perf level get fastchannel
firmware: arm_scmi: quirk: Fix CLOCK_DESCRIBE_RATES triplet
firmware: arm_scmi: Add common framework to handle firmware quirks
firmware: arm_scmi: Ensure that the message-id supports fastchannel
MAINTAINERS: add entry for i.MX SCMI extensions
firmware: imx: Add i.MX95 SCMI CPU driver
firmware: imx: Add i.MX95 SCMI LMM driver
firmware: arm_scmi: imx: Add i.MX95 CPU Protocol
firmware: arm_scmi: imx: Add i.MX95 LMM protocol
dt-bindings: firmware: Add i.MX95 SCMI LMM and CPU protocol
firmware: arm_scmi: imx: Add LMM and CPU documentation
firmware: arm_scmi: Add polling support to raw mode
firmware: arm_scmi: Exclude transport devices from bus matching
firmware: arm_scmi: Assign correct parent to arm-scmi platform device
firmware: arm_scmi: Refactor error logging from SCMI device creation to single helper
firmware: arm_scmi: Refactor device matching logic to eliminate duplication
firmware: arm_scmi: Ensure scmi_devices are always matched by name as well
Link: https://lore.kernel.org/r/20250507134713.49039-1-sudeep.holla@arm.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/drivers
Samsung SoC drivers for v6.16
Several improvements to Exynos ACPM (Alive Clock and Power Manager)
driver:
1. Handle communication timeous better.
2. Avoid sleeping, so users (PMIC) can still transfer during system
shutdown.
3. Fix reading longer messages from them firmware.
4. Deferred probe improvements.
5. Model the user of ACPM - PMIC - a as child device and export
devm_acpm_get_by_node() for such use case.
* tag 'samsung-drivers-6.16' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
firmware: exynos-acpm: Correct kerneldoc and use typical np argument name
firmware: exynos-acpm: introduce devm_acpm_get_by_node()
firmware: exynos-acpm: populate devices from device tree data
firmware: exynos-acpm: silence EPROBE_DEFER error on boot
firmware: exynos-acpm: fix reading longer results
dt-bindings: firmware: google,gs101-acpm-ipc: add PMIC child node
firmware: exynos-acpm: allow use during system shutdown
firmware: exynos-acpm: use ktime APIs for timeout detection
Link: https://lore.kernel.org/r/20250501103541.13795-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
ITS mitigation moves the unsafe indirect branches to a safe thunk. This
could degrade the prediction accuracy as the source address of indirect
branches becomes same for different execution paths.
To improve the predictions, and hence the performance, assign a separate
thunk for each indirect callsite. This is also a defense-in-depth measure
to avoid indirect branches aliasing with each other.
As an example, 5000 dynamic thunks would utilize around 16 bits of the
address space, thereby gaining entropy. For a BTB that uses
32 bits for indexing, dynamic thunks could provide better prediction
accuracy over fixed thunks.
Have ITS thunks be variable sized and use EXECMEM_MODULE_TEXT such that
they are both more flexible (got to extend them later) and live in 2M TLBs,
just like kernel code, avoiding undue TLB pressure.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
|
|
Early kernel memory is RWX, only at the end of early boot (before SMP)
do we mark things ROX. Have execmem_cache mirror this behaviour for
early users.
This avoids having to remember what code is execmem and what is not --
we can poke everything with impunity ;-) Also performance for not
having to do endless text_poke_mm switches.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
|
|
Indirect Target Selection (ITS) is a bug in some pre-ADL Intel CPUs with
eIBRS. It affects prediction of indirect branch and RETs in the
lower half of cacheline. Due to ITS such branches may get wrongly predicted
to a target of (direct or indirect) branch that is located in the upper
half of the cacheline.
Scope of impact
===============
Guest/host isolation
--------------------
When eIBRS is used for guest/host isolation, the indirect branches in the
VMM may still be predicted with targets corresponding to branches in the
guest.
Intra-mode
----------
cBPF or other native gadgets can be used for intra-mode training and
disclosure using ITS.
User/kernel isolation
---------------------
When eIBRS is enabled user/kernel isolation is not impacted.
Indirect Branch Prediction Barrier (IBPB)
-----------------------------------------
After an IBPB, indirect branches may be predicted with targets
corresponding to direct branches which were executed prior to IBPB. This is
mitigated by a microcode update.
Add cmdline parameter indirect_target_selection=off|on|force to control the
mitigation to relocate the affected branches to an ITS-safe thunk i.e.
located in the upper half of cacheline. Also add the sysfs reporting.
When retpoline mitigation is deployed, ITS safe-thunks are not needed,
because retpoline sequence is already ITS-safe. Similarly, when call depth
tracking (CDT) mitigation is deployed (retbleed=stuff), ITS safe return
thunk is not used, as CDT prevents RSB-underflow.
To not overcomplicate things, ITS mitigation is not supported with
spectre-v2 lfence;jmp mitigation. Moreover, it is less practical to deploy
lfence;jmp mitigation on ITS affected parts anyways.
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
|
|
Add the function ring_buffer_record_is_on_cpu() that returns true if the
ring buffer for a give CPU is writable and false otherwise.
Also add tracer_tracing_is_on_cpu() to return if the ring buffer for a
given CPU is writeable for a given trace_array.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250505212236.059853898@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In preparation of supporting more than a single core PCI driver
for RDMA, move ice specific structs like qset_params, qos_info
and qos_params from iidc_rdma.h to iidc_rdma_ice.h.
Previously, the ice driver was just exporting its entire PF struct
to the auxiliary driver, but since each core driver will have its own
different PF struct, implement a universal struct that all core drivers
can provide to the auxiliary driver through the probe call.
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Co-developed-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Co-developed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Co-developed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
if it works under NMI and doesn't use any context-dependent things,
should be fine for any program type. The detailed discussion is in [1].
[1] https://lore.kernel.org/all/CAEf4Bza6gK3dsrTosk6k3oZgtHesNDSrDd8sdeQ-GiS6oJixQg@mail.gmail.com/
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/bpf/20250506061434.94277-2-yangfeng59949@163.com
|
|
The Maxim MAX77759 is a companion PMIC for USB Type-C applications and
includes Battery Charger, Fuel Gauge, temperature sensors, USB Type-C
Port Controller (TCPC), NVMEM, and a GPIO expander.
Fuel Gauge and TCPC have separate and independent I2C addresses,
register maps, and interrupt lines and are therefore excluded from the
MFD core device driver here.
The GPIO and NVMEM interfaces are accessed via specific commands to the
built-in microprocessor. This driver implements an API that client
drivers can use for accessing those.
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Acked-by: Peter Griffin <peter.griffin@linaro.org>
Link: https://lore.kernel.org/r/20250509-max77759-mfd-v10-1-962ac15ee3ef@linaro.org
Signed-off-by: Lee Jones <lee@kernel.org>
|
|
Keep track of the number requests a ring currently has allocated (and
not freed), it'll be needed in the next patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c8f8308294dc2a1cb8925d984d937d4fc14ab5d4.1746788718.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit f7ee1f13d606 ("mm/vmalloc: enable mapping of huge pages at pte
level in vmap") added its support by reusing the set_huge_pte_at() API,
which is otherwise only used for user mappings. But when unmapping those
huge ptes, it continued to call ptep_get_and_clear(), which is a
layering violation. To date, the only arch to implement this support is
powerpc and it all happens to work ok for it.
But arm64's implementation of ptep_get_and_clear() can not be safely
used to clear a previous set_huge_pte_at(). So let's introduce a new
arch opt-in function, arch_vmap_pte_range_unmap_size(), which can
provide the size of a (present) pte. Then we can call
huge_ptep_get_and_clear() to tear it down properly.
Note that if vunmap_range() is called with a range that starts in the
middle of a huge pte-mapped page, we must unmap the entire huge page so
the behaviour is consistent with pmd and pud block mappings. In this
case emit a warning just like we do for pmd/pud mappings.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-9-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Convert page_table_check_p[mu]d_set(...) to
page_table_check_p[mu]ds_set(..., nr) to allow checking a contiguous set
of pmds/puds in single batch. We retain page_table_check_p[mu]d_set(...)
as macros that call new batch functions with nr=1 for compatibility.
arm64 is about to reorganise its pte/pmd/pud helpers to reuse more code
and to allow the implementation for huge_pte to more efficiently set
ptes/pmds/puds in batches. We need these batch-helpers to make the
refactoring possible.
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Luiz Capitulino <luizcap@redhat.com>
Link: https://lore.kernel.org/r/20250422081822.1836315-4-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops
and therefore it is not necessary to compile in the code for changing
the reference count.
When CONFIG_CGROUP is selected, there is no valid case where
either of {get,put}_cgroup_ns() will be called with a NULL argument.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.
For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.
If userspace has 10 filesystems and suspend/hibernate manges to freeze 5
and then fails on the 6th for whatever odd reason (current or future)
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because while it's unlikely that a new
filesystem got added in the meantime it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.
There's various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or make sure that the power subsystem just actually exclusively
freezes things it can freeze and marking such filesystems as being owned
by power for the duration of the suspend or resume cycle. I opted for
the latter as that seemed the clean thing to do even if it means more
code changes.
If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.
If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.
So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Some NETC functionality is controlled using control messages sent to the
hardware using BD ring interface with 32B descriptor similar to transmit
BD ring used on ENETC. This BD ring interface is referred to as command
BD ring. It is used to configure functionality where the underlying
resources may be shared between different entities or being too large to
configure using direct registers. Therefore, a messaging protocol called
NETC Table Management Protocol (NTMP) is provided for exchanging
configuration and management information between the software and the
hardware using the command BD ring interface.
For the management protocol of LS1028A has been retroactively named NTMP
1.0, and its implementation is in enetc_cbdr.c and enetc_qos.c. However,
NTMP of i.MX95 has been upgraded to version 2.0, which is incompatible
with LS1028A, because the message formats have been changed. Therefore,
add the netc-lib driver to support NTMP 2.0 to operate various tables.
Note that, only MAC address filter table and RSS table are supported at
the moment. More tables will be supported in subsequent patches.
It is worth mentioning that the purpose of the netc-lib driver is to
provide some NTMP-based generic interfaces for ENETC and NETC Switch
drivers. Currently, it only supports the configurations of some tables.
Interfaces such as tc flower and debugfs will be added in the future.
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250506080735.3444381-2-wei.fang@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Introduce new TSM Measurement helper library (tsm-mr) for TVM guest drivers
to expose MRs (Measurement Registers) as sysfs attributes, with Crypto
Agility support.
Add the following new APIs (see include/linux/tsm-mr.h for details):
- tsm_mr_create_attribute_group(): Take on input a `struct
tsm_measurements` instance, which includes one `struct
tsm_measurement_register` per MR with properties like `TSM_MR_F_READABLE`
and `TSM_MR_F_WRITABLE`, to determine the supported operations and create
the sysfs attributes accordingly. On success, return a `struct
attribute_group` instance that will typically be included by the guest
driver into `miscdevice.groups` before calling misc_register().
- tsm_mr_free_attribute_group(): Free the memory allocated to the attrubute
group returned by tsm_mr_create_attribute_group().
tsm_mr_create_attribute_group() creates one attribute for each MR, with
names following this pattern:
MRNAME[:HASH]
- MRNAME - Placeholder for the MR name, as specified by
`tsm_measurement_register.mr_name`.
- :HASH - Optional suffix indicating the hash algorithm associated with
this MR, as specified by `tsm_measurement_register.mr_hash`.
Support Crypto Agility by allowing multiple definitions of the same MR
(i.e., with the same `mr_name`) with distinct HASH algorithms.
NOTE: Crypto Agility, introduced in TPM 2.0, allows new hash algorithms to
be introduced without breaking compatibility with applications using older
algorithms. CC architectures may face the same challenge in the future,
needing new hashes for security while retaining compatibility with older
hashes, hence the need for Crypto Agility.
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
[djbw: fixup bin_attr const conflict]
Link: https://patch.msgid.link/20250509020739.882913-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Retain the locked design, but check rate-limiting even when the lock
could not be acquired.
Link: https://lore.kernel.org/all/Z_VRo63o2UsVoxLG@pathway.suse.cz/
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
The ___ratelimit() function special-cases the jiffies-counter value of zero
as "uninitialized". This works well on 64-bit systems, where the jiffies
counter is not going to return to zero for more than half a billion years
on systems with HZ=1000, but similar 32-bit systems take less than 50 days
to wrap the jiffies counter. And although the consequences of wrapping the
jiffies counter seem to be limited to minor confusion on the duration of
the rate-limiting interval that happens to end at time zero, it is almost
no work to avoid this confusion.
Therefore, introduce a RATELIMIT_INITIALIZED bit to the ratelimit_state
structure's ->flags field so that a ->begin value of zero is no longer
special.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
The ratelimit_state structure's ->missed field is sometimes incremented
locklessly, and it would be good to avoid lost counts. This is also
needed to count the number of misses due to trylock failure. Therefore,
convert the ratelimit_state structure's ->missed field to atomic_t.
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
A number of ratelimit use cases do open-coded access to the
ratelimit_state structure's ->missed field. This works, but is a bit
messy and makes it more annoying to make changes to this field.
Therefore, provide a ratelimit_state_inc_miss() function that increments
the ->missed field, a ratelimit_state_get_miss() function that reads
out the ->missed field, and a ratelimit_state_reset_miss() function
that reads out that field, but that also resets its value to zero.
These functions will replace client-code open-coded uses of ->missed.
In addition, a new ratelimit_state_reset_interval() function encapsulates
what was previously open-coded lock acquisition and direct field updates.
[ paulmck: Apply kernel test robot feedback. ]
Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/
Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
|
|
As per IEEE 802.11be-2024 - 9.4.2.321, EHT operation element
contains MCS15 Disable subfield as the sixth bit, which is set when
MCS15 support is not enabled.
Get MCS15 support from EHT operation params and add it in link_conf
so that driver can use this value to know if EHT-MCS 15 reception
is enabled.
Co-developed-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com>
Signed-off-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com>
Signed-off-by: Mohan Kumar G <quic_mkumarg@quicinc.com>
Link: https://patch.msgid.link/20250505152836.3266829-1-quic_mkumarg@quicinc.com
[remove pointless !! for bool assignment]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The CPU hotplug handlers are called twice: at prepare and online stage.
Their role is to:
1) Enable/disable a CPU context. This is irrelevant and even buggy at
the prepare stage because the CPU is still offline. On early
secondary CPU up, creating an event attached to that CPU might
silently fail because the CPU context is observed as online but the
context installation's IPI failure is ignored.
2) Update the scope cpumasks and re-migrate the events accordingly in
the CPU down case. This is irrelevant at the prepare stage.
3) Remove the events attached to the context of the offlining CPU. It
even uses an (unnecessary) IPI for it. This is also irrelevant at the
prepare stage.
Also none of the *_PREPARE and *_STARTING architecture perf related CPU
hotplug callbacks rely on CPUHP_PERF_PREPARE.
CPUHP_AP_PERF_ONLINE is enough and the right place to perform the work.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250424161128.29176-4-frederic@kernel.org
|
|
Since commit 2070887fdeac ("futex: fix restart in wait_requeue_pi"),
futex_wait_requeue_pi() no longer uses restart_block. Update the comment
accordingly.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250428193445.4571-1-namcao@linutronix.de
|
|
Move this API to the canonical timer_*() namespace.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250507175338.672442-10-mingo@kernel.org
|
|
Move this API to the canonical timer_*() namespace.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250507175338.672442-9-mingo@kernel.org
|