Rework the vast majority of KVM's exports to expose symbols only to KVM
submodules, i.e. to x86's kvm-{amd,intel}.ko and PPC's kvm-{pr,hv}.ko.
With few exceptions, KVM's exported APIs are intended (and safe) for KVM-
internal usage only.
Keep kvm_get_kvm(), kvm_get_kvm_safe(), and kvm_put_kvm() as normal
exports, as they are needed by VFIO, and are generally safe for external
usage (though ideally even the get/put APIs would be KVM-internal, and
VFIO would pin a VM by grabbing a reference to its associated file).
Implement a framework in kvm_types.h in anticipation of providing a macro
to restrict KVM-specific kernel exports, i.e. to provide symbol exports
for KVM if and only if KVM is built as one or more modules.
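A minimal sketch of the kind of macro such a framework could provide, assuming an EXPORT_SYMBOL_GPL_FOR_MODULES() helper is available; the macro name, config symbol, and module list below are illustrative, not the actual kvm_types.h code:
	/* Illustrative only: export the symbol solely to the KVM modules, and
	 * only when KVM itself is built modular. */
	#if IS_MODULE(CONFIG_KVM)
	#define EXPORT_SYMBOL_FOR_KVM(symbol) \
		EXPORT_SYMBOL_GPL_FOR_MODULES(symbol, "kvm,kvm-intel,kvm-amd,kvm-hv,kvm-pr")
	#else
	#define EXPORT_SYMBOL_FOR_KVM(symbol)
	#endif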
Link: https://lore.kernel.org/r/20250919003303.1355064-3-seanjc@google.com
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Make migrate_{en,dis}able() inline, to improve performance
(Menglong Dong)
- Move STDL_INIT() functions out-of-line (Peter Zijlstra)
- Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)
Fair scheduling:
- Defer throttling to when tasks exit to user-space, to reduce the
chance & impact of throttle-preemption with held locks and other
resources (Aaron Lu, Valentin Schneider)
- Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
warning was getting triggered on certain topologies (Peter
Zijlstra)
Misc cleanups & fixes:
- Header cleanups (Menglong Dong)
- Fix race in push_dl_task() (Harshit Agarwal)"
* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix some typos in include/linux/preempt.h
sched: Make migrate_{en,dis}able() inline
rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
sched/fair: Do not balance task to a throttled cfs_rq
sched/fair: Do not special case tasks in throttled hierarchy
sched/fair: update_cfs_group() for throttled cfs_rqs
sched/fair: Propagate load for throttled cfs_rq
sched/fair: Get rid of throttled_lb_pair()
sched/fair: Task based throttle time accounting
sched/fair: Switch to task based throttle model
sched/fair: Implement throttle task work and related helpers
sched/fair: Add related data structure for task based throttle
sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
sched: Move STDL_INIT() functions out-of-line
sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
sched/deadline: Fix race in push_dl_task()
|
|
HEAD
KVM SEV-SNP CipherText Hiding support for 6.18
Add support for SEV-SNP's CipherText Hiding, an opt-in feature that prevents
unauthorized CPU accesses from reading the ciphertext of SNP guest private
memory, e.g. to attempt an offline attack. Instead of ciphertext, the CPU
will always read back all FFs when CipherText Hiding is enabled.
Add a new module parameter to the KVM module to enable CipherText Hiding and
control the number of ASIDs that can be used for VMs with CipherText Hiding,
which is in effect the number of SNP VMs. When CipherText Hiding is enabled,
the shared SEV-ES/SEV-SNP ASID space is split into separate ranges for SEV-ES
and SEV-SNP guests, i.e. ASIDs that can be used for CipherText Hiding cannot
be used to run SEV-ES guests.
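A minimal sketch of the ASID split described above; the module parameter name and helper below are illustrative assumptions, not the actual kvm-amd code:
	/* Illustrative only: reserve the low part of the joint SEV-ES/SEV-SNP
	 * ASID range for CipherText Hiding (SNP) guests, leaving the rest for
	 * SEV-ES guests. */
	static unsigned int ciphertext_hiding_asids;	/* module parameter, 0 = disabled */

	static void sev_split_asid_range(unsigned int min_asid, unsigned int max_asid,
					 unsigned int *max_snp_asid,
					 unsigned int *min_sev_es_asid)
	{
		unsigned int nr_snp = min(ciphertext_hiding_asids,
					  max_asid - min_asid + 1);

		*max_snp_asid = min_asid + nr_snp - 1;	/* SNP: [min_asid, *max_snp_asid] */
		*min_sev_es_asid = *max_snp_asid + 1;	/* SEV-ES: the remainder */
	}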
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
|
|
KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned
delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as
access_tracking_perf_test, dirty_log_perf_test,
memslot_modification_stress_test, memslot_perf_test,
mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM,
allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the
privileged hypervisor is never tricked into using uninitialised
data.
- Speed up MMIO range registration by avoiding unnecessary RCU
synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panic-ing in the EL2
payload, just like the rest of the kernel has always done. This will
hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use
of it to populate the fault level reported to the guest on failing
to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring
feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug
in a nested VM. This has some bad performance impacts, but is at
least correct.
- Add enough infrastructure to be able to disable EL2 features and
give effective values to the EL2 control registers. This then allows
a bunch of features to be turned off, which helps cross-host
migration.
- Large rework of the selftest infrastructure to allow most tests to
transparently run at EL2. This is the first step towards enabling
NV testing.
- Various fixes and improvements all over the map, including one BE
fix, just in time for the removal of the feature.
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when
visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for
managing the VCNR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset
values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming
at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios
where an xarray's spinlock was nested with a *raw* spinlock
- Permit stage-2 read permission aborts which are possible in the case
of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
[Pull into kvm/master to fix conflicts. - Paolo]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Extensive cpuset code cleanup and refactoring work with no functional
changes: CPU mask computation logic refactoring, introducing new
helpers, removing redundant code paths, and improving error handling
for better maintainability.
- A few bug fixes to cpuset including fixes for partition creation
failures when isolcpus is in use, missing error returns, and null
pointer access prevention in free_tmpmasks().
- Core cgroup changes include replacing the global percpu_rwsem with
per-threadgroup rwsem when writing to cgroup.procs for better
scalability, workqueue conversions to use WQ_PERCPU and
system_percpu_wq to prepare for workqueue default switching from
percpu to unbound, and removal of unused code including the
post_attach callback.
- New cgroup.stat.local time accounting feature that tracks frozen time
duration.
- Misc changes including selftests updates (new freezer time tests and
backward compatibility fixes), documentation sync, string function
safety improvements, and 64-bit division fixes.
* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits)
cpuset: remove is_prs_invalid helper
cpuset: remove impossible warning in update_parent_effective_cpumask
cpuset: remove redundant special case for null input in node mask update
cpuset: fix missing error return in update_cpumask
cpuset: Use new excpus for nocpu error check when enabling root partition
cpuset: fix failure to enable isolated partition when containing isolcpus
Documentation: cgroup-v2: Sync manual toctree
cpuset: use partition_cpus_change for setting exclusive cpus
cpuset: use parse_cpulist for setting cpus.exclusive
cpuset: introduce partition_cpus_change
cpuset: refactor cpus_allowed_validate_change
cpuset: refactor out validate_partition
cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers
cpuset: refactor CPU mask buffer parsing logic
cpuset: Refactor exclusive CPU mask computation logic
cpuset: change return type of is_partition_[in]valid to bool
cpuset: remove unused assignment to trialcs->partition_root_state
cpuset: move the root cpuset write check earlier
cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock
cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock
...
|
|
Pull workqueue updates from Tejun Heo:
- WQ_PERCPU was added to remaining alloc_workqueue() users and
system_wq usage was replaced with system_percpu_wq and
system_unbound_wq with system_dfl_wq.
These are equivalent conversions with no functional changes,
preparing for switching default to unbound workqueues from percpu.
- A handshake mechanism was added for canceling BH workers to avoid
live lock scenarios under PREEMPT_RT.
- Unnecessary rcu_read_lock/unlock() calls were dropped in
wq_watchdog_timer_fn() and workqueue_congested().
- Documentation was fixed to resolve texinfodocs warnings.
* tag 'wq-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix texinfodocs warning for WQ_* flags reference
workqueue: WQ_PERCPU added to alloc_workqueue users
workqueue: replace use of system_wq with system_percpu_wq
workqueue: replace use of system_unbound_wq with system_dfl_wq
workqueue: Provide a handshake for canceling BH workers
workqueue: Remove rcu_read_lock/unlock() in wq_watchdog_timer_fn()
workqueue: Remove redundant rcu_read_lock/unlock() in workqueue_congested()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Code organization cleanup. Separate internal types and accessors to
ext_internal.h to reduce the size of ext.c and improve
maintainability.
- Prepare for cgroup sub-scheduler support by adding @sch parameter to
various functions and helpers, reorganizing scheduler instance
handling, and dropping obsolete helpers like scx_kf_exit() and
kf_cpu_valid().
- Add new scx_bpf_cpu_curr() and scx_bpf_locked_rq() BPF helpers to
provide safer access patterns with proper RCU protection.
scx_bpf_cpu_rq() is deprecated with warnings due to potential race
conditions.
- Improve debugging with migration-disabled counter in error state
dumps, SCX_EFLAG_INITIALIZED flag, bitfields for warning flags, and
other enhancements to help diagnose issues.
- Use cgroup_lock/unlock() for cgroup synchronization instead of
scx_cgroup_rwsem based synchronization. This is simpler and allows
enable/disable paths to synchronize against cgroup changes
independent of the CPU controller.
- rhashtable_lookup() replacement to avoid redundant RCU locking was
reverted due to RCU usage warnings. Will be redone once rhashtable is
updated to use rcu_dereference_all().
- Other misc updates and fixes including bypass handling improvements,
scx_task_iter_relock() improvements, tools/sched_ext updates, and
compatibility helpers.
* tag 'sched_ext-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (28 commits)
Revert "sched_ext: Use rhashtable_lookup() instead of rhashtable_lookup_fast()"
sched_ext: Misc updates around scx_sched instance pointer
sched_ext: Drop scx_kf_exit() and scx_kf_error()
sched_ext: Add the @sch parameter to scx_dsq_insert_preamble/commit()
sched_ext: Drop kf_cpu_valid()
sched_ext: Add the @sch parameter to ext_idle helpers
sched_ext: Add the @sch parameter to __bstr_format()
sched_ext: Separate out scx_kick_cpu() and add @sch to it
tools/sched_ext: scx_qmap: Make debug output quieter by default
sched_ext: Make qmap dump operation non-destructive
sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()
sched_ext: Use bitfields for boolean warning flags
sched_ext: Fix stray scx_root usage in task_can_run_on_remote_rq()
sched_ext: Improve SCX_KF_DISPATCH comment
sched_ext: Use rhashtable_lookup() instead of rhashtable_lookup_fast()
sched_ext: Verify RCU protection in scx_bpf_cpu_curr()
sched_ext: Add migration-disabled counter to error state dump
sched_ext: Fix NULL dereference in scx_bpf_cpu_rq() warning
tools/sched_ext: Add compat helper for scx_bpf_cpu_curr()
sched_ext: deprecation warn for scx_bpf_cpu_rq()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Move the management of the LSM BPF security blobs into the framework
In order to enable multiple LSMs we need to allocate and free the
various security blobs in the LSM framework and not the individual
LSMs as they would end up stepping all over each other.
- Leverage the lsm_blob_alloc() helper in lsm_bdev_alloc()
Make better use of our existing helper functions to reduce some code
duplication.
- Update the Rust cred code to use 'sync::aref'
Part of a larger effort to move the Rust code over to the 'sync'
module.
- Make CONFIG_LSM dependent on CONFIG_SECURITY
As the CONFIG_LSM Kconfig setting is an ordered list of the LSMs to
enable at boot, it obviously doesn't make much sense to enable this
when CONFIG_SECURITY is disabled.
- Update the LSM and CREDENTIALS sections in MAINTAINERS with Rusty
bits
Add the Rust helper files to the associated LSM and CREDENTIALS
entries in the MAINTAINERS file. We're trying to improve the
communication between the two groups and making sure we're all aware
of what is going on via cross-posting to the relevant lists is a good
way to start.
* tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: CONFIG_LSM can depend on CONFIG_SECURITY
MAINTAINERS: add the associated Rust helper to the CREDENTIALS section
MAINTAINERS: add the associated Rust helper to the LSM section
rust,cred: update AlwaysRefCounted import to sync::aref
security: use umax() to improve code
lsm,selinux: Add LSM blob support for BPF objects
lsm: use lsm_blob_alloc() in lsm_bdev_alloc()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Proper audit support for multiple LSMs
As the audit subsystem predated the work to enable multiple LSMs,
some additional work was needed to support logging the different LSM
labels for the subjects/tasks and objects on the system. Casey's
patches add new auxiliary records for subjects and objects that
convey the additional labels.
- Ensure fanotify audit events are always generated
Generally speaking, security-relevant subsystems always generate audit
events, unless explicitly ignored. However, up to this point fanotify
events had been ignored by default, but starting with this pull
request fanotify follows convention and generates audit events by
default.
- Replace an instance of strcpy() with strscpy()
- Minor indentation, style, and comment fixes
* tag 'audit-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix skb leak when audit rate limit is exceeded
audit: init ab->skb_list earlier in audit_buffer_alloc()
audit: add record for multiple object contexts
audit: add record for multiple task security contexts
lsm: security_lsmblob_to_secctx module selection
audit: create audit_stamp structure
audit: add a missing tab
audit: record fanotify event regardless of presence of rules
audit: fix typo in auditfilter.c comment
audit: Replace deprecated strcpy() with strscpy()
audit: fix indentation in audit_log_exit()
|
|
- Implement haptic touchpad support (Angela Czubak and Jonathan Denose)
|
|
Instead of sharing sd->defer_list & sd->defer_count with
many cpus, add one pair for each NUMA node.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Get rid of sd->defer_lock and adopt llist operations.
We optimize skb_attempt_defer_free() for the common case,
where the packet is queued. Otherwise sd->defer_count keeps
increasing until skb_defer_free_flush() clears it.
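A minimal sketch of the lock-free pattern this describes, assuming an llist_node embedded in the queued object; the struct and function names are illustrative, not the actual softnet_data code:
	#include <linux/atomic.h>
	#include <linux/llist.h>

	struct defer_queue_sketch {
		struct llist_head	defer_list;
		atomic_long_t		defer_count;
	};

	/* Common case: enqueue without taking any lock. */
	static void defer_skb_sketch(struct defer_queue_sketch *q, struct llist_node *node)
	{
		llist_add(node, &q->defer_list);
		atomic_long_inc(&q->defer_count);
	}

	/* Flush path: detach the whole list at once and reset the count. */
	static struct llist_node *defer_flush_sketch(struct defer_queue_sketch *q)
	{
		atomic_long_set(&q->defer_count, 0);
		return llist_del_all(&q->defer_list);
	}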
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This is preparation work to remove the softnet_data.defer_lock,
as it is contended on hosts with a large number of cores.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
After 42e2a9e11a1d ("net: phy: dp83640: improve phydev and driver
removal handling") we can stop exporting also phy_driver_unregister().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/2bab950e-4b70-4030-b997-03f48379586f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Introduce a new document, flow_control.rst, to provide a comprehensive
guide on Ethernet Flow Control in Linux. The guide explains how flow
control works, how autonegotiation resolves pause capabilities, and how
to configure it using ethtool and Netlink.
In parallel, document the pause and pause-stat attributes in the
ethtool.yaml netlink spec. This enables the ynl tool to generate
kernel-doc comments for the corresponding enums in the UAPI header,
making the C interface self-documenting.
Finally, replace the legacy flow control section in phy.rst with a
reference to the new document and add pointers in the relevant C source
files.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for BPF arena and arena atomics
- Patches to switch to msi parent domain (per-device MSI domains)
- Add a lock contention tracepoint in the queued spinlock slowpath
- Fixes for underflow in pseries/powernv msi and pci paths
- Switch from legacy-of-mm-gpiochip dependency to platform driver
- Fixes for handling TLB misses
- Introduce support for powerpc papr-hvpipe
- Add vpa-dtl PMU driver for pseries platform
- Misc fixes and cleanups
Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira
Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam
Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence,
Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao,
Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters,
Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas
Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao
Bagalkote.
* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
genirq/msi: Remove msi_post_free()
powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
powerpc/time: Expose boot_tb via accessor
powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
powerpc/fprobe: fix updated fprobe for function-graph tracer
powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
powerpc64/modules: replace stub allocation sentinel with an explicit counter
powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
powerpc/ftrace: ensure ftrace record ops are always set for NOPs
powerpc/603: Really copy kernel PGD entries into all PGDIRs
powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
powerpc/pseries: HVPIPE changes to support migration
powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
powerpc/pseries: Enable HVPIPE event message interrupt
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Refactor SCLP memory hotplug code
- Introduce a common boot_panic() decompressor helper macro and use it to
get rid of a few nearly identical implementations
- Take into account additional key generation flags and forward them to
the ep11 implementation. With that, allow users to modify the key
generation process, e.g. provide valid combinations of XCP_BLOB_*
flags
- Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
debug facility and HMC driver
- Add DAX support for DCSS memory block devices
- Make the compiler statement attribute "assume" available with a new
__assume macro
- Rework ffs() and fls() family bitops functions, including source code
improvements and generated code optimizations. Use the newly
introduced __assume macro for that
- Enable additional network features in default configurations
- Use __GFP_ACCOUNT flag for user page table allocations to add missing
kmemcg accounting
- Add WQ_PERCPU flag to explicitly request the use of the per-CPU
workqueue for 3590 tape driver
- Switch power reading to the per-CPU workqueue and Hiperdispatch to the
default workqueue
- Add memory allocation profiling hooks to allow better profiling data
and the /proc/allocinfo output similar to other architectures
* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/mm: Add memory allocation profiling hooks
s390: Replace use of system_wq with system_dfl_wq
s390/diag324: Replace use of system_wq with system_percpu_wq
s390/tape: Add WQ_PERCPU to alloc_workqueue users
s390/bitops: Switch to generic ffs() if supported by compiler
s390/bitops: Switch to generic fls(), fls64(), etc.
s390/mm: Use __GFP_ACCOUNT for user page table allocations
s390/configs: Enable additional network features
s390/bitops: Cleanup __flogr()
s390/bitops: Use __assume() for __flogr() inline assembly return value
compiler_types: Add __assume macro
s390/bitops: Limit return value range of __flogr()
s390/dcssblk: Add DAX support
s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
s390/pkey: Forward keygenflags to ep11_unwrapkey
s390/boot: Add common boot_panic() code
s390/bitops: Optimize inlining
s390/bitops: Slightly optimize ffs() and fls64()
s390/sclp: Move memory hotplug code for better modularity
...
|
|
Add new callback operations for a dpll device:
- phase_offset_avg_factor_get(...) - to obtain current phase offset
averaging factor from dpll device,
- phase_offset_avg_factor_set(...) - to set phase offset averaging factor
Obtain the factor value using the get callback and provide it to the user
if the device driver implements this callback. Execute the set callback upon
user request, if the driver implements it.
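A minimal sketch of how a driver might wire these up; the prototypes and the foo_hw_*() helpers are illustrative assumptions, not the actual dpll_device_ops definitions:
	static int foo_phase_offset_avg_factor_get(const struct dpll_device *dpll,
						   void *dpll_priv, u32 *factor,
						   struct netlink_ext_ack *extack)
	{
		*factor = foo_hw_read_avg_factor(dpll_priv);	/* hypothetical driver helper */
		return 0;
	}

	static int foo_phase_offset_avg_factor_set(const struct dpll_device *dpll,
						   void *dpll_priv, u32 factor,
						   struct netlink_ext_ack *extack)
	{
		return foo_hw_write_avg_factor(dpll_priv, factor);	/* hypothetical */
	}

	static const struct dpll_device_ops foo_dpll_ops = {
		.phase_offset_avg_factor_get	= foo_phase_offset_avg_factor_get,
		.phase_offset_avg_factor_set	= foo_phase_offset_avg_factor_set,
	};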
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
v2:
* do not require 'set' callback to retrieve current value
* always call 'set' callback regardless of current value
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Tariq Toukan says:
====================
mlx5-next updates 2025-09-28
* tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5: IFC add balance ID and LAG per MP group bits
net/mlx5: Add IFC bit for TIR/SQ order capability
====================
Link: https://patch.msgid.link/1759093989-841873-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's good stuff across the board, including some nice mm
improvements for CPUs with the 'noabort' BBML2 feature and a clever
patch to allow ptdump to play nicely with block mappings in the
vmalloc area.
Confidential computing:
- Add support for accepting secrets from firmware (e.g. ACPI CCEL)
and mapping them with appropriate attributes.
CPU features:
- Advertise atomic floating-point instructions to userspace
- Extend Spectre workarounds to cover additional Arm CPU variants
- Extend list of CPUs that support break-before-make level 2 and
guarantee not to generate TLB conflict aborts for changes of
mapping granularity (BBML2_NOABORT)
- Add GCS support to our uprobes implementation.
Documentation:
- Remove bogus SME documentation concerning register state when
entering/exiting streaming mode.
Entry code:
- Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)
- Micro-optimise syscall entry path with a compiler branch hint.
Memory management:
- Enable huge mappings in vmalloc space even when kernel page-table
dumping is enabled
- Tidy up the types used in our early MMU setup code
- Rework rodata= for closer parity with the behaviour on x86
- For CPUs implementing BBML2_NOABORT, utilise block mappings in the
linear map even when rodata= applies to virtual aliases
- Don't re-allocate the virtual region between '_text' and '_stext',
as doing so confused tools parsing /proc/vmcore.
Miscellaneous:
- Clean-up Kconfig menuconfig text for architecture features
- Avoid redundant bitmap_empty() during determination of supported
SME vector lengths
- Re-enable warnings when building the 32-bit vDSO object
- Avoid breaking our eggs at the wrong end.
Perf and PMUs:
- Support for v3 of the Hisilicon L3C PMU
- Support for Hisilicon's MN and NoC PMUs
- Support for Fujitsu's Uncore PMU
- Support for SPE's extended event filtering feature
- Preparatory work to enable data source filtering in SPE
- Support for multiple lanes in the DWC PCIe PMU
- Support for i.MX94 in the IMX DDR PMU driver
- MAINTAINERS update (Thank you, Yicong)
- Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
Selftests:
- Add basic LSFE check to the existing hwcaps test
- Support nolibc in GCS tests
- Extend SVE ptrace test to pass unsupported regsets and invalid
vector lengths
- Minor cleanups (typos, cosmetic changes).
System registers:
- Fix ID_PFR1_EL1 definition
- Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1
- Sync TCR_EL1 definition with the latest Arm ARM (L.b)
- Be stricter about the input fed into our AWK sysreg generator
script
- Typo fixes and removal of redundant definitions.
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI)
support from the ACPI GHES code so that it can be used by platforms
booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI
runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code.
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM
guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes
to our context-switching, signal handling and ptrace code"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
arm64: cpufeature: Remove duplicate asm/mmu.h header
arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
perf/dwc_pcie: Fix use of uninitialized variable
arm/syscalls: mark syscall invocation as likely in invoke_syscall
Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
Documentation: hisi-pmu: Fix of minor format error
drivers/perf: hisi: Add support for L3C PMU v3
drivers/perf: hisi: Refactor the event configuration of L3C PMU
drivers/perf: hisi: Extend the field of tt_core
drivers/perf: hisi: Extract the event filter check of L3C PMU
drivers/perf: hisi: Simplify the probe process of each L3C PMU version
drivers/perf: hisi: Export hisi_uncore_pmu_isr()
drivers/perf: hisi: Relax the event ID check in the framework
perf: Fujitsu: Add the Uncore PMU driver
arm64: map [_text, _stext) virtual address range non-executable+read-only
arm64/sysreg: Update TCR_EL1 register
arm64: Enable vmalloc-huge with ptdump
arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
arm64: errata: Apply workarounds for Neoverse-V3AE
arm64: cputype: Add Neoverse-V3AE definitions
...
|
|
__ptr_ring_zero_tail currently does the - 1 operation twice:
- during initialization of head
- at each loop iteration
Let's just do it in one place; all we need to do
is adjust the loop condition. This is better:
- slightly clearer logic with less duplication
- uses prefix --, so we don't need to save the old value
- one less "- 1" operation - for example, when the ring is empty
we now don't do the "- 1" at all, whereas the existing code does it once
Text size shrinks from 15081 to 15050 bytes.
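A minimal sketch of the before/after shape of the loop; this is illustrative, not the actual __ptr_ring_zero_tail() body:
	/* Before (illustrative): "- 1" both at init and on every pass. */
	head = consumer_head - 1;
	while (head >= consumer_tail) {
		queue[head] = NULL;
		head = head - 1;
	}

	/* After (illustrative): one prefix decrement and an adjusted loop
	 * condition; an empty ring performs no "- 1" at all. */
	head = consumer_head;
	while (head > consumer_tail)
		queue[--head] = NULL;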
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/bcd630c7edc628e20d4f8e037341f26c90ab4365.1758976026.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- binfmt_elf: preserve original ELF e_flags for core dumps (Svetlana
Parfenova)
- exec: Fix incorrect type for ret (Xichao Zhao)
- binfmt_elf: Replace offsetof() with struct_size() in fill_note_info()
(Xichao Zhao)
* tag 'execve-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_elf: preserve original ELF e_flags for core dumps
binfmt_elf: Replace offsetof() with struct_size() in fill_note_info()
exec: Fix incorrect type for ret
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull ffs const-attribute cleanups from Kees Cook:
"While working on various hardening refactoring a while back we
encountered inconsistencies in the application of __attribute_const__
on the ffs() family of functions.
This series fixes this across all archs and adds KUnit tests.
Notably, this found a theoretical underflow in PCI (also fixed here)
and uncovered an inefficiency in ARC (fixed in the ARC arch PR). I
kept the series separate from the general hardening PR since it is a
stand-alone "topic".
- PCI: Fix theoretical underflow in use of ffs().
- Universally apply __attribute_const__ to all architecture's
ffs()-family of functions.
- Add KUnit tests for ffs() behavior and const-ness"
* tag 'ffs-const-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
KUnit: ffs: Validate all the __attribute_const__ annotations
sparc: Add __attribute_const__ to ffs()-family implementations
xtensa: Add __attribute_const__ to ffs()-family implementations
s390: Add __attribute_const__ to ffs()-family implementations
parisc: Add __attribute_const__ to ffs()-family implementations
mips: Add __attribute_const__ to ffs()-family implementations
m68k: Add __attribute_const__ to ffs()-family implementations
openrisc: Add __attribute_const__ to ffs()-family implementations
riscv: Add __attribute_const__ to ffs()-family implementations
hexagon: Add __attribute_const__ to ffs()-family implementations
alpha: Add __attribute_const__ to ffs()-family implementations
sh: Add __attribute_const__ to ffs()-family implementations
powerpc: Add __attribute_const__ to ffs()-family implementations
x86: Add __attribute_const__ to ffs()-family implementations
csky: Add __attribute_const__ to ffs()-family implementations
bitops: Add __attribute_const__ to generic ffs()-family implementations
KUnit: Introduce ffs()-family tests
PCI: Test for bit underflow in pcie_set_readrq()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This adds a dlm_release_lockspace() flag to request that node-failure
recovery be performed for the node leaving the lockspace.
The implementation of this flag requires coordination with userland
clustering components. It's been requested for use by GFS2"
* tag 'dlm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: check for undefined release_option values
dlm: handle release_option as unsigned
dlm: move to rinfo for all middle conversion cases
dlm: handle invalid lockspace member remove
dlm: add new flag DLM_RELEASE_RECOVER for dlm_lockspace_release
dlm: add new configfs entry release_recover for lockspace members
dlm: add new RELEASE_RECOVER uevent attribute for release_lockspace
dlm: use defines for force values in dlm_release_lockspace
dlm: check for defined force value in dlm_lockspace_release
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs updates from Viacheslav Dubeyko:
"This contains several fixes of syzbot reported issues, HFS/HFS+ fixes
of xfstests failures, and rework of HFS/HFS+ debug output subsystem.
- Kang Chen fixed a slab-out-of-bounds issue in hfsplus_uni2asc()
when hfsplus_uni2asc() is called from hfsplus_listxattr().
- Yang Chenzhi fixed a crash in hfsplus_bmap_alloc() if record offset
or length is larger than node_size.
- Yangtao Li corrected the error code from hfsplus_fill_super() if the
Catalog File contains a corrupted record for the hidden directory's
type.
- KMSAN uninit-value fixes: hfs_find_set_zero_bits() and
__hfsplus_ext_cache_extent() now use kzalloc() instead of kmalloc(),
and hfsplus_delete_cat() is fixed by proper initialization of struct
hfsplus_inode_info in the hfsplus_iget() logic.
- A slab-out-of-bounds issue could happen in hfsplus_strcasecmp() if
the length field of struct hfsplus_unistr is bigger than
HFSPLUS_MAX_STRLEN. Fixed by checking the lengths of the compared
strings, and if a string's length is bigger than
HFSPLUS_MAX_STRLEN, then the length is corrected to this value.
- The generic/736 xfstest failed for HFS because the HFS volume
becomes corrupted after the test run.
The main reason was the absence of logic that corrects
mdb->drNxtCNID/HFS_SB(sb)->next_id (next unused CNID) after
deleting a record in Catalog File. That was fixed by implementing
the necessary logic in hfs_correct_next_unused_CNID()"
* tag 'hfs-v6.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs/hfsplus: rework debug output subsystem
hfsplus: fix slab-out-of-bounds read in hfsplus_strcasecmp()
hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc()
hfs: clear offset and space out of valid records in b-tree node
hfs: add logic of correcting a next unused CNID
hfsplus: fix KMSAN uninit-value issue in hfsplus_delete_cat()
hfs: fix KMSAN uninit-value issue in hfs_find_set_zero_bits()
hfs: make proper initalization of struct hfs_find_data
hfsplus: fix KMSAN uninit-value issue in __hfsplus_ext_cache_extent()
hfs: validate record offset in hfsplus_bmap_alloc
hfsplus: return EIO when type of hidden directory mismatch in hfsplus_fill_super()
MAINTAINERS: update location of hfs&hfsplus trees
|
|
Pull xfs updates from Carlos Maiolino:
"For this merge window, there are really no new features, but there are
a few things worth emphasizing:
- Deprecated for years already, the (no)attr2 and (no)ikeep mount
options have been removed for good
- Several cleanups (specially from typedefs) and bug fixes
- Improvements made in the online repair reap calculations
- online fsck is now enabled by default"
* tag 'xfs-merge-6.18' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits)
xfs: rework datasync tracking and execution
xfs: rearrange code in xfs_inode_item_precommit
xfs: scrub: use kstrdup_const() for metapath scan setups
xfs: use bt_nr_sectors in xfs_dax_translate_range
xfs: track the number of blocks in each buftarg
xfs: constify xfs_errortag_random_default
xfs: improve default maximum number of open zones
xfs: improve zone statistics message
xfs: centralize error tag definitions
xfs: remove pointless externs in xfs_error.h
xfs: remove the expr argument to XFS_TEST_ERROR
xfs: remove xfs_errortag_set
xfs: remove xfs_errortag_get
xfs: move the XLOG_REG_ constants out of xfs_log_format.h
xfs: adjust the hint based zone allocation policy
xfs: refactor hint based zone allocation
fs: add an enum for number of life time hints
xfs: fix log CRC mismatches between i386 and other architectures
xfs: rename the old_crc variable in xlog_recover_process
xfs: remove the unused xfs_log_iovec_t typedef
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async directory updates from Christian Brauner:
"This contains further preparatory changes for the asynchronous directory
locking scheme:
- Add lookup_one_positive_killable() which allows overlayfs to
perform lookup that won't block on a fatal signal
- Unify the mount idmap handling in struct renamedata as a rename can
only happen within a single mount
- Introduce kern_path_parent() for audit which sets the path to the
parent and returns a dentry for the target without holding any
locks on return
- Rename kern_path_locked() as it is only used to prepare for the
removal of an object from the filesystem:
kern_path_locked() => start_removing_path()
kern_path_create() => start_creating_path()
user_path_create() => start_creating_user_path()
user_path_locked_at() => start_removing_user_path_at()
done_path_create() => end_creating_path()
NA => end_removing_path()"
* tag 'vfs-6.18-rc1.async' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
debugfs: rename start_creating() to debugfs_start_creating()
VFS: rename kern_path_locked() and related functions.
VFS/audit: introduce kern_path_parent() for audit
VFS: unify old_mnt_idmap and new_mnt_idmap in renamedata
VFS: discard err2 in filename_create()
VFS/ovl: add lookup_one_positive_killable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs writeback updates from Christian Brauner:
"This contains work adressing lockups reported by users when a systemd
unit reading lots of files from a filesystem mounted with the lazytime
mount option exits.
With the lazytime mount option enabled we can be switching many dirty
inodes on cgroup exit to the parent cgroup. The numbers observed in
practice when systemd slice of a large cron job exits can easily reach
hundreds of thousands or millions.
The logic in inode_do_switch_wbs() which sorts the inode into
appropriate place in b_dirty list of the target wb however has linear
complexity in the number of dirty inodes thus overall time complexity
of switching all the inodes is quadratic leading to workers being
pegged for hours consuming 100% of the CPU and switching inodes to the
parent wb.
Simple reproducer of the issue:
FILES=10000
# Filesystem mounted with lazytime mount option
MNT=/mnt/
echo "Creating files and switching timestamps"
for (( j = 0; j < 50; j ++ )); do
mkdir $MNT/dir$j
for (( i = 0; i < $FILES; i++ )); do
echo "foo" >$MNT/dir$j/file$i
done
touch -a -t 202501010000 $MNT/dir$j/file*
done
wait
echo "Syncing and flushing"
sync
echo 3 >/proc/sys/vm/drop_caches
echo "Reading all files from a cgroup"
mkdir /sys/fs/cgroup/unified/mycg1 || exit
echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
for (( j = 0; j < 50; j ++ )); do
cat /mnt/dir$j/file* >/dev/null &
done
wait
echo "Switching wbs"
# Now rmdir the cgroup after the script exits
This can be solved by:
- Avoiding contention on the wb->list_lock when switching inodes by
running a single work item per wb and managing a queue of items
switching to the wb
- Allowing rescheduling when switching inodes over to a different
cgroup to avoid softlockups
- Maintaining the b_dirty list ordering instead of sorting it"
* tag 'vfs-6.18-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
writeback: Add tracepoint to track pending inode switches
writeback: Avoid excessively long inode switching times
writeback: Avoid softlockup when switching many inodes
writeback: Avoid contention on wb->list_lock when switching inodes
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
"This contains a larger set of changes around the generic namespace
infrastructure of the kernel.
Each specific namespace type (net, cgroup, mnt, ...) embeds a struct
ns_common which carries the reference count of the namespace and so
on.
We open-coded and cargo-culted so many quirks for each namespace type
that it just wasn't scalable anymore. So given there's a bunch of new
changes coming in that area I've started cleaning all of this up.
The core change is to make it possible to correctly initialize every
namespace uniformly and derive the correct initialization settings
from the type of the namespace such as namespace operations, namespace
type and so on. This leaves the new ns_common_init() function with a
single parameter which is the specific namespace type which derives
the correct parameters statically. This also means the compiler will
yell as soon as someone does something remotely fishy.
The ns_common_init() addition also allows us to remove ns_alloc_inum()
and drops any special-casing of the initial network namespace in the
network namespace initialization code that Linus complained about.
Another part is reworking the reference counting. The reference
counting was open-coded and copy-pasted for each namespace type even
though they all followed the same rules. This also removes all open
accesses to the reference count and makes it private and only uses a
very small set of dedicated helpers to manipulate them just like we do
for e.g., files.
In addition this generalizes the mount namespace iteration
infrastructure introduced a few cycles ago. As reminder, the vfs makes
it possible to iterate sequentially and bidirectionally through all
mount namespaces on the system or all mount namespaces that the caller
holds privilege over. This allow userspace to iterate over all mounts
in all mount namespaces using the listmount() and statmount() system
call.
Each mount namespace has a unique identifier for the lifetime of the
system that is exposed to userspace. The network namespace also has a
unique identifier working exactly the same way. This extends the
concept to all other namespace types.
The new nstree type makes it possible to lookup namespaces purely by
their identifier and to walk the namespace list sequentially and
bidirectionally for all namespace types, allowing userspace to iterate
through all namespaces. Looking up namespaces in the namespace tree
works completely locklessly.
This also means we can move the mount namespace onto the generic
infrastructure and remove a bunch of code and members from struct
mnt_namespace itself.
There's a bunch of stuff coming on top of this in the future but for
now this uses the generic namespace tree to extend a concept
introduced first for pidfs a few cycles ago. For a while now we have
supported pidfs file handles for pidfds. This has proven to be very
useful.
This extends the concept to cover namespaces as well. It is possible
to encode and decode namespace file handles using the common
name_to_handle_at() and open_by_handle_at() apis.
As with pidfs file handles, namespace file handles are exhaustive,
meaning it is not required to actually hold a reference to nsfs in
order to decode, aka open_by_handle_at(), a namespace file handle.
Instead the FD_NSFS_ROOT constant can be passed which will let the
kernel grab a reference to the root of nsfs internally and thus decode
the file handle.
Namespaces file descriptors can already be derived from pidfds which
means they aren't subject to overmount protection bugs. IOW, it's
irrelevant if the caller would not have access to an appropriate
/proc/<pid>/ns/ directory as they could always just derive the
namespace based on a pidfd already.
It has the same advantage as pidfds. It's possible to reliably and for
the lifetime of the system refer to a namespace without pinning any
resources and to compare them trivially.
Permission checking is kept simple. If the caller is located in the
namespace the file handle refers to they are able to open it otherwise
they must hold privilege over the owning namespace of the relevant
namespace.
The namespace file handle layout is exposed as uapi and has a stable
and extensible format. For now it simply contains the namespace
identifier, the namespace type, and the inode number. The stable
format means that userspace may construct its own namespace file
handles without going through name_to_handle_at(), as is already
allowed for pidfs and cgroup file handles"
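A hedged sketch of decoding such a namespace file handle without holding an nsfs reference; FD_NSFS_ROOT and the handle-type value are assumed to come from the kernel uapi headers, and the payload struct below is an illustrative stand-in for the stable layout described above (namespace id, type, inode number):
	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdint.h>
	#include <string.h>

	struct ns_fh_payload {		/* illustrative layout */
		uint64_t ns_id;
		uint32_t ns_type;
		uint32_t ns_inum;
	};

	static int open_ns_by_handle(const struct ns_fh_payload *payload, int handle_type)
	{
		unsigned char buf[sizeof(struct file_handle) + sizeof(*payload)]
			__attribute__((aligned(__alignof__(struct file_handle))));
		struct file_handle *fh = (struct file_handle *)buf;

		fh->handle_bytes = sizeof(*payload);
		fh->handle_type = handle_type;	/* defined by the nsfs uapi */
		memcpy(fh->f_handle, payload, sizeof(*payload));

		/* FD_NSFS_ROOT: the kernel grabs the nsfs root internally, so no
		 * nsfs file descriptor is needed to decode the handle. */
		return open_by_handle_at(FD_NSFS_ROOT, fh, O_RDONLY);
	}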
* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
ns: drop assert
ns: move ns type into struct ns_common
nstree: make struct ns_tree private
ns: add ns_debug()
ns: simplify ns_common_init() further
cgroup: add missing ns_common include
ns: use inode initializer for initial namespaces
selftests/namespaces: verify initial namespace inode numbers
ns: rename to __ns_ref
nsfs: port to ns_ref_*() helpers
net: port to ns_ref_*() helpers
uts: port to ns_ref_*() helpers
ipv4: use check_net()
net: use check_net()
net-sysfs: use check_net()
user: port to ns_ref_*() helpers
time: port to ns_ref_*() helpers
pid: port to ns_ref_*() helpers
ipc: port to ns_ref_*() helpers
cgroup: port to ns_ref_*() helpers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull copy_process updates from Christian Brauner:
"This contains the changes to enable support for clone3() on nios2
which apparently is still a thing.
The more exciting part of this is that it cleans up the inconsistency
in how the 64-bit flag argument is passed from copy_process() into the
various other copy_*() helpers"
[ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ]
* tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nios2: implement architecture-specific portion of sys_clone3
arch: copy_thread: pass clone_flags as u64
copy_process: pass clone_flags as u64 across calltree
copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs inode updates from Christian Brauner:
"This contains a series I originally wrote and that Eric brought over
the finish line. It moves the i_crypt_info and i_verity_info
pointers out of 'struct inode' and into the fs-specific part of the
inode.
So now the few filesystems that actually make use of this pay the price
in their own private inode storage instead of forcing it upon every
user of struct inode.
The pointer for the crypt and verity info is simply found by storing
an offset to its address in struct fsverity_operations and struct
fscrypt_operations. This shrinks struct inode by 16 bytes.
I hope to move a lot more out of it in the future so that struct inode
becomes really just about very core stuff that we need, much like
struct dentry and struct file, instead of the dumping ground it has
become over the years.
On top of this are a various changes associated with the ongoing inode
lifetime handling rework that multiple people are pushing forward:
- Stop accessing inode->i_count directly in f2fs and gfs2. They
simply should use the __iget() and iput() helpers
- Make the i_state flags an enum
- Rework the iput() logic
Currently, if we are the last iput, and we have the I_DIRTY_TIME
bit set, we will grab a reference on the inode again and then mark
it dirty and then redo the put. This is to make sure we delay the
time update for as long as possible
We can rework this logic to simply dec i_count if it is not 1, and
if it is do the time update while still holding the i_count
reference
Then we can replace the atomic_dec_and_lock with locking the
->i_lock and doing atomic_dec_and_test, since we did the
atomic_add_unless above
- Add an icount_read() helper and convert everyone that accesses
inode->i_count directly for this purpose to use the helper
- Expand dump_inode() to dump more information about an inode helping
in debugging
- Add some might_sleep() annotations to iput() and associated
helpers"
* tag 'vfs-6.18-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add might_sleep() annotation to iput() and more
fs: expand dump_inode()
inode: fix whitespace issues
fs: add an icount_read helper
fs: rework iput logic
fs: make the i_state flags an enum
fs: stop accessing ->i_count directly in f2fs and gfs2
fsverity: check IS_VERITY() in fsverity_cleanup_inode()
fs: remove inode::i_verity_info
btrfs: move verity info pointer to fs-specific part of inode
f2fs: move verity info pointer to fs-specific part of inode
ext4: move verity info pointer to fs-specific part of inode
fsverity: add support for info in fs-specific part of inode
fs: remove inode::i_crypt_info
ceph: move crypt info pointer to fs-specific part of inode
ubifs: move crypt info pointer to fs-specific part of inode
f2fs: move crypt info pointer to fs-specific part of inode
ext4: move crypt info pointer to fs-specific part of inode
fscrypt: add support for info in fs-specific part of inode
fscrypt: replace raw loads of info pointer with helper function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains some work around mount api handling:
- Output the warning message for mnt_too_revealing() triggered during
fsmount() to the fscontext log. This makes it possible for the
mount tool to output appropriate warnings on the command line.
For example, with the newest fsopen()-based mount(8) from
util-linux, the error messages now look like:
# mount -t proc proc /tmp
mount: /tmp: fsmount() failed: VFS: Mount too revealing.
dmesg(1) may have more information after failed mount system call.
- Do not consume fscontext log entries when returning -EMSGSIZE
Userspace generally expects APIs that return -EMSGSIZE to allow for
them to adjust their buffer size and retry the operation.
However, the fscontext log would previously clear the message even
in the -EMSGSIZE case.
Given that it is very cheap for us to check whether the buffer is
too small before we remove the message from the ring buffer, let's
just do that instead.
- Drop an unused argument from do_remount()"
* tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
vfs: fs/namespace.c: remove ms_flags argument from do_remount
selftests/filesystems: add basic fscontext log tests
fscontext: do not consume log entries when returning -EMSGSIZE
vfs: output mount_too_revealing() errors to fscontext
docs/vfs: Remove mentions to the old mount API helpers
fscontext: add custom-prefix log helpers
fs: Remove mount_bdev
fs: Remove mount_nodev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual selections of misc updates for this cycle.
Features:
- Add "initramfs_options" parameter to set initramfs mount options.
This allows adding specific mount options to the rootfs to, e.g.,
limit the memory size
- Add RWF_NOSIGNAL flag for pwritev2()
Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
signal from being raised when writing on disconnected pipes or
sockets. The flag is handled directly by the pipe filesystem and
converted to the existing MSG_NOSIGNAL flag for sockets
- Allow to pass pid namespace as procfs mount option
Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs
mount is auto-selected based on the mounting process's active
pidns, and the pidns itself is basically hidden once the mount has
been constructed)
This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns
of a procfs mount as desired. Examples include:
* In order to bypass the mnt_too_revealing() check, Kubernetes
creates a procfs mount from an empty pidns so that user
namespaced containers can be nested (without this, the nested
containers would fail to mount procfs)
But this requires forking off a helper process because you cannot
just one-shot this using mount(2)
* Container runtimes in general need to fork into a container
before configuring its mounts, which can lead to security issues
in the case of shared-pidns containers (a privileged process in
the pidns can interact with your container runtime process)
While SUID_DUMP_DISABLE and user namespaces make this less of an
issue, the strict need for this due to a minor uAPI wart is kind
of unfortunate
Things would be much easier if there was a way for userspace to
just specify the pidns they want. So this pull request contains
changes to implement a new "pidns" argument which can be set
using fsconfig(2):
fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);
or classic mount(2) / mount(8):
// mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");
Cleanups:
- Remove the last references to EXPORT_OP_ASYNC_LOCK
- Make file_remove_privs_flags() static
- Remove redundant __GFP_NOWARN when GFP_NOWAIT is used
- Use try_cmpxchg() in start_dir_add()
- Use try_cmpxchg() in sb_init_done_wq()
- Replace offsetof() with struct_size() in ioctl_file_dedupe_range()
- Remove vfs_ioctl() export
- Replace rwlock() with spinlock in epoll code as rwlock causes
priority inversion on preempt rt kernels
- Make ns_entries in fs/proc/namespaces const
- Use a switch() statement() in init_special_inode() just like we do
in may_open()
- Use struct_size() in dir_add() in the initramfs code
- Use str_plural() in rd_load_image()
- Replace strcpy() with strscpy() in find_link()
- Rename generic_delete_inode() to inode_just_drop() and
generic_drop_inode() to inode_generic_drop()
- Remove unused arguments from fcntl_{g,s}et_rw_hint()
Fixes:
- Document @name parameter for name_contains_dotdot() helper
- Fix spelling mistake
- Always return zero from replace_fd() instead of the file descriptor
number
- Limit the size for copy_file_range() in compat mode to prevent a
signed overflow
- Fix debugfs mount options not being applied
- Verify the inode mode when loading it from disk in minixfs
- Verify the inode mode when loading it from disk in cramfs
- Don't trigger automounts with RESOLVE_NO_XDEV
If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
through automounts, but could still trigger them
- Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
tracepoints
- Fix unused variable warning in rd_load_image() on s390
- Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD
- Use ns_capable_noaudit() when determining net sysctl permissions
- Don't call path_put() under namespace semaphore in listmount() and
statmount()"
* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
fcntl: trim arguments
listmount: don't call path_put() under namespace semaphore
statmount: don't call path_put() under namespace semaphore
pid: use ns_capable_noaudit() when determining net sysctl permissions
fs: rename generic_delete_inode() and generic_drop_inode()
init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
initramfs: Replace strcpy() with strscpy() in find_link()
initrd: Use str_plural() in rd_load_image()
initramfs: Use struct_size() helper to improve dir_add()
initrd: Fix unused variable warning in rd_load_image() on s390
fs: use the switch statement in init_special_inode()
fs/proc/namespaces: make ns_entries const
filelock: add FL_RECLAIM to show_fl_flags() macro
eventpoll: Replace rwlock with spinlock
selftests/proc: add tests for new pidns APIs
procfs: add "pidns" mount option
pidns: move is-ancestor logic to helper
openat2: don't trigger automounts with RESOLVE_NO_XDEV
namei: move cross-device check to __traverse_mounts
namei: remove LOOKUP_NO_XDEV check from handle_mounts
...
|
|
The DEFINE_FREE() for pm_runtime_put has been superseded by recently
introduced runtime PM auto-cleanup macros and its only user has been
converted to using one of the new macros, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
|
|
It is generally useful to be able to automatically drop a device's
runtime PM usage counter after it has been incremented by one of the
runtime PM operations that resume the device and bump up its usage
counter [1].
To that end, add guard definition macros allowing pm_runtime_put()
and pm_runtime_put_autosuspend() to be used for the auto-cleanup in
those cases.
Simply put, a piece of code like the one below:
pm_runtime_get_sync(dev);
.....
pm_runtime_put(dev);
return 0;
can be transformed with guard() like:
guard(pm_runtime_active)(dev);
.....
return 0;
(note that the pm_runtime_put() call is gone).
However, it is better to do proper error handling in the majority of
cases, so doing something like this instead of the above is recommended:
ACQUIRE(pm_runtime_active_try, pm)(dev);
if (ACQUIRE_ERR(pm_runtime_active_try, &pm))
return -ENXIO;
.....
return 0;
In all of the cases in which runtime PM is known to be enabled for the
given device or the device can be regarded as operational (and so it can
be accessed) with runtime PM disabled, a piece of code like:
ret = pm_runtime_resume_and_get(dev);
if (ret < 0)
return ret;
.....
pm_runtime_put(dev);
return 0;
can be changed as follows:
ACQUIRE(pm_runtime_active_try, pm)(dev);
ret = ACQUIRE_ERR(pm_runtime_active_try, &pm);
if (ret < 0)
return ret;
.....
return 0;
(again, note that the pm_runtime_put() call is gone).
Still, if the device cannot be accessed unless runtime PM has been
enabled for it, the pm_runtime_active_try_enabled guard variant
needs to be used, that is (in the context of the example above):
ACQUIRE(pm_runtime_active_try_enabled, pm)(dev);
ret = ACQUIRE_ERR(pm_runtime_active_try_enabled, &pm);
if (ret < 0)
return ret;
.....
return 0;
When the original code calls pm_runtime_put_autosuspend(), use one
of the "auto" guard variants, pm_runtime_active_auto/_try/_enabled.
For example, a piece of code like:
ret = pm_runtime_resume_and_get(dev);
if (ret < 0)
return ret;
.....
pm_runtime_put_autosuspend(dev);
return 0;
will become:
ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
if (ret < 0)
return ret;
.....
return 0;
Note that the cases in which the return value of pm_runtime_get_sync()
is checked can also be handled with the help of the new guard macros.
For example, a piece of code like:
ret = pm_runtime_get_sync(dev);
if (ret < 0) {
pm_runtime_put(dev);
return ret;
}
.....
pm_runtime_put(dev);
return 0;
can be rewritten as:
ACQUIRE(pm_runtime_active_auto_try_enabled, pm)(dev);
ret = ACQUIRE_ERR(pm_runtime_active_auto_try_enabled, &pm);
if (ret < 0)
return ret;
.....
return 0;
or the pm_runtime_active_auto_try guard can be used if transparent
handling of disabled runtime PM is desirable.
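As an illustration of how the guard reads in context, here is a sketch
of a hypothetical driver helper (foo_read_temp() and foo_hw_read() are
made-up names; the ACQUIRE()/ACQUIRE_ERR() usage follows the examples
above and assumes <linux/pm_runtime.h>):
static int foo_read_temp(struct device *dev, u32 *val)
{
        int ret;
        /* Resume @dev; the usage counter is dropped when pm leaves scope. */
        ACQUIRE(pm_runtime_active_try, pm)(dev);
        ret = ACQUIRE_ERR(pm_runtime_active_try, &pm);
        if (ret < 0)
                return ret;
        *val = foo_hw_read(dev);        /* hypothetical hardware access */
        return 0;
}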
Link: https://lore.kernel.org/linux-pm/878qimv24u.wl-tiwai@suse.de/ [1]
Link: https://lore.kernel.org/linux-pm/20250926150613.000073a4@huawei.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Takashi Iwai <tiwai@suse.de>
Link: https://patch.msgid.link/2238241.irdbgypaU6@rafael.j.wysocki
[ rjw: Fixed leftovers from the previous version in the changelog ]
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dhruva Gole <d-gole@ti.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Merge changes related to system sleep and runtime PM framework for
6.18-rc1:
- Annotate loops walking device links in the power management core
code as _srcu and add macros for walking device links to reduce the
likelihood of coding mistakes related to them (Rafael Wysocki)
- Document time units for *_time functions in the runtime PM API (Brian
Norris)
- Clear power.must_resume in noirq suspend error path to avoid resuming
a dependent device under a suspended parent or supplier (Rafael
Wysocki)
- Fix GFP mask handling during hybrid suspend and make the amdgpu
driver handle hybrid suspend correctly (Mario Limonciello, Rafael
Wysocki)
- Fix GFP mask handling after aborted hibernation in platform mode and
combine exit paths in power_down() to avoid code duplication (Rafael
Wysocki)
- Use vmalloc_array() and vcalloc() in the hibernation core to avoid
open-coded size computations (Qianfeng Rong)
- Fix typo in hibernation core code comment (Li Jun)
- Call pm_wakeup_clear() in the same place where other functions that do
bookkeeping prior to suspend_prepare() are called (Samuel Wu)
* pm-core:
PM: core: Add two macros for walking device links
PM: core: Annotate loops walking device links as _srcu
* pm-runtime:
PM: runtime: Documentation: ABI: Document time units for *_time
* pm-sleep:
PM: hibernate: Combine return paths in power_down()
PM: hibernate: Restrict GFP mask in power_down()
PM: hibernate: Fix pm_hibernation_mode_is_suspend() build breakage
drm/amd: Fix hybrid sleep
PM: hibernate: Add pm_hibernation_mode_is_suspend()
PM: hibernate: Fix hybrid-sleep
PM: sleep: core: Clear power.must_resume in noirq suspend error path
PM: sleep: Make pm_wakeup_clear() call more clear
PM: hibernate: Fix typo in memory bitmaps description comment
PM: hibernate: Use vmalloc_array() and vcalloc() to improve code
|
|
Merge energy model management, OPP (operating performance points) and
devfreq updates for 6.18-rc1:
- Prevent CPU capacity updates after registering a perf domain from
failing on a first CPU that is not present (Christian Loehle)
- Add support for the cases in which frequency alone is not sufficient
to uniquely identify an OPP (Krishna Chaitanya Chundru)
- Use to_result() for OPP error handling in Rust (Onur Özkan)
- Add support for LPDDR5 on the Rockchip RK3588 SoC to the rockchip-dfi devfreq
driver (Nicolas Frattaroli)
- Fix an issue where DDR cycle counts on RK3588/RK3528 with LPDDR4(X)
are reported as half of the actual value by adding a cycle multiplier
to the rockchip-dfi devfreq-event driver (Nicolas Frattaroli)
- Fix a potential error pointer dereference of the regulator instance in
the mtk-cci devfreq driver probe by adding the missing check, and remove
a redundant condition from an if () statement in that driver (Dan
Carpenter, Liao Yuanhong)
* pm-em:
PM: EM: Fix late boot with holes in CPU topology
* pm-opp:
OPP: Add support to find OPP for a set of keys
rust: opp: use to_result for error handling
* pm-devfreq:
PM / devfreq: rockchip-dfi: add support for LPDDR5
PM / devfreq: rockchip-dfi: double count on RK3588
PM / devfreq: mtk-cci: avoid redundant conditions
PM / devfreq: mtk-cci: Fix potential error pointer dereference in probe()
|
|
From the cover letter [1]:
This patch set introduces kmalloc_nolock(), which is the next logical
step towards any-context allocation, necessary to remove bpf_mem_alloc
and get rid of the preallocation requirement in the BPF infrastructure.
In production, BPF maps have grown to gigabytes in size, and preallocation
wastes memory. Allocating from any context addresses this issue for BPF
and for other subsystems that are forced to preallocate too.
This long task started with the introduction of alloc_pages_nolock(),
then memcg and objcg were converted to operate from any context including
NMI, and this set completes the task with kmalloc_nolock(), which builds
on top of alloc_pages_nolock() and the memcg changes.
After that, the BPF subsystem will gradually adopt it everywhere.
Link: https://lore.kernel.org/all/20250909010007.1660-1-alexei.starovoitov@gmail.com/ [1]
|
|
kmalloc_nolock() relies on the ability of local_trylock_t to detect
the situation when the per-cpu kmem_cache is locked.
In !PREEMPT_RT local_(try)lock_irqsave(&s->cpu_slab->lock, flags)
disables IRQs and marks s->cpu_slab->lock as acquired.
local_lock_is_locked(&s->cpu_slab->lock) returns true when the slab
allocator is in the middle of manipulating the per-cpu cache
of that specific kmem_cache.
kmalloc_nolock() can be called from any context and can re-enter
into ___slab_alloc():
kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf ->
kmalloc_nolock() -> ___slab_alloc(cache_B)
or
kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf ->
kmalloc_nolock() -> ___slab_alloc(cache_B)
Hence the caller of ___slab_alloc() checks if &s->cpu_slab->lock
can be acquired without a deadlock before invoking the function.
If that specific per-cpu kmem_cache is busy, kmalloc_nolock()
retries in a different kmalloc bucket. The second attempt will
likely succeed, since this CPU has locked a different kmem_cache.
Similarly, in PREEMPT_RT local_lock_is_locked() returns true when
the per-cpu rt_spin_lock is locked by the current _task_. In this case
re-entrance into the same kmalloc bucket is unsafe, and
kmalloc_nolock() tries a different bucket that is most likely not
locked by the current task. Though it may be locked by a
different task, it is safe to rt_spin_lock() and sleep on it.
Similar to alloc_pages_nolock() the kmalloc_nolock() returns NULL
immediately if called from hard irq or NMI in PREEMPT_RT.
kfree_nolock() defers freeing to irq_work when local_lock_is_locked()
and (in_nmi() or in PREEMPT_RT).
SLUB_TINY config doesn't use local_lock_is_locked() and relies on
spin_trylock_irqsave(&n->list_lock) to allocate,
while kfree_nolock() always defers to irq_work.
Note, kfree_nolock() must be called _only_ for objects allocated
with kmalloc_nolock(). Debug checks (like kmemleak and kfence)
were skipped on allocation, hence obj = kmalloc(); kfree_nolock(obj);
will miss kmemleak/kfence bookkeeping and will cause false positives.
large_kmalloc is not supported by either kmalloc_nolock()
or kfree_nolock().
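A rough usage sketch (the kmalloc_nolock() prototype and accepted flags
here are assumptions based on the description above; struct foo_event and
the callers are made up):
struct foo_event {
        int cpu;
};
/* May be reached from NMI or tracing context where plain kmalloc() could deadlock. */
static struct foo_event *record_event(void)
{
        /* Assumed prototype: kmalloc_nolock(size, gfp_flags, node). */
        struct foo_event *ev = kmalloc_nolock(sizeof(*ev), __GFP_ACCOUNT,
                                              NUMA_NO_NODE);
        if (!ev)
                return NULL;    /* may fail, e.g. in hard irq/NMI on PREEMPT_RT */
        ev->cpu = raw_smp_processor_id();
        return ev;
}
static void drop_event(struct foo_event *ev)
{
        kfree_nolock(ev);       /* only valid for kmalloc_nolock() objects */
}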
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Since valid upper bits in slab->obj_exts can never be combined with
the OBJEXTS_ALLOC_FAIL bit,
use OBJEXTS_ALLOC_FAIL == (1ull << 0) as the magic sentinel
instead of (1ull << 2) to free up bit 2.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Change alloc_pages_nolock() to default to __GFP_COMP when allocating
pages, since the upcoming reentrant alloc_slab_page() needs __GFP_COMP.
Also allow the __GFP_ACCOUNT flag to be specified,
since most of the BPF infrastructure needs __GFP_ACCOUNT, except BPF streams.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Introduce local_lock_is_locked(), which returns true when the given
local_lock is locked by the current CPU (in !PREEMPT_RT) or
by the current task (in PREEMPT_RT).
The goal is to let the caller detect a potential deadlock.
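A sketch of the caller-side deadlock check (struct my_pcpu and
try_fast_path() are placeholders; using the local_(try)lock_irqsave()
family on a local_trylock_t follows the SLUB usage described above):
struct my_pcpu {
        local_trylock_t lock;
        int data;
};
static bool try_fast_path(struct my_pcpu __percpu *pcpu)
{
        unsigned long flags;
        /*
         * If this CPU (or, on PREEMPT_RT, the current task) already holds
         * the lock, taking it again would deadlock, so report that and
         * let the caller pick a fallback path.
         */
        if (local_lock_is_locked(&pcpu->lock))
                return false;
        local_lock_irqsave(&pcpu->lock, flags);
        this_cpu_ptr(pcpu)->data++;
        local_unlock_irqrestore(&pcpu->lock, flags);
        return true;
}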
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
The fast path through a write will require replacing a single node in
the tree. Using a sheaf (32 nodes) is too heavy for the fast path, so
special case the node store operation by just allocating one node in the
maple state.
Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Use prefilled sheaves instead of bulk allocations. This should speed up
the allocations and the return path of unused allocations.
Remove the push and pop of nodes from the maple state as this is now
handled by the slab layer with sheaves.
Tests have been removed where necessary, since the features of the tree
have been reduced.
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Add functions for efficient guaranteed allocations e.g. in a critical
section that cannot sleep, when the exact number of allocations is not
known beforehand, but an upper limit can be calculated.
kmem_cache_prefill_sheaf() returns a sheaf containing at least given
number of objects.
kmem_cache_alloc_from_sheaf() will allocate an object from the sheaf
and is guaranteed not to fail until depleted.
kmem_cache_return_sheaf() is for giving the sheaf back to the slab
allocator after the critical section. This will also attempt to refill
it to the cache's sheaf capacity for more efficient sheaf handling,
but it is not strictly necessary for that refill to succeed.
kmem_cache_refill_sheaf() can be used to refill a previously obtained
sheaf to the requested size. If the current size is sufficient, it does
nothing. If the requested size exceeds the cache's sheaf_capacity and the
sheaf's current capacity, the sheaf will be replaced with a new one,
hence the indirect pointer parameter.
kmem_cache_sheaf_size() can be used to query the current size.
The implementation supports requesting sizes that exceed the cache's
sheaf_capacity, but it is not efficient - such "oversize" sheaves are
allocated fresh in kmem_cache_prefill_sheaf() and flushed and freed
immediately by kmem_cache_return_sheaf(). kmem_cache_refill_sheaf()
may be especially inefficient when replacing a sheaf with a new one of
a larger capacity. It is therefore better to size the cache's
sheaf_capacity accordingly so that oversize sheaves remain exceptional.
CONFIG_SLUB_STATS counters are added for sheaf prefill and return
operations. A prefill or return is considered _fast when it is able to
grab or return a percpu spare sheaf (even if the sheaf needs a refill to
satisfy the request, as those should amortize over time), and _slow
otherwise (when the barn or even sheaf allocation/freeing has to be
involved). sheaf_prefill_oversize is provided to determine how many
prefills were oversize (counter for oversize returns is not necessary as
all oversize refills result in oversize returns).
When slub_debug is enabled for a cache with sheaves, no percpu sheaves
exist for it, but the prefill functionality is still provided simply by
all prefilled sheaves becoming oversize. If percpu sheaves are not
created for a cache due to not passing the sheaf_capacity argument on
cache creation, the prefills also work through oversize sheaves, but
there's a WARN_ON_ONCE() to indicate the omission.
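A rough usage sketch for a bounded critical section follows; the exact
prototypes are assumptions inferred from the function names and the
description above, and my_lock/consume() are hypothetical placeholders:
static DEFINE_SPINLOCK(my_lock);        /* hypothetical lock, cannot sleep under it */
static int build_objects(struct kmem_cache *cache, unsigned int nr_needed)
{
        struct slab_sheaf *sheaf;
        unsigned int i;
        /* Outside the critical section: guarantee at least nr_needed objects. */
        sheaf = kmem_cache_prefill_sheaf(cache, GFP_KERNEL, nr_needed);
        if (!sheaf)
                return -ENOMEM;
        spin_lock(&my_lock);
        for (i = 0; i < nr_needed; i++) {
                /* Guaranteed not to fail until the sheaf is depleted. */
                void *obj = kmem_cache_alloc_from_sheaf(cache, GFP_NOWAIT, sheaf);
                consume(obj);           /* hypothetical consumer */
        }
        spin_unlock(&my_lock);
        /* Hand the sheaf back; the allocator may refill and reuse it. */
        kmem_cache_return_sheaf(cache, GFP_KERNEL, sheaf);
        return 0;
}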
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
Add a str_assert_deassert() helper to return the "assert" or "deassert"
string literal depending on the boolean argument. Also add the
inverse variant str_deassert_assert().
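A possible call site, purely for illustration (dev, id and assert come
from a hypothetical reset driver context):
dev_dbg(dev, "%s reset line %u\n", str_assert_deassert(assert), id);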
Suggested-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://lore.kernel.org/r/20250923095229.2149740-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Kees Cook <kees@kernel.org>
|