Age | Commit message (Collapse) | Author |
|
Currently mlx5_flow_destination includes counter_id which is assigned in
case we use flow counter on the flow steering rule. However, counter_id
is not enough data in case of using HW Steering. Thus, have mlx5_fc
object as part of mlx5_flow_destination instead of counter_id and assign
it where needed.
In case counter_id is received from user space, create a local counter
object to represent it.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20241219175841.1094544-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When one is too lazy to use an LSP to conduct look-ups on struct
definitions, one might use the ever useful `struct <name> {` search
string. However this doesn't work with `struct miscdevice {` because
of a stray double space. Assuming that this wasn't intentional, let's
simply remove it.
Signed-off-by: Lee Jones <lee@kernel.org>
Link: https://lore.kernel.org/r/20241223151843.472645-1-lee@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
If CONFIG_BLK_DEV_INTEGRITY isn't set, then the dummy helper must be
static inline to avoid complaints about the function being unused.
Fixes: fe8f4ca7107e ("block: modify bio_integrity_map_user to accept iov_iter as argument")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202411300229.y7h60mDg-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
process passthrough commands (bsg-lib, ufs tmf, various nvme admin
queues) and thus don't even check the flag. Remove it to simplify the
driver interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are no users left of the pci and virtio queue mapping helpers.
Thus remove them.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-8-27211e9c2cd5@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
blk_mq_pci_map_queues and blk_mq_virtio_map_queues will create a CPU to
hardware queue mapping based on affinity information. These two function
share common code and only differ on how the affinity information is
retrieved. Also, those functions are located in the block subsystem
where it doesn't really fit in. They are virtio and pci subsystem
specific.
Thus introduce provide a generic mapping function which uses the
irq_get_affinity callback from bus_type.
Originally idea from Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-4-27211e9c2cd5@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introducing a callback in struct bus_type so that a subsystem
can hook up the getters directly. This approach avoids exposing
random getters in any subsystems APIs.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-1-27211e9c2cd5@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro
bio_set_prio() does nothing but set bio->bi_ioprio. All other places just
set bio->bi_ioprio directly, so replace bio_set_prio() remaining
callsites with setting bio->bi_ioprio directly and delete that macro.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro
bio_prio() does nothing but return the value in bio->bi_ioprio. Most other
places just read bio->bi_ioprio directly, so replace bi_ioprio() callsites
with reading bio->bi_ioprio directly and delete that macro.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20241202111957.2311683-2-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now we only verify the outmost freeze & unfreeze in current context in case
that !q->mq_freeze_depth, so it is reliable to save queue lying state when
we want to lock the freeze queue since the state is one per-task variable
now.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241127135133.3952153-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now we only verify the outmost freeze & unfreeze in current context in case
that !q->mq_freeze_depth, so it is reliable to save disk DEAD state when
we want to lock the freeze queue since the state is one per-task variable
now.
Doing this way can kill lots of false positive when freeze queue is
called before adding disk[1].
[1] https://lore.kernel.org/linux-block/6741f6b2.050a0220.1cc393.0017.GAE@google.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20241127135133.3952153-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If an iocb contains metadata, extract that and prepare the bip.
Based on flags specified by the user, set corresponding guard/app/ref
tags to be checked in bip.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-11-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add support for sending user-meta buffer. Set tags to be checked
using flags specified by user/block-layer.
With this change, BIP_CTRL_NOCHECK becomes unused. Remove it.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-10-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which
indicate how the hardware should check the integrity payload.
BIP_CHECK_GUARD/REFTAG are conversion of existing semantics, while
BIP_CHECK_APPTAG is a new flag. The driver can now just rely on block
layer flags, and doesn't need to know the integrity source. Submitter
of PI decides which tags to check. This would also give us a unified
interface for user and kernel generated integrity.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-8-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce an IOCB_HAS_METADATA flag for the kiocb struct, for handling
requests containing meta payload.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-6-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add flags to describe checks for integrity meta buffer. Also, introduce
a new 'uio_meta' structure that upper layer can use to pass the
meta/integrity information.
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241128112240.8867-5-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This patch refactors bio_integrity_map_user to accept iov_iter as
argument. This is a prep patch.
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-4-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Introduce BIP_CLONE_FLAGS describing integrity flags that should be
inherited in the cloned bip from the parent.
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20241128112240.8867-2-anuj20.g@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
A preparation / cleanup patch simplifying the buf ring - mmap
synchronisation. Instead of relying on RCU, which is trickier, do it by
grabbing the mmap_lock when when anyone tries to publish or remove a
registered buffer to / from ->io_bl_xa.
Modifications of the xarray should always be protected by both
->uring_lock and ->mmap_lock, while lookups should hold either of them.
While a struct io_buffer_list is in the xarray, the mmap related fields
like ->flags and ->buf_pages should stay stable.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Convert internal parts of the CQ/SQ array managment to the region API.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Convert internal parts of the SQ managment to the region API.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add internal flags for struct io_mapped_region. The first flag we need
is IO_REGION_F_VMAPPED, that indicates that the pointer has to be
unmapped on region destruction. For now all regions are vmap'ed, so it's
set unconditionally.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5a3d8046a038da97c0f8a8c8f1733fa3fc689d31.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
->resize_lock is used for resizing rings, but it's a good idea to reuse
it in other cases as well. Rename it into mmap_lock as it's protects
from races with mmap.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add new members to strut kvm_gfn_range to indicate which mapping
(private-vs-shared) to operate on: enum kvm_gfn_range_filter
attr_filter. Update the core zapping operations to set them appropriately.
TDX utilizes two GPA aliases for the same memslots, one for memory that is
for private memory and one that is for shared. For private memory, KVM
cannot always perform the same operations it does on memory for default
VMs, such as zapping pages and having them be faulted back in, as this
requires guest coordination. However, some operations such as guest driven
conversion of memory between private and shared should zap private memory.
Internally to the MMU, private and shared mappings are tracked on separate
roots. Mapping and zapping operations will operate on the respective GFN
alias for each root (private or shared). So zapping operations will by
default zap both aliases. Add fields in struct kvm_gfn_range to allow
callers to specify which aliases so they can only target the aliases
appropriate for their specific operation.
There was feedback that target aliases should be specified such that the
default value (0) is to operate on both aliases. Several options were
considered. Several variations of having separate bools defined such
that the default behavior was to process both aliases. They either allowed
nonsensical configurations, or were confusing for the caller. A simple
enum was also explored and was close, but was hard to process in the
caller. Instead, use an enum with the default value (0) reserved as a
disallowed value. Catch ranges that didn't have the target aliases
specified by looking for that specific value.
Set target alias with enum appropriately for these MMU operations:
- For KVM's mmu notifier callbacks, zap shared pages only because private
pages won't have a userspace mapping
- For setting memory attributes, kvm_arch_pre_set_memory_attributes()
chooses the aliases based on the attribute.
- For guest_memfd invalidations, zap private only.
Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-ID: <20240718211230.1492011-3-rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Remove the RCU-protected attribute from slot->gmem.file. No need to use RCU
primitives rcu_assign_pointer()/synchronize_rcu() to update this pointer.
- slot->gmem.file is updated in 3 places:
kvm_gmem_bind(), kvm_gmem_unbind(), kvm_gmem_release().
All of them are protected by kvm->slots_lock.
- slot->gmem.file is read in 2 paths:
(1) kvm_gmem_populate
kvm_gmem_get_file
__kvm_gmem_get_pfn
(2) kvm_gmem_get_pfn
kvm_gmem_get_file
__kvm_gmem_get_pfn
Path (1) kvm_gmem_populate() requires holding kvm->slots_lock, so
slot->gmem.file is protected by the kvm->slots_lock in this path.
Path (2) kvm_gmem_get_pfn() does not require holding kvm->slots_lock.
However, it's also not guarded by rcu_read_lock() and rcu_read_unlock().
So synchronize_rcu() in kvm_gmem_unbind()/kvm_gmem_release() actually
will not wait for the readers in kvm_gmem_get_pfn() due to lack of RCU
read-side critical section.
The path (2) kvm_gmem_get_pfn() is safe without RCU protection because:
a) kvm_gmem_bind() is called on a new memslot, before the memslot is
visible to kvm_gmem_get_pfn().
b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
occur after the memslot is no longer visible to kvm_gmem_get_pfn().
c) get_file_active() ensures that kvm_gmem_get_pfn() will not access the
stale file if kvm_gmem_release() sets it to NULL. This is because if
kvm_gmem_release() occurs before kvm_gmem_get_pfn(), get_file_active()
will return NULL; if get_file_active() does not return NULL,
kvm_gmem_release() should not occur until after kvm_gmem_get_pfn()
releases the file reference.
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241104084303.29909-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add and implement dev_pm_opp_get_bw() to retrieve the OPP's
bandwidth in the same way as the dev_pm_opp_get_voltage() helper.
Retrieving bandwidth is required in the case of the Adreno GPU
where the GPU Management Unit can handle the Bandwidth scaling.
The helper can get the peak or average bandwidth for any of
the interconnect path.
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
[ Viresh: Fixed commit log and a comment in code ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
Since the dynamic preemption has been enabled for PREEMPT_RT we have now
CONFIG_PREEMPT and CONFIG_PREEMPT_RT set simultaneously. This affects
the vermagic strings which comes now PREEMPT with PREEMPT_RT enabled.
The PREEMPT_RT module usually can not be loaded on a PREEMPT kernel
because some symbols are missing.
However if the symbols are fine then it continues and it crashes later.
The problem is that the struct module has a different layout and the
num_exentries or init members are at a different position leading to a
crash later on. This is not necessary caught by the size check in
elf_validity_cache_index_mod() because the mem member has an alignment
requirement of __module_memory_align which is big enough keep the total
size unchanged. Therefore we should keep the string accurate instead of
removing it.
Move the PREEMPT_RT check before the PREEMPT so that it takes precedence
if both symbols are enabled.
Fixes: 35772d627b55c ("sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Link: https://lore.kernel.org/r/20241205160602.3lIAsJRT@linutronix.de
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
|
|
We need the USB fixes in here as well for testing.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The stream parameter will be used in the follow up commit.
No function change.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://lore.kernel.org/r/20241218080155.102405-14-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
We need to program bus clock scale to adjust the bus clock if current
bus clock doesn't fit the bandwidth.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/20241218080155.102405-8-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
Currently, we only set peripheral frequency when the peripheral is
initialized. However, curr_dr_freq may change to get required bandwidth.
For example, curr_dr_freq may increase from 4.8MHz to 9.6MHz when the
4th stream is opened. Add a helper to get the scale index so that we can
get the scale index and program it.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://lore.kernel.org/r/20241218080155.102405-7-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
To support multi-lane, we need to know how much bandwidth
is used on each lane. And to use the lane that has enough
bandwidth.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20241218080155.102405-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
The DisCo for SoundWire 2.0 added support for the
'mipi-sdw-lane-<n>-mapping' property.
Co-developed-by: Chao Song <chao.song@linux.intel.com>
Signed-off-by: Chao Song <chao.song@linux.intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://lore.kernel.org/r/20241218080155.102405-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
The routine is used in link_path_walk() for every path component.
To my reading the entire point of the fence was to grab a fully
populated mnt_idmap, but that's already going to happen with mere
consume fence.
Eliminates an actual fence on arm64.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20241130051712.1036527-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When utilized it dodges strlen() in vfs_readlink(), giving about 1.5%
speed up when issuing readlink on /initrd.img on ext4.
Filesystems opt in by calling inode_set_cached_link() when creating an
inode.
The size is stored in a new union utilizing the same space as i_devices,
thus avoiding growing the struct or taking up any more space.
Churn-wise the current readlink_copy() helper is patched to accept the
size instead of calculating it.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20241120112037.822078-2-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add some kernel-doc notation to structs in fiemap header files
then pull that into Documentation/filesystems/fiemap.rst
instead of duplicating the header file structs in fiemap.rst.
This helps to future-proof fiemap.rst against struct changes.
Add missing flags documentation from header files into fiemap.rst
for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED.
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Annotation already used to be there, but got lost in 52ac39e5db5148f7
("seqlock: seqcount_t: Implement all read APIs as statement expressions").
Does not look like it was intentional.
Without it gcc 12 decides to compile the following in path_init:
nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount);
nd->r_seq = __read_seqcount_begin(&rename_lock.seqcount);
into 2 cases of conditional jumps forward if the value is even, aka
branch prediction miss by default in the common case on x86-64.
With the patch jumps are only for odd values.
before:
[snip]
mov 0x104fe96(%rip),%eax # 0xffffffff82409680 <mount_lock>
test $0x1,%al
je 0xffffffff813b97fa <path_init+122>
pause
mov 0x104fe8a(%rip),%eax # 0xffffffff82409680 <mount_lock>
test $0x1,%al
jne 0xffffffff813b97ee <path_init+110>
mov %eax,0x48(%rbx)
mov 0x104fdfd(%rip),%eax # 0xffffffff82409600 <rename_lock>
test $0x1,%al
je 0xffffffff813b9813 <path_init+147>
pause
mov 0x104fdf1(%rip),%eax # 0xffffffff82409600 <rename_lock>
test $0x1,%al
jne 0xffffffff813b9807 <path_init+135>
[/snip]
after:
[snip]
mov 0x104fec6(%rip),%eax # 0xffffffff82409680 <mount_lock>
test $0x1,%al
jne 0xffffffff813b99af <path_init+607>
mov %eax,0x48(%rbx)
mov 0x104fe35(%rip),%eax # 0xffffffff82409600 <rename_lock>
test $0x1,%al
jne 0xffffffff813b999d <path_init+589>
[/snip]
Interestingly .text gets slightly smaller (as reported by size(1)):
before: 20702563
after: 20702429
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/r/20230727180355.813995-1-mjguzik@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts for
pidfds. This allows pidfds to be safely recovered and checked for
process recycling.
Link: https://lore.kernel.org/r/20241219-work-pidfs-mount-v1-1-dbc56198b839@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
hdmi_infoframe_check() has been unused since it was added in
commit c5e69ab35c0d ("video/hdmi: Constify infoframe passed to the pack
functions")
Remove it.
Note that the individual check functions for each type are
actually used, so they're staying.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"25 hotfixes. 16 are cc:stable. 19 are MM and 6 are non-MM.
The usual bunch of singletons and doubletons - please see the relevant
changelogs for details"
* tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits)
mm: huge_memory: handle strsep not finding delimiter
alloc_tag: fix set_codetag_empty() when !CONFIG_MEM_ALLOC_PROFILING_DEBUG
alloc_tag: fix module allocation tags populated area calculation
mm/codetag: clear tags before swap
mm/vmstat: fix a W=1 clang compiler warning
mm: convert partially_mapped set/clear operations to be atomic
nilfs2: fix buffer head leaks in calls to truncate_inode_pages()
vmalloc: fix accounting with i915
mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy()
fork: avoid inappropriate uprobe access to invalid mm
nilfs2: prevent use of deleted inode
zram: fix uninitialized ZRAM not releasing backing device
zram: refuse to use zero sized block device as backing device
mm: use clear_user_(high)page() for arch with special user folio handling
mm: introduce cpu_icache_is_aliasing() across all architectures
mm: add RCU annotation to pte_offset_map(_lock)
mm: correctly reference merged VMA
mm: use aligned address in copy_user_gigantic_page()
mm: use aligned address in clear_gigantic_page()
mm: shmem: fix ShmemHugePages at swapout
...
|
|
Pull BPF fixes from Daniel Borkmann:
- Fix inlining of bpf_get_smp_processor_id helper for !CONFIG_SMP
systems (Andrea Righi)
- Fix BPF USDT selftests helper code to use asm constraint "m" for
LoongArch (Tiezhu Yang)
- Fix BPF selftest compilation error in get_uprobe_offset when
PROCMAP_QUERY is not defined (Jerome Marchand)
- Fix BPF bpf_skb_change_tail helper when used in context of BPF
sockmap to handle negative skb header offsets (Cong Wang)
- Several fixes to BPF sockmap code, among others, in the area of
socket buffer accounting (Levi Zim, Zijian Zhang, Cong Wang)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test bpf_skb_change_tail() in TC ingress
selftests/bpf: Introduce socket_helpers.h for TC tests
selftests/bpf: Add a BPF selftest for bpf_skb_change_tail()
bpf: Check negative offsets in __bpf_skb_min_len()
tcp_bpf: Fix copied value in tcp_bpf_sendmsg
skmsg: Return copied bytes in sk_msg_memcopy_from_iter
tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection
tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress()
selftests/bpf: Fix compilation error in get_uprobe_offset()
selftests/bpf: Use asm constraint "m" for LoongArch
bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
|
|
size_t is the correct type for a count of objects that can fit in
memory: this also means heaps now have the same memory layout as darrays
(fs/bcachefs/darray.h), and darrays can be used as heaps.
Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Coly Li <colyli@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Here is the patch which uses existing constant table:
Currently, when using bcachefs-tools to set options, bool-type options
can only accept 1 or 0. Add support for accepting true/false and yes/no
for these options.
Signed-off-by: Integral <integral@murena.io>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Acked-by: David Howells <dhowells@redhat.com>
|
|
Change the way netfslib collects read results to do all the collection for
a particular read request using a single work item that walks along the
subrequest queue as subrequests make progress or complete, unlocking folios
progressively rather than doing the unlock in parallel as parallel requests
come in.
The code is remodelled to be more like the write-side code, though only
using a single stream. This makes it more directly comparable and thus
easier to duplicate fixes between the two sides.
This has a number of advantages:
(1) It's simpler. There doesn't need to be a complex donation mechanism
to handle mismatches between the size and alignment of subrequests and
folios. The collector unlocks folios as the subrequests covering each
complete.
(2) It should cause less scheduler overhead as there's a single work item
in play unlocking pages in parallel when a read gets split up into a
lot of subrequests instead of one per subrequest.
Whilst the parallellism is nice in theory, in practice, the vast
majority of loads are sequential reads of the whole file, so
committing a bunch of threads to unlocking folios out of order doesn't
help in those cases.
(3) It should make it easier to implement content decryption. A folio
cannot be decrypted until all the requests that contribute to it have
completed - and, again, most loads are sequential and so, most of the
time, we want to begin decryption sequentially (though it's great if
the decryption can happen in parallel).
There is a disadvantage in that we're losing the ability to decrypt and
unlock things on an as-things-arrive basis which may affect some
applications.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add support for caching the content of a file that contains a single
monolithic object that must be read/written with a single I/O operation,
such as an AFS directory.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-20-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add two netfslib functions to build up or clean up a buffer in a
folio_queue. The first, netfs_alloc_folioq_buffer() will add folios to a
buffer, extending up at least to the given size. If it can, it will add
multipage folios. The folios are optionally have the mapping set and will
have the index set according to the distance from the front of the folio
queue.
The second function will free up a folio queue and put any folios in the
queue that have the first mark set.
The netfs_folio tracepoint is also altered to cope with folios that have a
NULL mapping, and the folios being added/put will have trace lines emitted
and will be accounted in the stats.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-19-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: netfs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the was_async argument from netfs_read_subreq_terminated(). Almost
every caller is either in process context and passes false. Some
filesystems delegate the call to a workqueue to avoid doing the work in
their network message queue parsing thread.
The only exception is netfs_cache_read_terminated() which handles
completion in the cache - which is usually a callback from the backing
filesystem in softirq context, though it can be from process context if an
error occurred. In this case, delegate to a workqueue.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/CAHk-=wiVC5Cgyz6QKXFu6fTaA6h4CjexDR-OV9kL6Vo5x9v8=A@mail.gmail.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-10-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Drop the error argument from netfs_read_subreq_terminated() in favour of
passing the value in subreq->error.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-9-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
A rolling buffer is a series of folios held in a list of folio_queues. New
folios and folio_queue structs may be inserted at the head simultaneously
with spent ones being removed from the tail without the need for locking.
The rolling buffer includes an iov_iter and it has to be careful managing
this as the list of folio_queues is extended such that an oops doesn't
incurred because the iterator was pointing to the end of a folio_queue
segment that got appended to and then removed.
We need to use the mechanism twice, once for read and once for write, and,
in future patches, we will use a second rolling buffer to handle bounce
buffering for content encryption.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-6-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a tracepoint to log the lifespan of folio_queue structs. For tracing
illustrative purposes, folio_queues are tagged with the debug ID of
whatever they're related to (typically a netfs_io_request) and a debug ID
of their own.
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20241216204124.3752367-5-dhowells@redhat.com
cc: Jeff Layton <jlayton@kernel.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|