summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2024-12-23net/mlx5: fs, add counter object to flow destinationMoshe Shemesh
Currently mlx5_flow_destination includes counter_id which is assigned in case we use flow counter on the flow steering rule. However, counter_id is not enough data in case of using HW Steering. Thus, have mlx5_fc object as part of mlx5_flow_destination instead of counter_id and assign it where needed. In case counter_id is received from user space, create a local counter object to represent it. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20241219175841.1094544-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-23misc: trivial: Remove undesired double space from struct definitionLee Jones
When one is too lazy to use an LSP to conduct look-ups on struct definitions, one might use the ever useful `struct <name> {` search string. However this doesn't work with `struct miscdevice {` because of a stray double space. Assuming that this wasn't intentional, let's simply remove it. Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20241223151843.472645-1-lee@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23block: make bio_integrity_map_user() static inlineJens Axboe
If CONFIG_BLK_DEV_INTEGRITY isn't set, then the dummy helper must be static inline to avoid complaints about the function being unused. Fixes: fe8f4ca7107e ("block: modify bio_integrity_map_user to accept iov_iter as argument") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411300229.y7h60mDg-lkp@intel.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: remove BLK_MQ_F_SHOULD_MERGEChristoph Hellwig
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely process passthrough commands (bsg-lib, ufs tmf, various nvme admin queues) and thus don't even check the flag. Remove it to simplify the driver interface. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-mq: remove unused queue mapping helpersDaniel Wagner
There are no users left of the pci and virtio queue mapping helpers. Thus remove them. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-8-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23blk-mq: introduce blk_mq_map_hw_queuesDaniel Wagner
blk_mq_pci_map_queues and blk_mq_virtio_map_queues will create a CPU to hardware queue mapping based on affinity information. These two function share common code and only differ on how the affinity information is retrieved. Also, those functions are located in the block subsystem where it doesn't really fit in. They are virtio and pci subsystem specific. Thus introduce provide a generic mapping function which uses the irq_get_affinity callback from bus_type. Originally idea from Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-4-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23driver core: bus: add irq_get_affinity callback to bus_typeDaniel Wagner
Introducing a callback in struct bus_type so that a subsystem can hook up the getters directly. This approach avoids exposing random getters in any subsystems APIs. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Daniel Wagner <wagi@kernel.org> Link: https://lore.kernel.org/r/20241202-refactor-blk-affinity-helpers-v6-1-27211e9c2cd5@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: Delete bio_set_prio()John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_set_prio() does nothing but set bio->bi_ioprio. All other places just set bio->bi_ioprio directly, so replace bio_set_prio() remaining callsites with setting bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: Delete bio_prio()John Garry
Since commit 43b62ce3ff0a ("block: move bio io prio to a new field"), macro bio_prio() does nothing but return the value in bio->bi_ioprio. Most other places just read bio->bi_ioprio directly, so replace bi_ioprio() callsites with reading bio->bi_ioprio directly and delete that macro. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241202111957.2311683-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: track queue dying state automatically for modeling queue freeze lockdepMing Lei
Now we only verify the outmost freeze & unfreeze in current context in case that !q->mq_freeze_depth, so it is reliable to save queue lying state when we want to lock the freeze queue since the state is one per-task variable now. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: track disk DEAD state automatically for modeling queue freeze lockdepMing Lei
Now we only verify the outmost freeze & unfreeze in current context in case that !q->mq_freeze_depth, so it is reliable to save disk DEAD state when we want to lock the freeze queue since the state is one per-task variable now. Doing this way can kill lots of false positive when freeze queue is called before adding disk[1]. [1] https://lore.kernel.org/linux-block/6741f6b2.050a0220.1cc393.0017.GAE@google.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241127135133.3952153-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: add support to pass user meta bufferKanchan Joshi
If an iocb contains metadata, extract that and prepare the bip. Based on flags specified by the user, set corresponding guard/app/ref tags to be checked in bip. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241128112240.8867-11-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23scsi: add support for user-meta interfaceAnuj Gupta
Add support for sending user-meta buffer. Set tags to be checked using flags specified by user/block-layer. With this change, BIP_CTRL_NOCHECK becomes unused. Remove it. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241128112240.8867-10-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flagsAnuj Gupta
This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which indicate how the hardware should check the integrity payload. BIP_CHECK_GUARD/REFTAG are conversion of existing semantics, while BIP_CHECK_APPTAG is a new flag. The driver can now just rely on block layer flags, and doesn't need to know the integrity source. Submitter of PI decides which tags to check. This would also give us a unified interface for user and kernel generated integrity. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241128112240.8867-8-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23fs: introduce IOCB_HAS_METADATA for metadataAnuj Gupta
Introduce an IOCB_HAS_METADATA flag for the kiocb struct, for handling requests containing meta payload. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241128112240.8867-6-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23fs, iov_iter: define meta io descriptorAnuj Gupta
Add flags to describe checks for integrity meta buffer. Also, introduce a new 'uio_meta' structure that upper layer can use to pass the meta/integrity information. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241128112240.8867-5-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: modify bio_integrity_map_user to accept iov_iter as argumentAnuj Gupta
This patch refactors bio_integrity_map_user to accept iov_iter as argument. This is a prep patch. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241128112240.8867-4-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23block: define set of integrity flags to be inherited by cloned bipAnuj Gupta
Introduce BIP_CLONE_FLAGS describing integrity flags that should be inherited in the cloned bip from the parent. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20241128112240.8867-2-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring/kbuf: use mmap_lock to sync with mmapPavel Begunkov
A preparation / cleanup patch simplifying the buf ring - mmap synchronisation. Instead of relying on RCU, which is trickier, do it by grabbing the mmap_lock when when anyone tries to publish or remove a registered buffer to / from ->io_bl_xa. Modifications of the xarray should always be protected by both ->uring_lock and ->mmap_lock, while lookups should hold either of them. While a struct io_buffer_list is in the xarray, the mmap related fields like ->flags and ->buf_pages should stay stable. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/af13bde56ee1a26bcaefaa9aad37a9ea318a590e.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring: use region api for CQPavel Begunkov
Convert internal parts of the CQ/SQ array managment to the region API. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/46fc3c801290d6b1ac16023d78f6b8e685c87fd6.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring: use region api for SQPavel Begunkov
Convert internal parts of the SQ managment to the region API. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1fb73ced6b835cb319ab0fe1dc0b2e982a9a5650.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring/memmap: flag vmap'ed regionsPavel Begunkov
Add internal flags for struct io_mapped_region. The first flag we need is IO_REGION_F_VMAPPED, that indicates that the pointer has to be unmapped on region destruction. For now all regions are vmap'ed, so it's set unconditionally. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5a3d8046a038da97c0f8a8c8f1733fa3fc689d31.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23io_uring: rename ->resize_lockPavel Begunkov
->resize_lock is used for resizing rings, but it's a good idea to reuse it in other cases as well. Rename it into mmap_lock as it's protects from races with mmap. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/68f705306f3ac4d2fb999eb80ea1615015ce9f7f.1732886067.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23KVM: Add member to struct kvm_gfn_range to indicate private/sharedIsaku Yamahata
Add new members to strut kvm_gfn_range to indicate which mapping (private-vs-shared) to operate on: enum kvm_gfn_range_filter attr_filter. Update the core zapping operations to set them appropriately. TDX utilizes two GPA aliases for the same memslots, one for memory that is for private memory and one that is for shared. For private memory, KVM cannot always perform the same operations it does on memory for default VMs, such as zapping pages and having them be faulted back in, as this requires guest coordination. However, some operations such as guest driven conversion of memory between private and shared should zap private memory. Internally to the MMU, private and shared mappings are tracked on separate roots. Mapping and zapping operations will operate on the respective GFN alias for each root (private or shared). So zapping operations will by default zap both aliases. Add fields in struct kvm_gfn_range to allow callers to specify which aliases so they can only target the aliases appropriate for their specific operation. There was feedback that target aliases should be specified such that the default value (0) is to operate on both aliases. Several options were considered. Several variations of having separate bools defined such that the default behavior was to process both aliases. They either allowed nonsensical configurations, or were confusing for the caller. A simple enum was also explored and was close, but was hard to process in the caller. Instead, use an enum with the default value (0) reserved as a disallowed value. Catch ranges that didn't have the target aliases specified by looking for that specific value. Set target alias with enum appropriately for these MMU operations: - For KVM's mmu notifier callbacks, zap shared pages only because private pages won't have a userspace mapping - For setting memory attributes, kvm_arch_pre_set_memory_attributes() chooses the aliases based on the attribute. - For guest_memfd invalidations, zap private only. Link: https://lore.kernel.org/kvm/ZivIF9vjKcuGie3s@google.com/ Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-ID: <20240718211230.1492011-3-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-23KVM: guest_memfd: Remove RCU-protected attribute from slot->gmem.fileYan Zhao
Remove the RCU-protected attribute from slot->gmem.file. No need to use RCU primitives rcu_assign_pointer()/synchronize_rcu() to update this pointer. - slot->gmem.file is updated in 3 places: kvm_gmem_bind(), kvm_gmem_unbind(), kvm_gmem_release(). All of them are protected by kvm->slots_lock. - slot->gmem.file is read in 2 paths: (1) kvm_gmem_populate kvm_gmem_get_file __kvm_gmem_get_pfn (2) kvm_gmem_get_pfn kvm_gmem_get_file __kvm_gmem_get_pfn Path (1) kvm_gmem_populate() requires holding kvm->slots_lock, so slot->gmem.file is protected by the kvm->slots_lock in this path. Path (2) kvm_gmem_get_pfn() does not require holding kvm->slots_lock. However, it's also not guarded by rcu_read_lock() and rcu_read_unlock(). So synchronize_rcu() in kvm_gmem_unbind()/kvm_gmem_release() actually will not wait for the readers in kvm_gmem_get_pfn() due to lack of RCU read-side critical section. The path (2) kvm_gmem_get_pfn() is safe without RCU protection because: a) kvm_gmem_bind() is called on a new memslot, before the memslot is visible to kvm_gmem_get_pfn(). b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot occur after the memslot is no longer visible to kvm_gmem_get_pfn(). c) get_file_active() ensures that kvm_gmem_get_pfn() will not access the stale file if kvm_gmem_release() sets it to NULL. This is because if kvm_gmem_release() occurs before kvm_gmem_get_pfn(), get_file_active() will return NULL; if get_file_active() does not return NULL, kvm_gmem_release() should not occur until after kvm_gmem_get_pfn() releases the file reference. Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20241104084303.29909-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-23opp: core: implement dev_pm_opp_get_bwNeil Armstrong
Add and implement dev_pm_opp_get_bw() to retrieve the OPP's bandwidth in the same way as the dev_pm_opp_get_voltage() helper. Retrieving bandwidth is required in the case of the Adreno GPU where the GPU Management Unit can handle the Bandwidth scaling. The helper can get the peak or average bandwidth for any of the interconnect path. Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> [ Viresh: Fixed commit log and a comment in code ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2024-12-23preempt: Move PREEMPT_RT before PREEMPT in vermagic.Sebastian Andrzej Siewior
Since the dynamic preemption has been enabled for PREEMPT_RT we have now CONFIG_PREEMPT and CONFIG_PREEMPT_RT set simultaneously. This affects the vermagic strings which comes now PREEMPT with PREEMPT_RT enabled. The PREEMPT_RT module usually can not be loaded on a PREEMPT kernel because some symbols are missing. However if the symbols are fine then it continues and it crashes later. The problem is that the struct module has a different layout and the num_exentries or init members are at a different position leading to a crash later on. This is not necessary caught by the size check in elf_validity_cache_index_mod() because the mem member has an alignment requirement of __module_memory_align which is big enough keep the total size unchanged. Therefore we should keep the string accurate instead of removing it. Move the PREEMPT_RT check before the PREEMPT so that it takes precedence if both symbols are enabled. Fixes: 35772d627b55c ("sched: Enable PREEMPT_DYNAMIC for PREEMPT_RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Link: https://lore.kernel.org/r/20241205160602.3lIAsJRT@linutronix.de Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
2024-12-23Merge 6.14-rc4 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-23SoundWire: pass stream to compute_params()Bard Liao
The stream parameter will be used in the follow up commit. No function change. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://lore.kernel.org/r/20241218080155.102405-14-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-12-23Soundwire: stream: program BUSCLOCK_SCALEBard Liao
We need to program bus clock scale to adjust the bus clock if current bus clock doesn't fit the bandwidth. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20241218080155.102405-8-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-12-23Soundwire: add sdw_slave_get_scale_index helperBard Liao
Currently, we only set peripheral frequency when the peripheral is initialized. However, curr_dr_freq may change to get required bandwidth. For example, curr_dr_freq may increase from 4.8MHz to 9.6MHz when the 4th stream is opened. Add a helper to get the scale index so that we can get the scale index and program it. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://lore.kernel.org/r/20241218080155.102405-7-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-12-23soundwire: add lane_used_bandwidth in struct sdw_busBard Liao
To support multi-lane, we need to know how much bandwidth is used on each lane. And to use the lane that has enough bandwidth. Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20241218080155.102405-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-12-23soundwire: mipi_disco: read lane mapping properties from ACPIBard Liao
The DisCo for SoundWire 2.0 added support for the 'mipi-sdw-lane-<n>-mapping' property. Co-developed-by: Chao Song <chao.song@linux.intel.com> Signed-off-by: Chao Song <chao.song@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20241218080155.102405-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-12-22fs: use a consume fence in mnt_idmap()Mateusz Guzik
The routine is used in link_path_walk() for every path component. To my reading the entire point of the fence was to grab a fully populated mnt_idmap, but that's already going to happen with mere consume fence. Eliminates an actual fence on arm64. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20241130051712.1036527-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22vfs: support caching symlink lengths in inodesMateusz Guzik
When utilized it dodges strlen() in vfs_readlink(), giving about 1.5% speed up when issuing readlink on /initrd.img on ext4. Filesystems opt in by calling inode_set_cached_link() when creating an inode. The size is stored in a new union utilizing the same space as i_devices, thus avoiding growing the struct or taking up any more space. Churn-wise the current readlink_copy() helper is patched to accept the size instead of calculating it. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20241120112037.822078-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22fiemap: use kernel-doc includes in fiemap docbookRandy Dunlap
Add some kernel-doc notation to structs in fiemap header files then pull that into Documentation/filesystems/fiemap.rst instead of duplicating the header file structs in fiemap.rst. This helps to future-proof fiemap.rst against struct changes. Add missing flags documentation from header files into fiemap.rst for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22seqlock: annotate spinning as unlikely() in __read_seqcount_beginMateusz Guzik
Annotation already used to be there, but got lost in 52ac39e5db5148f7 ("seqlock: seqcount_t: Implement all read APIs as statement expressions"). Does not look like it was intentional. Without it gcc 12 decides to compile the following in path_init: nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount); nd->r_seq = __read_seqcount_begin(&rename_lock.seqcount); into 2 cases of conditional jumps forward if the value is even, aka branch prediction miss by default in the common case on x86-64. With the patch jumps are only for odd values. before: [snip] mov 0x104fe96(%rip),%eax # 0xffffffff82409680 <mount_lock> test $0x1,%al je 0xffffffff813b97fa <path_init+122> pause mov 0x104fe8a(%rip),%eax # 0xffffffff82409680 <mount_lock> test $0x1,%al jne 0xffffffff813b97ee <path_init+110> mov %eax,0x48(%rbx) mov 0x104fdfd(%rip),%eax # 0xffffffff82409600 <rename_lock> test $0x1,%al je 0xffffffff813b9813 <path_init+147> pause mov 0x104fdf1(%rip),%eax # 0xffffffff82409600 <rename_lock> test $0x1,%al jne 0xffffffff813b9807 <path_init+135> [/snip] after: [snip] mov 0x104fec6(%rip),%eax # 0xffffffff82409680 <mount_lock> test $0x1,%al jne 0xffffffff813b99af <path_init+607> mov %eax,0x48(%rbx) mov 0x104fe35(%rip),%eax # 0xffffffff82409600 <rename_lock> test $0x1,%al jne 0xffffffff813b999d <path_init+589> [/snip] Interestingly .text gets slightly smaller (as reported by size(1)): before: 20702563 after: 20702429 Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20230727180355.813995-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22pidfs: allow bind-mountsChristian Brauner
Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts for pidfds. This allows pidfds to be safely recovered and checked for process recycling. Link: https://lore.kernel.org/r/20241219-work-pidfs-mount-v1-1-dbc56198b839@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-22video: hdmi: Remove unused hdmi_infoframe_checkDr. David Alan Gilbert
hdmi_infoframe_check() has been unused since it was added in commit c5e69ab35c0d ("video/hdmi: Constify infoframe passed to the pack functions") Remove it. Note that the individual check functions for each type are actually used, so they're staying. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Helge Deller <deller@gmx.de>
2024-12-21Merge tag 'mm-hotfixes-stable-2024-12-21-12-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "25 hotfixes. 16 are cc:stable. 19 are MM and 6 are non-MM. The usual bunch of singletons and doubletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2024-12-21-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (25 commits) mm: huge_memory: handle strsep not finding delimiter alloc_tag: fix set_codetag_empty() when !CONFIG_MEM_ALLOC_PROFILING_DEBUG alloc_tag: fix module allocation tags populated area calculation mm/codetag: clear tags before swap mm/vmstat: fix a W=1 clang compiler warning mm: convert partially_mapped set/clear operations to be atomic nilfs2: fix buffer head leaks in calls to truncate_inode_pages() vmalloc: fix accounting with i915 mm/page_alloc: don't call pfn_to_page() on possibly non-existent PFN in split_large_buddy() fork: avoid inappropriate uprobe access to invalid mm nilfs2: prevent use of deleted inode zram: fix uninitialized ZRAM not releasing backing device zram: refuse to use zero sized block device as backing device mm: use clear_user_(high)page() for arch with special user folio handling mm: introduce cpu_icache_is_aliasing() across all architectures mm: add RCU annotation to pte_offset_map(_lock) mm: correctly reference merged VMA mm: use aligned address in copy_user_gigantic_page() mm: use aligned address in clear_gigantic_page() mm: shmem: fix ShmemHugePages at swapout ...
2024-12-21Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull BPF fixes from Daniel Borkmann: - Fix inlining of bpf_get_smp_processor_id helper for !CONFIG_SMP systems (Andrea Righi) - Fix BPF USDT selftests helper code to use asm constraint "m" for LoongArch (Tiezhu Yang) - Fix BPF selftest compilation error in get_uprobe_offset when PROCMAP_QUERY is not defined (Jerome Marchand) - Fix BPF bpf_skb_change_tail helper when used in context of BPF sockmap to handle negative skb header offsets (Cong Wang) - Several fixes to BPF sockmap code, among others, in the area of socket buffer accounting (Levi Zim, Zijian Zhang, Cong Wang) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Test bpf_skb_change_tail() in TC ingress selftests/bpf: Introduce socket_helpers.h for TC tests selftests/bpf: Add a BPF selftest for bpf_skb_change_tail() bpf: Check negative offsets in __bpf_skb_min_len() tcp_bpf: Fix copied value in tcp_bpf_sendmsg skmsg: Return copied bytes in sk_msg_memcopy_from_iter tcp_bpf: Add sk_rmem_alloc related logic for tcp_bpf ingress redirection tcp_bpf: Charge receive socket buffer in bpf_tcp_ingress() selftests/bpf: Fix compilation error in get_uprobe_offset() selftests/bpf: Use asm constraint "m" for LoongArch bpf: Fix bpf_get_smp_processor_id() on !CONFIG_SMP
2024-12-21lib min_heap: Switch to size_tKent Overstreet
size_t is the correct type for a count of objects that can fit in memory: this also means heaps now have the same memory layout as darrays (fs/bcachefs/darray.h), and darrays can be used as heaps. Cc: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Coly Li <colyli@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-12-21bcachefs: add support for true/false & yes/no in bool-type optionsIntegral
Here is the patch which uses existing constant table: Currently, when using bcachefs-tools to set options, bool-type options can only accept 1 or 0. Add support for accepting true/false and yes/no for these options. Signed-off-by: Integral <integral@murena.io> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Acked-by: David Howells <dhowells@redhat.com>
2024-12-20netfs: Change the read result collector to only use one work itemDavid Howells
Change the way netfslib collects read results to do all the collection for a particular read request using a single work item that walks along the subrequest queue as subrequests make progress or complete, unlocking folios progressively rather than doing the unlock in parallel as parallel requests come in. The code is remodelled to be more like the write-side code, though only using a single stream. This makes it more directly comparable and thus easier to duplicate fixes between the two sides. This has a number of advantages: (1) It's simpler. There doesn't need to be a complex donation mechanism to handle mismatches between the size and alignment of subrequests and folios. The collector unlocks folios as the subrequests covering each complete. (2) It should cause less scheduler overhead as there's a single work item in play unlocking pages in parallel when a read gets split up into a lot of subrequests instead of one per subrequest. Whilst the parallellism is nice in theory, in practice, the vast majority of loads are sequential reads of the whole file, so committing a bunch of threads to unlocking folios out of order doesn't help in those cases. (3) It should make it easier to implement content decryption. A folio cannot be decrypted until all the requests that contribute to it have completed - and, again, most loads are sequential and so, most of the time, we want to begin decryption sequentially (though it's great if the decryption can happen in parallel). There is a disadvantage in that we're losing the ability to decrypt and unlock things on an as-things-arrive basis which may affect some applications. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-28-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Add support for caching single monolithic objects such as AFS dirsDavid Howells
Add support for caching the content of a file that contains a single monolithic object that must be read/written with a single I/O operation, such as an AFS directory. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-20-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Add functions to build/clean a buffer in a folio_queueDavid Howells
Add two netfslib functions to build up or clean up a buffer in a folio_queue. The first, netfs_alloc_folioq_buffer() will add folios to a buffer, extending up at least to the given size. If it can, it will add multipage folios. The folios are optionally have the mapping set and will have the index set according to the distance from the front of the folio queue. The second function will free up a folio queue and put any folios in the queue that have the first mark set. The netfs_folio tracepoint is also altered to cope with folios that have a NULL mapping, and the folios being added/put will have trace lines emitted and will be accounted in the stats. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-19-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: Marc Dionne <marc.dionne@auristor.com> cc: netfs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Drop the was_async arg from netfs_read_subreq_terminated()David Howells
Drop the was_async argument from netfs_read_subreq_terminated(). Almost every caller is either in process context and passes false. Some filesystems delegate the call to a workqueue to avoid doing the work in their network message queue parsing thread. The only exception is netfs_cache_read_terminated() which handles completion in the cache - which is usually a callback from the backing filesystem in softirq context, though it can be from process context if an error occurred. In this case, delegate to a workqueue. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wiVC5Cgyz6QKXFu6fTaA6h4CjexDR-OV9kL6Vo5x9v8=A@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-10-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Drop the error arg from netfs_read_subreq_terminated()David Howells
Drop the error argument from netfs_read_subreq_terminated() in favour of passing the value in subreq->error. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-9-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Abstract out a rolling folio buffer implementationDavid Howells
A rolling buffer is a series of folios held in a list of folio_queues. New folios and folio_queue structs may be inserted at the head simultaneously with spent ones being removed from the tail without the need for locking. The rolling buffer includes an iov_iter and it has to be careful managing this as the list of folio_queues is extended such that an oops doesn't incurred because the iterator was pointing to the end of a folio_queue segment that got appended to and then removed. We need to use the mechanism twice, once for read and once for write, and, in future patches, we will use a second rolling buffer to handle bounce buffering for content encryption. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-6-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20netfs: Add a tracepoint to log the lifespan of folio_queue structsDavid Howells
Add a tracepoint to log the lifespan of folio_queue structs. For tracing illustrative purposes, folio_queues are tagged with the debug ID of whatever they're related to (typically a netfs_io_request) and a debug ID of their own. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20241216204124.3752367-5-dhowells@redhat.com cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>