summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2025-01-20net: move HDS config from ethtool stateJakub Kicinski
Separate the HDS config from the ethtool state struct. The HDS config contains just simple parameters, not state. Having it as a separate struct will make it easier to clone / copy and also long term potentially make it per-queue. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20Merge tag 'vfs-6.14-rc1.statx.dio' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs direct-io updates from Christian Brauner: "File systems that write out of place usually require different alignment for direct I/O writes than what they can do for reads. Add a separate dio read align field to statx, as many out of place write file systems can easily do reads aligned to the device sector size, but require bigger alignment for writes. This is usually papered over by falling back to buffered I/O for smaller writes and doing read-modify-write cycles, but performance for this sucks, so applications benefit from knowing the actual write alignment" * tag 'vfs-6.14-rc1.statx.dio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: report larger dio alignment for COW inodes xfs: report the correct read/write dio alignment for reflinked inodes xfs: cleanup xfs_vn_getattr fs: add STATX_DIO_READ_ALIGN fs: reformat the statx definition
2025-01-20Merge tag 'vfs-6.14-rc1.libfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs libfs updates from Christian Brauner: "This improves the stable directory offset behavior in various ways. Stable offsets are needed so that NFS can reliably read directories on filesystems such as tmpfs: - Improve the end-of-directory detection According to getdents(3), the d_off field in each returned directory entry points to the next entry in the directory. The d_off field in the last returned entry in the readdir buffer must contain a valid offset value, but if it points to an actual directory entry, then readdir/getdents can loop. Introduce a specific fixed offset value that is placed in the d_off field of the last entry in a directory. Some user space applications assume that the EOD offset value is larger than the offsets of real directory entries, so the largest valid offset value is reserved for this purpose. This new value is never allocated by simple_offset_add(). When ->iterate_dir() returns, getdents{64} inserts the ctx->pos value into the d_off field of the last valid entry in the readdir buffer. When it hits EOD, offset_readdir() sets ctx->pos to the EOD offset value so the last entry is updated to point to the EOD marker. When trying to read the entry at the EOD offset, offset_readdir() terminates immediately. - Rely on d_children to iterate stable offset directories Instead of using the mtree to emit entries in the order of their offset values, use it only to map incoming ctx->pos to a starting entry. Then use the directory's d_children list, which is already maintained properly by the dcache, to find the next child to emit. - Narrow the range of directory offset values returned by simple_offset_add() to 3 .. (S32_MAX - 1) on all platforms. This means the allocation behavior is identical on 32-bit systems, 64-bit systems, and 32-bit user space on 64-bit kernels. The new range still permits over 2 billion concurrent entries per directory. - Return ENOSPC when the directory offset range is exhausted. Hitting this error is almost impossible though. - Remove the simple_offset_empty() helper" * tag 'vfs-6.14-rc1.libfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: libfs: Use d_children list to iterate simple_offset directories libfs: Replace simple_offset end-of-directory detection Revert "libfs: fix infinite directory reads for offset dir" Revert "libfs: Add simple_offset_empty()" libfs: Return ENOSPC when the directory offset range is exhausted
2025-01-20Merge tag 'vfs-6.14-rc1.mount.v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add a mountinfo program to demonstrate statmount()/listmount() Add a new "mountinfo" sample userland program that demonstrates how to use statmount() and listmount() to get at the same info that /proc/pid/mountinfo provides - Remove pointless nospec.h include - Prepend statmount.mnt_opts string with security_sb_mnt_opts() Currently these mount options aren't accessible via statmount() - Add new mount namespaces to mount namespace rbtree outside of the namespace semaphore - Lockless mount namespace lookup Currently we take the read lock when looking for a mount namespace to list mounts in. We can make this lockless. The simple search case can just use a sequence counter to detect concurrent changes to the rbtree For walking the list of mount namespaces sequentially via nsfs we keep a separate rcu list as rb_prev() and rb_next() aren't usable safely with rcu. Currently there is no primitive for retrieving the previous list member. To do this we need a new deletion primitive that doesn't poison the prev pointer and a corresponding retrieval helper Since creating mount namespaces is a relatively rare event compared with querying mounts in a foreign mount namespace this is worth it. Once libmount and systemd pick up this mechanism to list mounts in foreign mount namespaces this will be used very frequently - Add extended selftests for lockless mount namespace iteration - Add a sample program to list all mounts on the system, i.e., in all mount namespaces - Improve mount namespace iteration performance Make finding the last or first mount to start iterating the mount namespace from an O(1) operation and add selftests for iterating the mount table starting from the first and last mount - Use an xarray for the old mount id While the ida does use the xarray internally we can use it explicitly which allows us to increment the unique mount id under the xa lock. This allows us to remove the atomic as we're now allocating both ids in one go - Use a shared header for vfs sample programs - Fix build warnings for new sample program to list all mounts * tag 'vfs-6.14-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: samples/vfs: fix build warnings samples/vfs: use shared header samples/vfs/mountinfo: Use __u64 instead of uint64_t fs: remove useless lockdep assertion fs: use xarray for old mount id selftests: add listmount() iteration tests fs: cache first and last mount samples: add test-list-all-mounts selftests: remove unneeded include selftests: add tests for mntns iteration seltests: move nsfs into filesystems subfolder fs: simplify rwlock to spinlock fs: lockless mntns lookup for nsfs rculist: add list_bidir_{del,prev}_rcu() fs: lockless mntns rbtree lookup fs: add mount namespace to rbtree late fs: prepend statmount.mnt_opts string with security_sb_mnt_opts() mount: remove inlude/nospec.h include samples: add a mountinfo program to demonstrate statmount()/listmount()
2025-01-20Merge tag 'kernel-6.14-rc1.pid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_max namespacing update from Christian Brauner: "The pid_max sysctl is a global value. For a long time the default value has been 65535 and during the pidfd dicussions Linus proposed to bump pid_max by default. Based on this discussion systemd started bumping pid_max to 2^22. So all new systems now run with a very high pid_max limit with some distros having also backported that change. The decision to bump pid_max is obviously correct. It just doesn't make a lot of sense nowadays to enforce such a low pid number. There's sufficient tooling to make selecting specific processes without typing really large pid numbers available. In any case, there are workloads that have expections about how large pid numbers they accept. Either for historical reasons or architectural reasons. One concreate example is the 32-bit version of Android's bionic libc which requires pid numbers less than 65536. There are workloads where it is run in a 32-bit container on a 64-bit kernel. If the host has a pid_max value greater than 65535 the libc will abort thread creation because of size assumptions of pthread_mutex_t. That's a fairly specific use-case however, in general specific workloads that are moved into containers running on a host with a new kernel and a new systemd can run into issues with large pid_max values. Obviously making assumptions about the size of the allocated pid is suboptimal but we have userspace that does it. Of course, giving containers the ability to restrict the number of processes in their respective pid namespace indepent of the global limit through pid_max is something desirable in itself and comes in handy in general. Independent of motivating use-cases the existence of pid namespaces makes this also a good semantical extension and there have been prior proposals pushing in a similar direction. The trick here is to minimize the risk of regressions which I think is doable. The fact that pid namespaces are hierarchical will help us here. What we mostly care about is that when the host sets a low pid_max limit, say (crazy number) 100 that no descendant pid namespace can allocate a higher pid number in its namespace. Since pid allocation is hierarchial this can be ensured by checking each pid allocation against the pid namespace's pid_max limit. This means if the allocation in the descendant pid namespace succeeds, the ancestor pid namespace can reject it. If the ancestor pid namespace has a higher limit than the descendant pid namespace the descendant pid namespace will reject the pid allocation. The ancestor pid namespace will obviously not care about this. All in all this means pid_max continues to enforce a system wide limit on the number of processes but allows pid namespaces sufficient leeway in handling workloads with assumptions about pid values and allows containers to restrict the number of processes in a pid namespace through the pid_max interface" * tag 'kernel-6.14-rc1.pid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tests/pid_namespace: add pid_max tests pid: allow pid_max to be set per pid namespace
2025-01-20Merge branches 'pm-sleep', 'pm-cpuidle' and 'pm-em'Rafael J. Wysocki
Merge updates related to system sleep, a cpuidle update and an Energy Model handling code update for 6.14-rc1: - Allow configuring the system suspend-resume (DPM) watchdog to warn earlier than panic (Douglas Anderson). - Implement devm_device_init_wakeup() helper and introduce a device- managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan). - Remove direct inclusions of 'pm_wakeup.h' which should be only included via 'device.h' (Wolfram Sang). - Clean up two comments in the core system-wide PM code (Rafael Wysocki, Randy Dunlap). - Add Clearwater Forest processor support to the intel_idle cpuidle driver (Artem Bityutskiy). - Move sched domains rebuild function from the schedutil cpufreq governor to the Energy Model handling code (Rafael Wysocki). * pm-sleep: PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq() PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic PM: sleep: convert comment from kernel-doc to plain comment PM: wakeup: implement devm_device_init_wakeup() helper PM: sleep: sysfs: don't include 'pm_wakeup.h' directly PM: sleep: autosleep: don't include 'pm_wakeup.h' directly PM: sleep: Update stale comment in device_resume() * pm-cpuidle: intel_idle: add Clearwater Forest SoC support * pm-em: PM: EM: Move sched domains rebuild function from schedutil to EM
2025-01-20Merge tag 'kernel-6.14-rc1.cred' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred refcount updates from Christian Brauner: "For the v6.13 cycle we switched overlayfs to a variant of override_creds() that doesn't take an extra reference. To this end the {override,revert}_creds_light() helpers were introduced. This generalizes the idea behind {override,revert}_creds_light() to the {override,revert}_creds() helpers. Afterwards overriding and reverting credentials is reference count free unless the caller explicitly takes a reference. All callers have been appropriately ported" * tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) cred: fold get_new_cred_many() into get_cred_many() cred: remove unused get_new_cred() nfsd: avoid pointless cred reference count bump cachefiles: avoid pointless cred reference count bump dns_resolver: avoid pointless cred reference count bump trace: avoid pointless cred reference count bump cgroup: avoid pointless cred reference count bump acct: avoid pointless reference count bump io_uring: avoid pointless cred reference count bump smb: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump ovl: avoid pointless cred reference count bump open: avoid pointless cred reference count bump nfsfh: avoid pointless cred reference count bump nfs/nfs4recover: avoid pointless cred reference count bump nfs/nfs4idmap: avoid pointless reference count bump nfs/localio: avoid pointless cred reference count bumps coredump: avoid pointless cred reference count bump binfmt_misc: avoid pointless cred reference count bump ...
2025-01-20Merge tag 'vfs-6.14-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - Rework inode number allocation Recently we received a patchset that aims to enable file handle encoding and decoding via name_to_handle_at(2) and open_by_handle_at(2). A crucical step in the patch series is how to go from inode number to struct pid without leaking information into unprivileged contexts. The issue is that in order to find a struct pid the pid number in the initial pid namespace must be encoded into the file handle via name_to_handle_at(2). This can be used by containers using a separate pid namespace to learn what the pid number of a given process in the initial pid namespace is. While this is a weak information leak it could be used in various exploits and in general is an ugly wart in the design. To solve this problem a new way is needed to lookup a struct pid based on the inode number allocated for that struct pid. The other part is to remove the custom inode number allocation on 32bit systems that is also an ugly wart that should go away. Allocate unique identifiers for struct pid by simply incrementing a 64 bit counter and insert each struct pid into the rbtree so it can be looked up to decode file handles avoiding to leak actual pids across pid namespaces in file handles. On both 64 bit and 32 bit the same 64 bit identifier is used to lookup struct pid in the rbtree. On 64 bit the unique identifier for struct pid simply becomes the inode number. Comparing two pidfds continues to be as simple as comparing inode numbers. On 32 bit the 64 bit number assigned to struct pid is split into two 32 bit numbers. The lower 32 bits are used as the inode number and the upper 32 bits are used as the inode generation number. Whenever a wraparound happens on 32 bit the 64 bit number will be incremented by 2 so inode numbering starts at 2 again. When a wraparound happens on 32 bit multiple pidfds with the same inode number are likely to exist. This isn't a problem since before pidfs pidfds used the anonymous inode meaning all pidfds had the same inode number. On 32 bit sserspace can thus reconstruct the 64 bit identifier by retrieving both the inode number and the inode generation number to compare, or use file handles. This gives the same guarantees on both 32 bit and 64 bit. - Implement file handle support This is based on custom export operation methods which allows pidfs to implement permission checking and opening of pidfs file handles cleanly without hacking around in the core file handle code too much. - Support bind-mounts Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts for pidfds. This allows pidfds to be safely recovered and checked for process recycling. Instead of checking d_ops for both nsfs and pidfs we could in a follow-up patch add a flag argument to struct dentry_operations that functions similar to file_operations->fop_flags. * tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: add pidfd bind-mount tests pidfs: allow bind-mounts pidfs: lookup pid through rbtree selftests/pidfd: add pidfs file handle selftests pidfs: check for valid ioctl commands pidfs: implement file handle support exportfs: add permission method fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh() exportfs: add open method fhandle: simplify error handling pseudofs: add support for export_ops pidfs: support FS_IOC_GETVERSION pidfs: remove 32bit inode number handling pidfs: rework inode number allocation
2025-01-20Merge tag 'vfs-6.14-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Support caching symlink lengths in inodes The size is stored in a new union utilizing the same space as i_devices, thus avoiding growing the struct or taking up any more space When utilized it dodges strlen() in vfs_readlink(), giving about 1.5% speed up when issuing readlink on /initrd.img on ext4 - Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag If a file system supports uncached buffered IO, it may set FOP_DONTCACHE and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted without the file system supporting it, it'll get errored with -EOPNOTSUPP - Enable VBOXGUEST and VBOXSF_FS on ARM64 Now that VirtualBox is able to run as a host on arm64 (e.g. the Apple M3 processors) we can enable VBOXSF_FS (and in turn VBOXGUEST) for this architecture. Tested with various runs of bonnie++ and dbench on an Apple MacBook Pro with the latest Virtualbox 7.1.4 r165100 installed Cleanups: - Delay sysctl_nr_open check in expand_files() - Use kernel-doc includes in fiemap docbook - Use page->private instead of page->index in watch_queue - Use a consume fence in mnt_idmap() as it's heavily used in link_path_walk() - Replace magic number 7 with ARRAY_SIZE() in fc_log - Sort out a stale comment about races between fd alloc and dup2() - Fix return type of do_mount() from long to int - Various cosmetic cleanups for the lockref code Fixes: - Annotate spinning as unlikely() in __read_seqcount_begin The annotation already used to be there, but got lost in commit 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") - Fix proc_handler for sysctl_nr_open - Flush delayed work in delayed fput() - Fix grammar and spelling in propagate_umount() - Fix ESP not readable during coredump In /proc/PID/stat, there is the kstkesp field which is the stack pointer of a thread. While the thread is active, this field reads zero. But during a coredump, it should have a valid value However, at the moment, kstkesp is zero even during coredump - Don't wake up the writer if the pipe is still full - Fix unbalanced user_access_end() in select code" * tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) gfs2: use lockref_init for qd_lockref erofs: use lockref_init for pcl->lockref dcache: use lockref_init for d_lockref lockref: add a lockref_init helper lockref: drop superfluous externs lockref: use bool for false/true returns lockref: improve the lockref_get_not_zero description lockref: remove lockref_put_not_zero fs: Fix return type of do_mount() from long to int select: Fix unbalanced user_access_end() vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64 pipe_read: don't wake up the writer if the pipe is still full selftests: coredump: Add stackdump test fs/proc: do_task_stat: Fix ESP not readable during coredump fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag fs: sort out a stale comment about races between fd alloc and dup2 fs: Fix grammar and spelling in propagate_umount() fs: fc_log replace magic number 7 with ARRAY_SIZE() fs: use a consume fence in mnt_idmap() file: flush delayed work in delayed fput() ...
2025-01-20Merge tag 'vfs-6.14-rc1.netfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs netfs updates from Christian Brauner: "This contains read performance improvements and support for monolithic single-blob objects that have to be read/written as such (e.g. AFS directory contents). The implementation of the two parts is interwoven as each makes the other possible. - Read performance improvements The read performance improvements are intended to speed up some loss of performance detected in cifs and to a lesser extend in afs. The problem is that we queue too many work items during the collection of read results: each individual subrequest is collected by its own work item, and then they have to interact with each other when a series of subrequests don't exactly align with the pattern of folios that are being read by the overall request. Whilst the processing of the pages covered by individual subrequests as they complete potentially allows folios to be woken in parallel and with minimum delay, it can shuffle wakeups for sequential reads out of order - and that is the most common I/O pattern. The final assessment and cleanup of an operation is then held up until the last I/O completes - and for a synchronous sequential operation, this means the bouncing around of work items just adds latency. Two changes have been made to make this work: (1) All collection is now done in a single "work item" that works progressively through the subrequests as they complete (and also dispatches retries as necessary). (2) For readahead and AIO, this work item be done on a workqueue and can run in parallel with the ultimate consumer of the data; for synchronous direct or unbuffered reads, the collection is run in the application thread and not offloaded. Functions such as smb2_readv_callback() then just tell netfslib that the subrequest has terminated; netfslib does a minimal bit of processing on the spot - stat counting and tracing mostly - and then queues/wakes up the worker. This simplifies the logic as the collector just walks sequentially through the subrequests as they complete and walks through the folios, if buffered, unlocking them as it goes. It also keeps to a minimum the amount of latency injected into the filesystem's low-level I/O handling The way netfs supports filesystems using the deprecated PG_private_2 flag is changed: folios are flagged and added to a write request as they complete and that takes care of scheduling the writes to the cache. The originating read request can then just unlock the pages whatever happens. - Single-blob object support Single-blob objects are files for which the content of the file must be read from or written to the server in a single operation because reading them in parts may yield inconsistent results. AFS directories are an example of this as there exists the possibility that the contents are generated on the fly and would differ between reads or might change due to third party interference. Such objects will be written to and retrieved from the cache if one is present, though we allow/may need to propose multiple subrequests to do so. The important part is that read from/write to the *server* is monolithic. Single blob reading is, for the moment, fully synchronous and does result collection in the application thread and, also for the moment, the API is supplied the buffer in the form of a folio_queue chain rather than using the pagecache. - Related afs changes This series makes a number of changes to the kafs filesystem, primarily in the area of directory handling: - AFS's FetchData RPC reply processing is made partially asynchronous which allows the netfs_io_request's outstanding operation counter to be removed as part of reducing the collection to a single work item. - Directory and symlink reading are plumbed through netfslib using the single-blob object API and are now cacheable with fscache. This also allows the afs_read struct to be eliminated and netfs_io_subrequest to be used directly instead. - Directory and symlink content are now stored in a folio_queue buffer rather than in the pagecache. This means we don't require the RCU read lock and xarray iteration to access it, and folios won't randomly disappear under us because the VM wants them back. - The vnode operation lock is changed from a mutex struct to a private lock implementation. The problem is that the lock now needs to be dropped in a separate thread and mutexes don't permit that. - When a new directory or symlink is created, we now initialise it locally and mark it valid rather than downloading it (we know what it's likely to look like). - We now use the in-directory hashtable to reduce the number of entries we need to scan when doing a lookup. The edit routines have to maintain the hash chains. - Cancellation (e.g. by signal) of an async call after the rxrpc_call has been set up is now offloaded to the worker thread as there will be a notification from rxrpc upon completion. This avoids a double cleanup. - A "rolling buffer" implementation is created to abstract out the two separate folio_queue chaining implementations I had (one for read and one for write). - Functions are provided to create/extend a buffer in a folio_queue chain and tear it down again. This is used to handle AFS directories, but could also be used to create bounce buffers for content crypto and transport crypto. - The was_async argument is dropped from netfs_read_subreq_terminated() Instead we wake the read collection work item by either queuing it or waking up the app thread. - We don't need to use BH-excluding locks when communicating between the issuing thread and the collection thread as neither of them now run in BH context. - Also included are a number of new tracepoints; a split of the netfslib write collection code to put retrying into its own file (it gets more complicated with content encryption). - There are also some minor fixes AFS included, including fixing the AFS directory format struct layout, reducing some directory over-invalidation and making afs_mkdir() translate EEXIST to ENOTEMPY (which is not available on all systems the servers support). - Finally, there's a patch to try and detect entry into the folio unlock function with no folio_queue structs in the buffer (which isn't allowed in the cases that can get there). This is a debugging patch, but should be minimal overhead" * tag 'vfs-6.14-rc1.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits) netfs: Report on NULL folioq in netfs_writeback_unlock_folios() afs: Add a tracepoint for afs_read_receive() afs: Locally initialise the contents of a new symlink on creation afs: Use the contained hashtable to search a directory afs: Make afs_mkdir() locally initialise a new directory's content netfs: Change the read result collector to only use one work item afs: Make {Y,}FS.FetchData an asynchronous operation afs: Fix cleanup of immediately failed async calls afs: Eliminate afs_read afs: Use netfslib for symlinks, allowing them to be cached afs: Use netfslib for directories afs: Make afs_init_request() get a key if not given a file netfs: Add support for caching single monolithic objects such as AFS dirs netfs: Add functions to build/clean a buffer in a folio_queue afs: Add more tracepoints to do with tracking validity cachefiles: Add auxiliary data trace cachefiles: Add some subrequest tracepoints netfs: Remove some extraneous directory invalidations afs: Fix directory format encoding struct afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY ...
2025-01-20Merge branches 'acpi-battery', 'acpi-fan' and 'acpi-misc'Rafael J. Wysocki
Merge ACPI battery and fan drivers updates and miscellaneous ACPI chanages for 6.14: - Update messages printed by the ACPI battery driver to always refer to driver extensions as "hooks" to avoid confusion with similar functionality in the power supply subsystem in the future (Thomas Weißschuh). - Fix .probe() error path cleanup in the ACPI fan driver to avoid memory leaks (Joe Hattori). - Constify 'struct bin_attribute' in some places in the ACPI subsystem and mark it as __ro_after_init in one place to prevent binary blob attributes from being updated (Thomas Weißschuh) - Add empty stubs for several ACPI-related symbols so that they can be used when CONFIG_ACPI is unset and use them for removing unnecessary conditional compilation from the ipu-bridge driver (Ricardo Ribalda). * acpi-battery: ACPI: battery: Rename extensions to hook in messages * acpi-fan: ACPI: fan: cleanup resources in the error path of .probe() * acpi-misc: media: ipu-bridge: Remove unneeded conditional compilations ACPI: bus: implement acpi_device_hid when !ACPI ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI ACPI: header: implement acpi_device_handle when !ACPI ACPI: bus: implement acpi_get_physical_device_location when !ACPI ACPI: bus: implement for_each_acpi_dev_match when !ACPI ACPI: bus: change the prototype for acpi_get_physical_device_location ACPI: sysfs: Constify 'struct bin_attribute' ACPI: BGRT: Constify 'struct bin_attribute' ACPI: BGRT: Mark bin_attribute as __ro_after_init
2025-01-20Merge branch 'kvm-mirror-page-tables' into HEADPaolo Bonzini
As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. Confidential computing solutions almost invariably have concepts of private and shared memory, but they may different a lot in the details. In SEV, for example, the bit is handled more like a permission bit as far as the page tables are concerned: the private/shared bit is not included in the physical address. For TDX, instead, the bit is more like a physical address bit, with the host mapping private memory in one half of the address space and shared in another. Furthermore, the two halves are mapped by different EPT roots and only the shared half is managed by KVM; the private half (also called Secure EPT in Intel documentation) gets managed by the privileged TDX Module via SEAMCALLs. As a result, the operations that actually change the private half of the EPT are limited and relatively slow compared to reading a PTE. For this reason the design for KVM is to keep a mirror of the private EPT in host memory. This allows KVM to quickly walk the EPT and only perform the slower private EPT operations when it needs to actually modify mid-level private PTEs. There are thus three sets of EPT page tables: external, mirror and direct. In the case of TDX (the only user of this framework) the first two cover private memory, whereas the third manages shared memory: external EPT - Hidden within the TDX module, modified via TDX module calls. mirror EPT - Bookkeeping tree used as an optimization by KVM, not used by the processor. direct EPT - Normal EPT that maps unencrypted shared memory. Managed like the EPT of a normal VM. Modifying external EPT ---------------------- Modifications to the mirrored page tables need to also perform the same operations to the private page tables, which will be handled via kvm_x86_ops. Although this prep series does not interact with the TDX module at all to actually configure the private EPT, it does lay the ground work for doing this. In some ways updating the private EPT is as simple as plumbing PTE modifications through to also call into the TDX module; however, the locking is more complicated because inserting a single PTE cannot anymore be done atomically with a single CMPXCHG. For this reason, the existing FROZEN_SPTE mechanism is used whenever a call to the TDX module updates the private EPT. FROZEN_SPTE acts basically as a spinlock on a PTE. Besides protecting operation of KVM, it limits the set of cases in which the TDX module will encounter contention on its own PTE locks. Zapping external EPT -------------------- While the framework tries to be relatively generic, and to be understandable without knowing TDX much in detail, some requirements of TDX sometimes leak; for example the private page tables also cannot be zapped while the range has anything mapped, so the mirrored/private page tables need to be protected from KVM operations that zap any non-leaf PTEs, for example kvm_mmu_reset_context() or kvm_mmu_zap_all_fast(). For normal VMs, guest memory is zapped for several reasons: user memory getting paged out by the guest, memslots getting deleted, passthrough of devices with non-coherent DMA. Confidential computing adds to these the conversion of memory between shared and privates. These operations must not zap any private memory that is in use by the guest. This is possible because the only zapping that is out of the control of KVM/userspace is paging out userspace memory, which cannot apply to guestmemfd operations. Thus a TDX VM will only zap private memory from memslot deletion and from conversion between private and shared memory which is triggered by the guest. To avoid zapping too much memory, enums are introduced so that operations can choose to target only private or shared memory, and thus only direct or mirror EPT. For example: Memslot deletion - Private and shared MMU notifier based zapping - Shared only Conversion to shared - Private only Conversion to private - Shared only Other cases of zapping will not be supported for KVM, for example APICv update or non-coherent DMA status update; for the latter, TDX will simply require that the CPU supports self-snoop and honor guest PAT unconditionally for shared memory.
2025-01-20Merge tag 'opp-updates-6.14' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Merge OPP (Operating Performance Points) updates for 6.14 from Viresh Kumar: "- Minor cleanups / fixes (Dan Carpenter, Neil Armstrong, and Joe Hattori). - Implement dev_pm_opp_get_bw (Neil Armstrong). - Expose reference counting helpers (Viresh Kumar)." * tag 'opp-updates-6.14' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: PM / OPP: Add reference counting helpers for Rust implementation OPP: OF: Fix an OF node leak in _opp_add_static_v2() OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized OPP: add index check to assert to avoid buffer overflow in _read_freq() opp: core: Fix off by one in dev_pm_opp_get_bw() opp: core: implement dev_pm_opp_get_bw
2025-01-20Merge tag 'kvm-x86-vcpu_array-6.14' of https://github.com/kvm-x86/linux into ↵Paolo Bonzini
HEAD KVM vcpu_array fixes and cleanups for 6.14: - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw. - Wait for a vCPU to come online prior to executing a vCPU ioctl to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined. - Gracefully handle xa_insert() failures even though such failuires should be impossible in practice.
2025-01-20Merge tag 'kvm-memslots-6.14' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM kvm_set_memory_region() cleanups and hardening for 6.14: - Add proper lockdep assertions when setting memory regions. - Add a dedicated API for setting KVM-internal memory regions. - Explicitly disallow all flags for KVM-internal memory regions.
2025-01-20Merge branch 'for-6.14/intel-thc' into for-linusJiri Kosina
- newly added support for Intel Touch Host Controller (Even Xu, Xinpeng Sun)
2025-01-20Merge branch 'for-6.14/intel-ish' into for-linusJiri Kosina
- dead code removal in intel-ish-hid driver (Dr. David Alan Gilbert)
2025-01-20Merge branch 'for-6.14/constify-bin-attribute' into for-linusJiri Kosina
- constification of 'struct bin_attribute' in various HID driver (Thomas Weißschuh)
2025-01-20PM / OPP: Add reference counting helpers for Rust implementationViresh Kumar
To ensure that resources such as OPP tables or OPP nodes are not freed while in use by the Rust implementation, it is necessary to increment their reference count from Rust code. This commit introduces a new helper function, dev_pm_opp_get_opp_table_ref(), to increment the reference count of an OPP table and declares the existing helper dev_pm_opp_get() in pm_opp.h. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
2025-01-19cpumask: Rephrase comments for cpumask_any*() APIsI Hsin Cheng
The cpumask_any*() APIs comment states that it returns a "random" cpu within the given cpumask. However it's not actually random as random itself stands a meaning for uniform distribution. cpumask_any*() APIs are a naming convention for the caller to states that it doesn't care which CPU it gets, so change "random" to "arbitrary" would be more appropriate. CC: Mark Rutland <mark.rutland@arm.com> Signed-off-by: I Hsin Cheng <richard120310@gmail.com> Reviewed-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-01-19Merge tag 'timers_urgent_for_v6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Borislav Petkov: - Reset hrtimers correctly when a CPU hotplug state traversal happens "half-ways" and leaves hrtimers not (re-)initialized properly - Annotate accesses to a timer group's ignore flag to prevent KCSAN from raising data_race warnings - Make sure timer group initialization is visible to timer tree walkers and avoid a hypothetical race - Fix another race between CPU hotplug and idle entry/exit where timers on a fully idle system are getting ignored - Fix a case where an ignored signal is still being handled which it shouldn't be * tag 'timers_urgent_for_v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: hrtimers: Handle CPU state correctly on hotplug timers/migration: Annotate accesses to ignore flag timers/migration: Enforce group initialization visibility to tree walkers timers/migration: Fix another race between hotplug and idle entry/exit signal/posixtimers: Handle ignore/blocked sequences correctly
2025-01-19crypto: asymmetric_keys - Remove unused key_being_used_for[]Dr. David Alan Gilbert
key_being_used_for[] is an unused array of textual names for the elements of the enum key_being_used_for. It was added in 2015 by commit 99db44350672 ("PKCS#7: Appropriately restrict authenticated attributes and content type") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-01-18Merge tag 'wireless-next-2025-01-17' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.14 Most likely the last "new features" pull request for v6.14 and this is a bigger one. Multi-Link Operation (MLO) work continues both in stack in drivers. Few new devices supported and usual fixes all over. Major changes: cfg80211 * Emergency Preparedness Communication Services (EPCS) station mode support mac80211 * an option to filter a sta from being flushed * some support for RX Operating Mode Indication (OMI) power saving * support for adding and removing station links for MLO iwlwifi * new device ids * rework firmware error handling and restart rtw88 * RTL8812A: RFE type 2 support * LED support rtw89 * variant info to support RTL8922AE-VS mt76 * mt7996: single wiphy multiband support (preparation for MLO) * mt7996: support for more variants * mt792x: P2P_DEVICE support * mt7921u: TP-Link TXE50UH support ath12k * enable MLO for QCN9274 (although it seems to be broken with dual band devices) * MLO radar detection support * debugfs: transmit buffer OFDMA, AST entry and puncture stats * tag 'wireless-next-2025-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (322 commits) wifi: brcmfmac: fix NULL pointer dereference in brcmf_txfinalize() wifi: rtw88: add RTW88_LEDS depends on LEDS_CLASS to Kconfig wifi: wilc1000: unregister wiphy only after netdev registration wifi: cfg80211: adjust allocation of colocated AP data wifi: mac80211: fix memory leak in ieee80211_mgd_assoc_ml_reconf() wifi: ath12k: fix key cache handling wifi: ath12k: Fix uninitialized variable access in ath12k_mac_allocate() function wifi: ath12k: Remove ath12k_get_num_hw() helper function wifi: ath12k: Refactor the ath12k_hw get helper function argument wifi: ath12k: Refactor ath12k_hw set helper function argument wifi: mt76: mt7996: add implicit beamforming support for mt7992 wifi: mt76: mt7996: fix beacon command during disabling wifi: mt76: mt7996: fix ldpc setting wifi: mt76: mt7996: fix definition of tx descriptor wifi: mt76: connac: adjust phy capabilities based on band constraints wifi: mt76: mt7996: fix incorrect indexing of MIB FW event wifi: mt76: mt7996: fix HE Phy capability wifi: mt76: mt7996: fix the capability of reception of EHT MU PPDU wifi: mt76: mt7996: add max mpdu len capability wifi: mt76: mt7921: avoid undesired changes of the preset regulatory domain ... ==================== Link: https://patch.msgid.link/20250117203529.72D45C4CEDD@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18net: phy: remove leftovers from switch to linkmode bitmapsHeiner Kallweit
We have some leftovers from the switch to linkmode bitmaps which - have never been used - are not used any longer - have no user outside phy_device.c So remove them. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/5493b96e-88bb-4230-a911-322659ec5167@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18mailbox: add Samsung Exynos driverTudor Ambarus
The Samsung Exynos mailbox controller, used on Google GS101 SoC, has 16 flag bits for hardware interrupt generation and a shared register for passing mailbox messages. When the controller is used by the ACPM interface the shared register is ignored and the mailbox controller acts as a doorbell. The controller just raises the interrupt to APM after the ACPM interface has written the message to SRAM. Add support for the Samsung Exynos mailbox controller. Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-01-18mailbox: add Microchip IPC supportValentina Fernandez
Add a mailbox controller driver for the Microchip Inter-processor Communication (IPC), which is used to send and receive data between processors. The driver uses the RISC-V Supervisor Binary Interface (SBI) to communicate with software running in machine mode (M-mode) to access the IPC hardware block. Additional details on the Microchip vendor extension and the IPC function IDs described in the driver can be found in the following documentation: https://github.com/linux4microchip/microchip-sbi-ecall-extension This SBI interface in this driver is compatible with the Mi-V Inter-hart Communication (IHC) IP. Transmitting and receiving data through the mailbox framework is done through struct mchp_ipc_msg. Signed-off-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2025-01-18of: address: Add parent_bus_addr to struct of_pci_rangeFrank Li
Add a new field called 'parent_bus_addr' to struct of_pci_range to use when retrieving parent bus address information. Refer to the diagram below to better understand that the bus fabric in some systems (like i.MX8QXP) does not always use a 1:1 address map between input and output. Currently, many controller drivers use the cpu_addr_fixup() callback that would often hardcode address translation directly in the code, e.g., "cpu_addr & CDNS_PLAT_CPU_TO_BUS_ADDR" or "cpu_addr + BUS_IATU_OFFSET", etc., even though those translations *should* be described via DT. However, the cpu_addr_fixup() can be eliminated if DT correctly reflects hardware behavior and drivers use 'parent_bus_addr' in struct of_pci_range. ┌─────────┐ ┌────────────┐ ┌─────┐ │ │ IA: 0x8ff8_0000 │ │ │ CPU ├───►│ ┌────►├─────────────────┐ │ PCI │ └─────┘ │ │ │ IA: 0x8ff0_0000 │ │ │ CPU Addr │ │ ┌─►├─────────────┐ │ │ Controller │ 0x7ff8_0000─┼───┘ │ │ │ │ │ │ │ │ │ │ │ │ │ PCI Addr 0x7ff0_0000─┼──────┘ │ │ └──► IOSpace ─┼────────────► │ │ │ │ │ 0 0x7000_0000─┼────────►├─────────┐ │ │ │ └─────────┘ │ └──────► CfgSpace ─┼────────────► BUS Fabric │ │ │ 0 │ │ │ └──────────► MemSpace ─┼────────────► IA: 0x8000_0000 │ │ 0x8000_0000 └────────────┘ bus@5f000000 { compatible = "simple-bus"; #address-cells = <1>; #size-cells = <1>; ranges = <0x80000000 0x0 0x70000000 0x10000000>; pcie@5f010000 { compatible = "fsl,imx8q-pcie"; reg = <0x5f010000 0x10000>, <0x8ff00000 0x80000>; reg-names = "dbi", "config"; #address-cells = <3>; #size-cells = <2>; device_type = "pci"; bus-range = <0x00 0xff>; ranges = <0x81000000 0 0x00000000 0x8ff80000 0 0x00010000>, <0x82000000 0 0x80000000 0x80000000 0 0x0ff00000>; ... }; }; In the diagram above, the 'parent_bus_addr' field in struct of_pci_range can indicate internal address (IA) address information. Link: https://lore.kernel.org/r/20241119-pci_fixup_addr-v8-1-c4bfa5193288@nxp.com Signed-off-by: Frank Li <Frank.Li@nxp.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2025-01-18PCI: Remove devres from pci_intx()Philipp Stanner
pci_intx() is a hybrid function which can sometimes be managed through devres. This hybrid nature is undesirable. Since all users of pci_intx() have by now been ported either to always-managed pcim_intx() or never-managed pci_intx_unmanaged(), the devres functionality can be removed from pci_intx(). Consequently, pci_intx_unmanaged() is now redundant, because pci_intx() itself is now unmanaged. Remove the devres functionality from pci_intx(). Have all users of pci_intx_unmanaged() call pci_intx(). Remove pci_intx_unmanaged(). Link: https://lore.kernel.org/r/20241209130632.132074-13-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com>
2025-01-18PCI: Export pci_intx_unmanaged() and pcim_intx()Philipp Stanner
pci_intx() is a hybrid function which sometimes performs devres operations, depending on whether pcim_enable_device() has been used to enable the pci_dev. This sometimes-managed nature of the function is problematic. Notably, it causes the function to allocate under some circumstances which makes it unusable from interrupt context. Export pcim_intx() (which is always managed) and rename __pcim_intx() (which is never managed) to pci_intx_unmanaged() and export it as well. Then all callers of pci_intx() can be ported to the version they need, depending whether they use pci_enable_device() or pcim_enable_device(). Link: https://lore.kernel.org/r/20241209130632.132074-3-pstanner@redhat.com Signed-off-by: Philipp Stanner <pstanner@redhat.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
2025-01-18Merge patch series "riscv: Add support for xtheadvector"Palmer Dabbelt
Charlie Jenkins <charlie@rivosinc.com> says: xtheadvector is a custom extension that is based upon riscv vector version 0.7.1 [1]. All of the vector routines have been modified to support this alternative vector version based upon whether xtheadvector was determined to be supported at boot. vlenb is not supported on the existing xtheadvector hardware, so a devicetree property thead,vlenb is added to provide the vlenb to Linux. There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is used to request which thead vendor extensions are supported on the current platform. This allows future vendors to allocate hwprobe keys for their vendor. Support for xtheadvector is also added to the vector kselftests. [1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d335e03d3134b14133f/xtheadvector.adoc * b4-shazam-merge: riscv: Add ghostwrite vulnerability selftests: riscv: Support xtheadvector in vector tests selftests: riscv: Fix vector tests riscv: hwprobe: Document thead vendor extensions and xtheadvector extension riscv: hwprobe: Add thead vendor extension probing riscv: vector: Support xtheadvector save/restore riscv: Add xtheadvector instruction definitions riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT RISC-V: define the elements of the VCSR vector CSR riscv: vector: Use vlenb from DT for thead riscv: Add thead and xtheadvector as a vendor extension riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree dt-bindings: cpus: add a thead vlen register length property dt-bindings: riscv: Add xtheadvector ISA extension description Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-0-236c22791ef9@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-18riscv: Add ghostwrite vulnerabilityCharlie Jenkins
Follow the patterns of the other architectures that use GENERIC_CPU_VULNERABILITIES for riscv to introduce the ghostwrite vulnerability and mitigation. The mitigation is to disable all vector which is accomplished by clearing the bit from the cpufeature field. Ghostwrite only affects thead c9xx CPUs that impelment xtheadvector, so the vulerability will only be mitigated on these CPUs. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Yangyu Chen <cyy@cyyself.name> Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-14-236c22791ef9@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-17net: ethtool: ts: add separate counter for unconfirmed one-step TX timestampsVladimir Oltean
For packets with two-step timestamp requests, the hardware timestamp comes back to the driver through a confirmation mechanism of sorts, which allows the driver to confidently bump the successful "pkts" counter. For one-step PTP, the NIC is supposed to autonomously insert its hardware TX timestamp in the packet headers while simultaneously transmitting it. There may be a confirmation that this was done successfully, or there may not. None of the current drivers which implement ethtool_ops :: get_ts_stats() also support HWTSTAMP_TX_ONESTEP_SYNC or HWTSTAMP_TX_ONESTEP_SYNC, so it is a bit unclear which model to follow. But there are NICs, such as DSA, where there is no transmit confirmation at all. Here, it would be wrong / misleading to increment the successful "pkts" counter, because one-step PTP packets can be dropped on TX just like any other packets. So introduce a special counter which signifies "yes, an attempt was made, but we don't know whether it also exited the port or not". I expect that for one-step PTP packets where a confirmation is available, the "pkts" counter would be bumped. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250116104628.123555-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: support FW Recovery Mode Konrad Knitter says: Enable update of card in FW Recovery Mode * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: support FW Recovery Mode devlink: add devl guard pldmfw: enable selected component update ==================== Link: https://patch.msgid.link/20250116212059.1254349-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17Merge tag 'soc-fixes-6.13-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Two last minute fixes: one build issue on TI soc drivers, and a regression in the renesas reset controller driver" * tag 'soc-fixes-6.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: soc: ti: pruss: Fix pruss APIs reset: rzg2l-usbphy-ctrl: Assign proper of node to the allocated device
2025-01-17dcache: back inline names with a struct-wrapped array of unsigned longAl Viro
... so that they can be copied with struct assignment (which generates better code) and accessed word-by-word. The type is union shortname_storage; it's a union of arrays of unsigned char and unsigned long. struct name_snapshot.inline_name turned into union shortname_storage; users (all in fs/dcache.c) adjusted. struct dentry.d_iname has some users outside of fs/dcache.c; to reduce the amount of noise in commit, it is replaced with union shortname_storage d_shortname and d_iname is turned into a macro that expands to d_shortname.string (similar to d_lock handling). That compat macro is temporary - most of the remaining instances will be taken out by debugfs series, and once that is merged and few others are taken care of this will go away. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-17make sure that DNAME_INLINE_LEN is a multiple of word sizeAl Viro
... calling the number of words DNAME_INLINE_WORDS. The next step will be to have a structure to hold inline name arrays (both in dentry and in name_snapshot) and use that to alias the existing arrays of unsigned char there. That will allow both full-structure copies and convenient word-by-word accesses. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-17dm-table: atomic writes supportJohn Garry
Support stacking atomic write limits for DM devices. All the pre-existing code in blk_stack_atomic_writes_limits() already takes care of finding the aggregrate limits from the bottom devices. Feature flag DM_TARGET_ATOMIC_WRITES is introduced so that atomic writes can be enabled on personalities selectively. This is to ensure that atomic writes are only enabled when verified to be working properly (for a specific personality). In addition, it just may not make sense to enable atomic writes on some personalities (so this flag also helps there). Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-01-17block: Add common atomic writes enable flagJohn Garry
Currently only stacked devices need to explicitly enable atomic writes by setting BLK_FEAT_ATOMIC_WRITES_STACKED flag. This does not work well for device mapper stacking devices, as there many sets of limits are stacked and what is the 'bottom' and 'top' device can swapped. This means that BLK_FEAT_ATOMIC_WRITES_STACKED needs to be set for many queue limits, which is messy. Generalize enabling atomic writes enabling by ensuring that all devices must explicitly set a flag - that includes NVMe, SCSI sd, and md raid. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250116170301.474130-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-17PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()Peng Fan
Add device-managed variant of dev_pm_set_wake_irq which automatically clear the wake irq on device destruction to simplify error handling and resource management in drivers. Signed-off-by: Peng Fan <peng.fan@nxp.com> Link: https://patch.msgid.link/20250103-wake_irq-v2-1-e3aeff5e9966@nxp.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-17regulator: Add support for power budgetKory Maincent
Introduce power budget management for the regulator device. Enable tracking of available power capacity by providing helpers to request and release power budget allocations. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20250115-feature_regulator_pw_budget-v2-1-0a44b949e6bc@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-01-17ACPI: platform_profile: Add documentationKurt Borja
Add kerneldoc and sysfs class documentation. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250116002721.75592-19-kuurtb@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-17ACPI: platform_profile: Move platform_profile_handlerKurt Borja
platform_profile_handler is now an internal structure. Move it to platform_profile.c. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250116002721.75592-17-kuurtb@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-17ACPI: platform_profile: Remove platform_profile_handler from exported symbolsKurt Borja
In order to protect the platform_profile_handler from API consumers, allocate it in platform_profile_register() and modify it's signature accordingly. Remove the platform_profile_handler from all consumer drivers and replace them with a pointer to the class device, which is now returned from platform_profile_register(). Replace *pprof with a pointer to the class device in the rest of exported symbols. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Kurt Borja <kuurtb@gmail.com> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Tested-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250116002721.75592-16-kuurtb@gmail.com Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-01-17of: Do not expose of_alias_scan() and correct its commentsZijun Hu
of_alias_scan() has no external callers and returns void. Do not expose it and delete return value descriptions in its comments. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20250114-of_core_fix-v5-1-b8bafd00a86f@quicinc.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-01-17serial: kgdb_nmi: Remove unused knock codeDr. David Alan Gilbert
kgdb_nmi_poll_knock() has been unused since it was added in 2013 in commit 0c57dfcc6c1d ("tty/serial: Add kgdb_nmi driver") Remove it, the static helpers, and module parameters it used. (The comment explaining why it might be used sounds sensible, but it's never been wired up, perhaps it's worth doing somewhere?) Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://lore.kernel.org/r/20250112135759.105541-1-linux@treblig.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17usb: typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR PPSKyle Tso
The Source can drop its output voltage to the minimum of the requested PPS APDO voltage range when it is in Current Limit Mode. If this voltage falls within the range of vPpsShutdown, the Source initiates a Hard Reset and discharges Vbus. However, currently the Sink may disconnect before the voltage reaches vPpsShutdown, leading to unexpected behavior. Prevent premature disconnection by setting the Sink's disconnect threshold to the minimum vPpsShutdown value. Additionally, consider the voltage drop due to IR drop when calculating the appropriate threshold. This ensures a robust and reliable interaction between the Source and Sink during SPR PPS Current Limit Mode operation. Fixes: 4288debeaa4e ("usb: typec: tcpci: Fix up sink disconnect thresholds for PD") Cc: stable <stable@kernel.org> Signed-off-by: Kyle Tso <kyletso@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Badhri Jagan Sridharan <badhri@google.com> Link: https://lore.kernel.org/r/20250114142435.2093857-1-kyletso@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-17Merge branches 'arm/smmu/updates', 'arm/smmu/bindings', 'qualcomm/msm', ↵Joerg Roedel
'rockchip', 'riscv', 'core', 'intel/vt-d' and 'amd/amd-vi' into next
2025-01-16net: phylink: add EEE managementRussell King (Oracle)
Add EEE management to phylink, making use of the phylib implementation. This will only be used where a MAC driver populates the methods and capabilities bitfield, otherwise we keep our old behaviour. Phylink will keep track of the EEE configuration, including the clock stop abilities at each end of the MAC to PHY link, programming the PHY appropriately and preserving the LPI configuration should the PHY go away. Phylink will call into the MAC driver when LPI needs to be enabled or disabled, with the requirement that the MAC have LPI disabled prior to the netdev being brought up (in other words, it will only call mac_disable_tx_lpi() if it has already called mac_enable_tx_lpi().) Support for phylink managed EEE is enabled by populating both tx_lpi MAC operations method pointers, and filling in both LPI interfaces and capabilities. If the methods are provided but the LPI interfaces or capabilities remain empty, this indicates to phylink that EEE is implemented by the driver but the hardware it is driving does not support EEE, and thus the ethtool set_eee() and get_eee() methods will return EOPNOTSUPP. No validation of the LPI timer value is performed by this patch. For interface modes which do not support LPI, we make no attempt to manipulate the phylib EEE advertisement, but instead refuse to activate LPI at the MAC, noting it at debug message level. We also restrict the advertisement and reported userspace support linkmode masks according to the lpi_capabilities provided to phylink by the MAC driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1tYADq-0014Pn-J1@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16net: phy: add support for querying PHY clock stop capabilityRussell King (Oracle)
Add support for querying whether the PHY allows the transmit xMII clock to be stopped while in LPI mode. This will be used by phylink to pass to the MAC driver so it can configure the generation of the xMII clock appropriately. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/E1tYADg-0014Pb-AJ@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16pldmfw: enable selected component updateKonrad Knitter
This patch enables to update a selected component from PLDM image containing multiple components. Example usage: struct pldmfw; data.mode = PLDMFW_UPDATE_MODE_SINGLE_COMPONENT; data.compontent_identifier = DRIVER_FW_MGMT_COMPONENT_ID; Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Konrad Knitter <konrad.knitter@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>