linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-01-20	Merge tag 'execve-v6.14-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case (Tycho Andersen, Kees Cook) - binfmt_misc: Fix comment typos (Christophe JAILLET) - move empty argv[0] warning closer to actual logic (Nir Lichtman) - remove legacy custom binfmt modules autoloading (Nir Lichtman) - Make sure set_task_comm() always NUL-terminates - binfmt_flat: Fix integer overflow bug on 32 bit systems (Dan Carpenter) - coredump: Do not lock when copying "comm" - MAINTAINERS: add auxvec.h and set myself as maintainer * tag 'execve-v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: binfmt_flat: Fix integer overflow bug on 32 bit systems selftests/exec: add a test for execveat()'s comm exec: fix up /proc/pid/comm in the execveat(AT_EMPTY_PATH) case exec: Make sure task->comm is always NUL-terminated exec: remove legacy custom binfmt modules autoloading exec: move warning of null argv to be next to the relevant code fs: binfmt: Fix a typo MAINTAINERS: exec: Mark Kees as maintainer MAINTAINERS: exec: Add auxvec.h UAPI coredump: Do not lock during 'comm' reporting
2025-01-20	Merge branch 'pm-tools'	Rafael J. Wysocki
	Merge cpupower utility updates for 6.14: - Fix TSC MHz calculation in cpupower (He Rongguang). - Add install and uninstall options to bindings Makefile and add header changes for cpufreq.h to SWIG bindings in cpupower (John B. Wyatt IV). - Add missing residency header changes in cpuidle.h to SWIG bindings in cpupower (John B. Wyatt IV). - Add output files to .gitignore and clean them up in "make clean" in selftests/cpufreq (Li Zhijian). - Fix cross-compilation in cpupower Makefile (Peng Fan). - Revise the is_valid flag handling for idle_monitor in the cpupower utility (wangfushuai). - Extend and clean up AMD processors support in cpupower (Mario Limonciello). * pm-tools: pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG pm: cpupower: Add header changes for cpufreq.h to SWIG bindings pm: cpupower: Add install and uninstall options to bindings makefile cpupower: Adjust whitespace for amd-pstate specific prints cpupower: Don't fetch maximum latency when EPP is enabled cpupower: Add support for showing energy performance preference cpupower: Don't try to read frequency from hardware when kernel uses aperfmperf cpupower: Add support for amd-pstate preferred core rankings cpupower: Add support for parsing 'enabled' or 'disabled' strings from table cpupower: Remove spurious return statement cpupower: fix TSC MHz calculation cpupower: revise is_valid flag handling for idle_monitor pm: cpupower: Makefile: Fix cross compilation selftests/cpufreq: gitignore output files and clean them in make clean
2025-01-20	selftests/net/ipsec: Fix Null pointer dereference in rtattr_pack()	Liu Ye
	Address Null pointer dereference / undefined behavior in rtattr_pack (note that size is 0 in the bad case). Flagged by cppcheck as: tools/testing/selftests/net/ipsec.c:230:25: warning: Possible null pointer dereference: payload [nullPointer] memcpy(RTA_DATA(attr), payload, size); ^ tools/testing/selftests/net/ipsec.c:1618:54: note: Calling function 'rtattr_pack', 4th argument 'NULL' value is 0 if (rtattr_pack(&req.nh, sizeof(req), XFRMA_IF_ID, NULL, 0)) { ^ tools/testing/selftests/net/ipsec.c:230:25: note: Null pointer dereference memcpy(RTA_DATA(attr), payload, size); ^ Signed-off-by: Liu Ye <liuye@kylinos.cn> Link: https://patch.msgid.link/20250116013037.29470-1-liuye@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20	Merge tag 'vfs-6.14-rc1.mount.v2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add a mountinfo program to demonstrate statmount()/listmount() Add a new "mountinfo" sample userland program that demonstrates how to use statmount() and listmount() to get at the same info that /proc/pid/mountinfo provides - Remove pointless nospec.h include - Prepend statmount.mnt_opts string with security_sb_mnt_opts() Currently these mount options aren't accessible via statmount() - Add new mount namespaces to mount namespace rbtree outside of the namespace semaphore - Lockless mount namespace lookup Currently we take the read lock when looking for a mount namespace to list mounts in. We can make this lockless. The simple search case can just use a sequence counter to detect concurrent changes to the rbtree For walking the list of mount namespaces sequentially via nsfs we keep a separate rcu list as rb_prev() and rb_next() aren't usable safely with rcu. Currently there is no primitive for retrieving the previous list member. To do this we need a new deletion primitive that doesn't poison the prev pointer and a corresponding retrieval helper Since creating mount namespaces is a relatively rare event compared with querying mounts in a foreign mount namespace this is worth it. Once libmount and systemd pick up this mechanism to list mounts in foreign mount namespaces this will be used very frequently - Add extended selftests for lockless mount namespace iteration - Add a sample program to list all mounts on the system, i.e., in all mount namespaces - Improve mount namespace iteration performance Make finding the last or first mount to start iterating the mount namespace from an O(1) operation and add selftests for iterating the mount table starting from the first and last mount - Use an xarray for the old mount id While the ida does use the xarray internally we can use it explicitly which allows us to increment the unique mount id under the xa lock. This allows us to remove the atomic as we're now allocating both ids in one go - Use a shared header for vfs sample programs - Fix build warnings for new sample program to list all mounts * tag 'vfs-6.14-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: samples/vfs: fix build warnings samples/vfs: use shared header samples/vfs/mountinfo: Use __u64 instead of uint64_t fs: remove useless lockdep assertion fs: use xarray for old mount id selftests: add listmount() iteration tests fs: cache first and last mount samples: add test-list-all-mounts selftests: remove unneeded include selftests: add tests for mntns iteration seltests: move nsfs into filesystems subfolder fs: simplify rwlock to spinlock fs: lockless mntns lookup for nsfs rculist: add list_bidir_{del,prev}_rcu() fs: lockless mntns rbtree lookup fs: add mount namespace to rbtree late fs: prepend statmount.mnt_opts string with security_sb_mnt_opts() mount: remove inlude/nospec.h include samples: add a mountinfo program to demonstrate statmount()/listmount()
2025-01-20	Merge tag 'kernel-6.14-rc1.pid' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_max namespacing update from Christian Brauner: "The pid_max sysctl is a global value. For a long time the default value has been 65535 and during the pidfd dicussions Linus proposed to bump pid_max by default. Based on this discussion systemd started bumping pid_max to 2^22. So all new systems now run with a very high pid_max limit with some distros having also backported that change. The decision to bump pid_max is obviously correct. It just doesn't make a lot of sense nowadays to enforce such a low pid number. There's sufficient tooling to make selecting specific processes without typing really large pid numbers available. In any case, there are workloads that have expections about how large pid numbers they accept. Either for historical reasons or architectural reasons. One concreate example is the 32-bit version of Android's bionic libc which requires pid numbers less than 65536. There are workloads where it is run in a 32-bit container on a 64-bit kernel. If the host has a pid_max value greater than 65535 the libc will abort thread creation because of size assumptions of pthread_mutex_t. That's a fairly specific use-case however, in general specific workloads that are moved into containers running on a host with a new kernel and a new systemd can run into issues with large pid_max values. Obviously making assumptions about the size of the allocated pid is suboptimal but we have userspace that does it. Of course, giving containers the ability to restrict the number of processes in their respective pid namespace indepent of the global limit through pid_max is something desirable in itself and comes in handy in general. Independent of motivating use-cases the existence of pid namespaces makes this also a good semantical extension and there have been prior proposals pushing in a similar direction. The trick here is to minimize the risk of regressions which I think is doable. The fact that pid namespaces are hierarchical will help us here. What we mostly care about is that when the host sets a low pid_max limit, say (crazy number) 100 that no descendant pid namespace can allocate a higher pid number in its namespace. Since pid allocation is hierarchial this can be ensured by checking each pid allocation against the pid namespace's pid_max limit. This means if the allocation in the descendant pid namespace succeeds, the ancestor pid namespace can reject it. If the ancestor pid namespace has a higher limit than the descendant pid namespace the descendant pid namespace will reject the pid allocation. The ancestor pid namespace will obviously not care about this. All in all this means pid_max continues to enforce a system wide limit on the number of processes but allows pid namespaces sufficient leeway in handling workloads with assumptions about pid values and allows containers to restrict the number of processes in a pid namespace through the pid_max interface" * tag 'kernel-6.14-rc1.pid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: tests/pid_namespace: add pid_max tests pid: allow pid_max to be set per pid namespace
2025-01-20	Merge tag 'vfs-6.14-rc1.pidfs' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - Rework inode number allocation Recently we received a patchset that aims to enable file handle encoding and decoding via name_to_handle_at(2) and open_by_handle_at(2). A crucical step in the patch series is how to go from inode number to struct pid without leaking information into unprivileged contexts. The issue is that in order to find a struct pid the pid number in the initial pid namespace must be encoded into the file handle via name_to_handle_at(2). This can be used by containers using a separate pid namespace to learn what the pid number of a given process in the initial pid namespace is. While this is a weak information leak it could be used in various exploits and in general is an ugly wart in the design. To solve this problem a new way is needed to lookup a struct pid based on the inode number allocated for that struct pid. The other part is to remove the custom inode number allocation on 32bit systems that is also an ugly wart that should go away. Allocate unique identifiers for struct pid by simply incrementing a 64 bit counter and insert each struct pid into the rbtree so it can be looked up to decode file handles avoiding to leak actual pids across pid namespaces in file handles. On both 64 bit and 32 bit the same 64 bit identifier is used to lookup struct pid in the rbtree. On 64 bit the unique identifier for struct pid simply becomes the inode number. Comparing two pidfds continues to be as simple as comparing inode numbers. On 32 bit the 64 bit number assigned to struct pid is split into two 32 bit numbers. The lower 32 bits are used as the inode number and the upper 32 bits are used as the inode generation number. Whenever a wraparound happens on 32 bit the 64 bit number will be incremented by 2 so inode numbering starts at 2 again. When a wraparound happens on 32 bit multiple pidfds with the same inode number are likely to exist. This isn't a problem since before pidfs pidfds used the anonymous inode meaning all pidfds had the same inode number. On 32 bit sserspace can thus reconstruct the 64 bit identifier by retrieving both the inode number and the inode generation number to compare, or use file handles. This gives the same guarantees on both 32 bit and 64 bit. - Implement file handle support This is based on custom export operation methods which allows pidfs to implement permission checking and opening of pidfs file handles cleanly without hacking around in the core file handle code too much. - Support bind-mounts Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts for pidfds. This allows pidfds to be safely recovered and checked for process recycling. Instead of checking d_ops for both nsfs and pidfs we could in a follow-up patch add a flag argument to struct dentry_operations that functions similar to file_operations->fop_flags. * tag 'vfs-6.14-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: add pidfd bind-mount tests pidfs: allow bind-mounts pidfs: lookup pid through rbtree selftests/pidfd: add pidfs file handle selftests pidfs: check for valid ioctl commands pidfs: implement file handle support exportfs: add permission method fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh() exportfs: add open method fhandle: simplify error handling pseudofs: add support for export_ops pidfs: support FS_IOC_GETVERSION pidfs: remove 32bit inode number handling pidfs: rework inode number allocation
2025-01-20	selftests/bpf: Add some tests related to 'may_goto 0' insns	Yonghong Song
	Add both asm-based and C-based tests which have 'may_goto 0' insns. For the following code in C-based test, int i, tmp[3]; for (i = 0; i < 3 && can_loop; i++) tmp[i] = 0; The clang compiler (clang 19 and 20) generates may_goto 2 may_goto 1 may_goto 0 r1 = 0 r2 = 0 r3 = 0 The above asm codes are due to llvm pass SROAPass. This ensures the successful verification since tmp[0-2] are initialized. Otherwise, the code without SROAPass like may_goto 5 r1 = 0 may_goto 3 r2 = 0 may_goto 1 r3 = 0 will have verification failure. Although from the source code C-based test should have verification failure, clang compiler optimization generates code with successful verification. If gcc generates different asm codes than clang, the following code can be used for gcc: int i, tmp[3]; for (i = 0; i < 3; i++) tmp[i] = 0; Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250118192034.2124952-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20	Merge tag 'vfs-6.14-rc1.misc' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Support caching symlink lengths in inodes The size is stored in a new union utilizing the same space as i_devices, thus avoiding growing the struct or taking up any more space When utilized it dodges strlen() in vfs_readlink(), giving about 1.5% speed up when issuing readlink on /initrd.img on ext4 - Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag If a file system supports uncached buffered IO, it may set FOP_DONTCACHE and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted without the file system supporting it, it'll get errored with -EOPNOTSUPP - Enable VBOXGUEST and VBOXSF_FS on ARM64 Now that VirtualBox is able to run as a host on arm64 (e.g. the Apple M3 processors) we can enable VBOXSF_FS (and in turn VBOXGUEST) for this architecture. Tested with various runs of bonnie++ and dbench on an Apple MacBook Pro with the latest Virtualbox 7.1.4 r165100 installed Cleanups: - Delay sysctl_nr_open check in expand_files() - Use kernel-doc includes in fiemap docbook - Use page->private instead of page->index in watch_queue - Use a consume fence in mnt_idmap() as it's heavily used in link_path_walk() - Replace magic number 7 with ARRAY_SIZE() in fc_log - Sort out a stale comment about races between fd alloc and dup2() - Fix return type of do_mount() from long to int - Various cosmetic cleanups for the lockref code Fixes: - Annotate spinning as unlikely() in __read_seqcount_begin The annotation already used to be there, but got lost in commit 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") - Fix proc_handler for sysctl_nr_open - Flush delayed work in delayed fput() - Fix grammar and spelling in propagate_umount() - Fix ESP not readable during coredump In /proc/PID/stat, there is the kstkesp field which is the stack pointer of a thread. While the thread is active, this field reads zero. But during a coredump, it should have a valid value However, at the moment, kstkesp is zero even during coredump - Don't wake up the writer if the pipe is still full - Fix unbalanced user_access_end() in select code" * tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) gfs2: use lockref_init for qd_lockref erofs: use lockref_init for pcl->lockref dcache: use lockref_init for d_lockref lockref: add a lockref_init helper lockref: drop superfluous externs lockref: use bool for false/true returns lockref: improve the lockref_get_not_zero description lockref: remove lockref_put_not_zero fs: Fix return type of do_mount() from long to int select: Fix unbalanced user_access_end() vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64 pipe_read: don't wake up the writer if the pipe is still full selftests: coredump: Add stackdump test fs/proc: do_task_stat: Fix ESP not readable during coredump fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag fs: sort out a stale comment about races between fd alloc and dup2 fs: Fix grammar and spelling in propagate_umount() fs: fc_log replace magic number 7 with ARRAY_SIZE() fs: use a consume fence in mnt_idmap() file: flush delayed work in delayed fput() ...
2025-01-20	selftests/bpf: Add test case for the freeing of bpf_timer	Hou Tao
	The main purpose of the test is to demonstrate the lock problem for the free of bpf_timer under PREEMPT_RT. When freeing a bpf_timer which is running on other CPU in bpf_timer_cancel_and_free(), hrtimer_cancel() will try to acquire a spin-lock (namely softirq_expiry_lock), however the freeing procedure has already held a raw-spin-lock. The test first creates two threads: one to start timers and the other to free timers. The start-timers thread will start the timer and then wake up the free-timers thread to free these timers when the starts complete. After freeing, the free-timer thread will wake up the start-timer thread to complete the current iteration. A loop of 10 iterations is used. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250117101816.2101857-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-20	Merge branch 'for-6.14/selftests-trivial' into for-linus	Petr Mladek

2025-01-20	Merge tag 'kvm-riscv-6.14-1' of https://github.com/kvm-riscv/linux into HEAD	Paolo Bonzini
	KVM/riscv changes for 6.14 - Svvptc, Zabha, and Ziccrse extension support for Guest/VM - Virtualize SBI system suspend extension for Guest/VM - Trap related exit statstics as SBI PMU firmware counters for Guest/VM
2025-01-20	Merge tag 'kvm-x86-misc-6.14' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 misc changes for 6.14: - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature. - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM. - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively. - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU.
2025-01-20	Merge branch 'for-6.14/constify-bin-attribute' into for-linus	Jiri Kosina
	- constification of 'struct bin_attribute' in various HID driver (Thomas Weißschuh)
2025-01-19	selftests/efivarfs: fix tests for failed write removal	James Bottomley
	The current self tests expect the zero size remnants that failed variable creation leaves. Update the tests to verify these are now absent. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-19	selftests/efivarfs: add check for disallowing file truncation	James Bottomley
	Now that the ability of arbitrary writes to set the inode size is fixed, verify that a variable file accepts a truncation operation but does not change the stat size because of it. Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-01-18	Merge patch series "riscv: Add support for xtheadvector"	Palmer Dabbelt
	Charlie Jenkins <charlie@rivosinc.com> says: xtheadvector is a custom extension that is based upon riscv vector version 0.7.1 [1]. All of the vector routines have been modified to support this alternative vector version based upon whether xtheadvector was determined to be supported at boot. vlenb is not supported on the existing xtheadvector hardware, so a devicetree property thead,vlenb is added to provide the vlenb to Linux. There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is used to request which thead vendor extensions are supported on the current platform. This allows future vendors to allocate hwprobe keys for their vendor. Support for xtheadvector is also added to the vector kselftests. [1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361c61d335e03d3134b14133f/xtheadvector.adoc * b4-shazam-merge: riscv: Add ghostwrite vulnerability selftests: riscv: Support xtheadvector in vector tests selftests: riscv: Fix vector tests riscv: hwprobe: Document thead vendor extensions and xtheadvector extension riscv: hwprobe: Add thead vendor extension probing riscv: vector: Support xtheadvector save/restore riscv: Add xtheadvector instruction definitions riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT RISC-V: define the elements of the VCSR vector CSR riscv: vector: Use vlenb from DT for thead riscv: Add thead and xtheadvector as a vendor extension riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree dt-bindings: cpus: add a thead vlen register length property dt-bindings: riscv: Add xtheadvector ISA extension description Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-0-236c22791ef9@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-18	selftests: riscv: Support xtheadvector in vector tests	Charlie Jenkins
	Extend existing vector tests to be compatible with the xtheadvector instructions. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Yangyu Chen <cyy@cyyself.name> Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-13-236c22791ef9@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-18	selftests: riscv: Fix vector tests	Charlie Jenkins
	Overhaul the riscv vector tests to use kselftest_harness to help the test cases correctly report the results and decouple the individual test cases from each other. With this refactoring, only run the test cases if vector is reported and properly report the test case as skipped otherwise. The v_initval_nolibc test was previously not checking if vector was supported and used a function (malloc) which invalidates the state of the vector registers. Signed-off-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Yangyu Chen <cyy@cyyself.name> Link: https://lore.kernel.org/r/20241113-xtheadvector-v11-12-236c22791ef9@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2025-01-18	perf test: Update event_groups test to use instructions	Athira Rajeev
	In some of the powerpc platforms, event group testcase fails as below: # perf test -v 'Event groups' 69: Event groups : --- start --- test child forked, pid 9765 Using CPUID 0x00820200 Using hv_24x7 for uncore pmu event 0x0 0x0, 0x0 0x0, 0x0 0x0: Fail 0x0 0x0, 0x0 0x0, 0x1 0x3: Pass The testcase creates various combinations of hw, sw and uncore PMU events and verify group creation succeeds or fails as expected. This tests one of the limitation in perf where it doesn't allow creating a group of events from different hw PMUs. The testcase starts a leader event and opens two sibling events. The combination the fails is three hardware events in a group. "0x0 0x0, 0x0 0x0, 0x0 0x0: Fail" Type zero and config zero which translates to PERF_TYPE_HARDWARE and PERF_COUNT_HW_CPU_CYCLE. There is event constraint in powerpc that events using same counter cannot be programmed in a group. Here there is one alternative event for cycles, hence one leader and only one sibling event can go in as a group. if all three events (leader and two sibling events), are hardware events, use instructions as one of the sibling event. Since PERF_COUNT_HW_INSTRUCTIONS is a generic hardware event and present in all architectures, use this as third event. Reported-by: Tejas Manhas <Tejas.Manhas1@ibm.com> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250110094620.94976-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-18	perf bench: Fix undefined behavior in cmpworker()	Kuan-Wei Chiu
	The comparison function cmpworker() violates the C standard's requirements for qsort() comparison functions, which mandate symmetry and transitivity: Symmetry: If x < y, then y > x. Transitivity: If x < y and y < z, then x < z. In its current implementation, cmpworker() incorrectly returns 0 when w1->tid < w2->tid, which breaks both symmetry and transitivity. This violation causes undefined behavior, potentially leading to issues such as memory corruption in glibc [1]. Fix the issue by returning -1 when w1->tid < w2->tid, ensuring compliance with the C standard and preventing undefined behavior. Link: https://www.qualys.com/2024/01/30/qsort.txt [1] Fixes: 121dd9ea0116 ("perf bench: Add epoll parallel epoll_wait benchmark") Cc: stable@vger.kernel.org Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-18	perf annotate: Prefer passing evsel to evsel->core.idx	Ian Rogers
	An evsel idx may not be stable due to sorting, evlist removal, etc. Try to reduce it being part of APIs by explicitly passing the evsel in annotate code. Internally the code just reads evsel->core.idx so behavior is unchanged. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Chen Ni <nichen@iscas.ac.cn> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	selftests: net: give up on the cmsg_time accuracy on slow machines	Jakub Kicinski
	Commit b9d5f5711dd8 ("selftests: net: increase the delay for relative cmsg_time.sh test") widened the accepted value range 8x but we still see flakes (at a rate of around 7%). Return XFAIL for the most timing sensitive test on slow machines. Before: # ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '8074us - 7397us < 4000', expected 'OK' FAIL - 1/36 cases failed After: # ./cmsg_time.sh Case UDPv4 - TXTIME rel returned '1123us - 941us < 500', expected 'OK' (XFAIL) Case UDPv6 - TXTIME rel returned '1227us - 776us < 500', expected 'OK' (XFAIL) OK Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250116020105.931338-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-17	dcache: back inline names with a struct-wrapped array of unsigned long	Al Viro
	... so that they can be copied with struct assignment (which generates better code) and accessed word-by-word. The type is union shortname_storage; it's a union of arrays of unsigned char and unsigned long. struct name_snapshot.inline_name turned into union shortname_storage; users (all in fs/dcache.c) adjusted. struct dentry.d_iname has some users outside of fs/dcache.c; to reduce the amount of noise in commit, it is replaced with union shortname_storage d_shortname and d_iname is turned into a macro that expands to d_shortname.string (similar to d_lock handling). That compat macro is temporary - most of the remaining instances will be taken out by debugfs series, and once that is merged and few others are taken care of this will go away. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-01-17	Merge tag 'linux-cpupower-6.14-rc1-second' of ↵	Rafael J. Wysocki
	ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux Merge additional cpupower update for 6.14 from Shuah Khan: "- Add missing residency header changes in cpuidle.h to SWIG" * tag 'linux-cpupower-6.14-rc1-second' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/shuah/linux: pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
2025-01-17	perf lock: Rename fields in lock_type_table	Chun-Tse Shao
	`lock_type_table` contains `name` and `str` which can be confusing. Rename them to `flags_name` and `lock_name` and add descriptions to enhance understanding. Tested by building perf for x86. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-3-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	perf lock: Add percpu-rwsem for type filter	Chun-Tse Shao
	percpu-rwsem was missing in man page. And for backward compatibility, replace `pcpu-sem` with `percpu-rwsem` before parsing lock name. Tested `./perf lock con -ab -Y pcpu-sem` and `./perf lock con -ab -Y percpu-rwsem` Fixes: 4f701063bfa2 ("perf lock contention: Show lock type with address") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-2-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	perf lock: Fix parse_lock_type which only retrieve one lock flag	Chun-Tse Shao
	`parse_lock_type` can only add the first lock flag in `lock_type_table` given input `str`. For example, for `Y rwlock`, it only adds `rwlock:R` into this perf session. Another example is for `-Y mutex`, it only adds the mutex without `LCB_F_SPIN` flag. The patch fixes this issue, makes sure both `rwlock:R` and `rwlock:W` will be added with `-Y rwlock`, and so on. Testing: $ ./perf lock con -ab -Y mutex,rwlock -- perf bench sched pipe # Running 'sched/pipe' benchmark: # Executed 1000000 pipe operations between two processes Total time: 9.313 [sec] 9.313976 usecs/op 107365 ops/sec contended total wait max wait avg wait type caller 176 1.65 ms 19.43 us 9.38 us mutex pipe_read+0x57 34 180.14 us 10.93 us 5.30 us mutex pipe_write+0x50 7 77.48 us 16.09 us 11.07 us mutex do_epoll_wait+0x24d 7 74.70 us 13.50 us 10.67 us mutex do_epoll_wait+0x24d 3 35.97 us 14.44 us 11.99 us rwlock:W ep_done_scan+0x2d 3 35.00 us 12.23 us 11.66 us rwlock:W do_epoll_wait+0x255 2 15.88 us 11.96 us 7.94 us rwlock:W do_epoll_wait+0x47c 1 15.23 us 15.23 us 15.23 us rwlock:W do_epoll_wait+0x4d0 1 14.26 us 14.26 us 14.26 us rwlock:W ep_done_scan+0x2d 2 14.00 us 7.99 us 7.00 us mutex pipe_read+0x282 1 12.29 us 12.29 us 12.29 us rwlock:R ep_poll_callback+0x35 1 12.02 us 12.02 us 12.02 us rwlock:W do_epoll_ctl+0xb65 1 10.25 us 10.25 us 10.25 us rwlock:R ep_poll_callback+0x35 1 7.86 us 7.86 us 7.86 us mutex do_epoll_ctl+0x6c1 1 5.04 us 5.04 us 5.04 us mutex do_epoll_ctl+0x3d4 [namhyung: Add a comment and rename to 'mutex:spin' for consistency Fixes: d783ea8f62c4 ("perf lock contention: Simplify parse_lock_type()") Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: nick.forrington@arm.com Link: https://lore.kernel.org/r/20250116235838.2769691-1-ctshao@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	perf lock: Fix return code for functions in __cmd_contention	Athira Rajeev
	perf lock contention returns zero exit value even if the lock contention BPF setup failed. # ./perf lock con -b true libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find '.BTF' ELF section in /lib/modules/6.13.0-rc3+/build/vmlinux libbpf: failed to find valid kernel BTF libbpf: Error loading vmlinux BTF: -ESRCH libbpf: failed to load object 'lock_contention_bpf' libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH Failed to load lock-contention BPF skeleton lock contention BPF setup failed # echo $? 0 Fix this by saving the return code for lock_contention_prepare so that command exits with proper return code. Similarly set the return code properly for two other functions in builtin-lock, namely setup_output_field() and select_key(). Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20250110093730.93610-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	selftests/landlock: Add layout1.umount_sandboxer tests	Mickaël Salaün
	Check that a domain is not tied to the executable file that created it. For instance, that could happen if a Landlock domain took a reference to a struct path. Move global path names to common.h and replace copy_binary() with a more generic copy_file() helper. Test coverage for security/landlock is 92.7% of 1133 lines according to gcc/gcov-14. Cc: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20250108154338.1129069-23-mic@digikod.net [mic: Update date and add test coverage] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17	selftests/landlock: Add wrappers.h	Mickaël Salaün
	Extract syscall wrappers to make them usable by standalone binaries (see next commit). Cc: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20250108154338.1129069-22-mic@digikod.net [mic: Fix comments] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17	selftests/landlock: Fix error message	Mickaël Salaün
	The global variable errno may not be set in test_execute(). Do not use it in related error message. Cc: Günther Noack <gnoack@google.com> Fixes: e1199815b47b ("selftests/landlock: Add user space tests") Link: https://lore.kernel.org/r/20250108154338.1129069-21-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17	selftests/landlock: Add test to check partial access in a mount tree	Mickaël Salaün
	Add layout1.refer_part_mount_tree_is_allowed to test the masked logical issue regarding collect_domain_accesses() calls followed by the is_access_to_paths_allowed() check in current_check_refer_path(). See previous commit. This test should work without the previous fix as well, but it enables us to make sure future changes will not have impact regarding this behavior. Cc: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20250108154338.1129069-13-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17	selftests/landlock: Fix build with non-default pthread linking	Mickaël Salaün
	Old toolchains require explicit -lpthread (e.g. on Debian 11). Cc: Nathan Chancellor <nathan@kernel.org> Cc: Tahera Fahimi <fahimitahera@gmail.com> Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads") Reviewed-by: Günther Noack <gnoack3000@gmail.com> Link: https://lore.kernel.org/r/20250115145409.312226-1-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-01-17	perf hist: Fix width calculation in hpp__fmt()	Dmitry Vyukov
	hpp__width_fn() round up width to length of the field name, hpp__fmt() should do it too. Otherwise, the numbers may end up unaligned if the field name is long. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Link: https://lore.kernel.org/r/20250108065949.235718-1-dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-01-17	tools: Sync if_xdp.h uapi tooling header	Vishal Chourasia
	Sync if_xdp.h uapi header to remove following warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/if_xdp.h' differs from latest version at 'include/uapi/linux/if_xdp.h' Fixes: 48eb03dd2630 ("xsk: Add TX timestamp and TX checksum offload support") Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com> Signed-off-by: Song Yoong Siang <yoong.siang.song@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20250115032248.125742-1-yoong.siang.song@intel.com
2025-01-17	libbpf: Work around kernel inconsistently stripping '.llvm.' suffix	Andrii Nakryiko
	Some versions of kernel were stripping out '.llvm.<hash>' suffix from kerne symbols (produced by Clang LTO compilation) from function names reported in available_filter_functions, while kallsyms reported full original name. This confuses libbpf's multi-kprobe logic of finding all matching kernel functions for specified user glob pattern by joining available_filter_functions and kallsyms contents, because joining by full symbol name won't work for symbols containing '.llvm.<hash>' suffix. This was eventually fixed by [0] in the kernel, but we'd like to not regress multi-kprobe experience and add a work around for this bug on libbpf side, stripping kallsym's name if it matches user pattern and contains '.llvm.' suffix. [0] fb6a421fb615 ("kallsyms: Match symbols exactly with CONFIG_LTO_CLANG") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20250117003957.179331-1-andrii@kernel.org
2025-01-17	Merge branch 'for-next/perf' into for-next/core	Will Deacon
	* for-next/perf: (29 commits) perf docs: arm_spe: Document new discard mode perf: arm_spe: Add format option for discard mode MAINTAINERS: Add perf list for drivers/perf/ drivers/perf: apple_m1: Map generic branch events drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association perf: imx9_perf: Introduce AXI filter version to refactor the driver and better extension perf/arm-cmn: Permit more exhaustive groups perf/dwc_pcie: Qualify RAS DES VSEC Capability by Vendor, Revision drivers/perf: hisi: Delete redundant blank line of DDRC PMU drivers/perf: hisi: Fix incorrect variable name "hha_pmu" in DDRC PMU driver drivers/perf: hisi: Export associated CPUs of each PMU through sysfs drivers/perf: hisi: Provide a generic implementation of cpumask/identifier drivers/perf: hisi: Add a common function to retrieve topology from firmware drivers/perf: hisi: Extract topology information to a separate structure drivers/perf: hisi: Refactor the detection of associated CPUs drivers/perf: hisi: Migrate to one online CPU if no associated one online drivers/perf: hisi: Don't update the associated_cpus on CPU offline drivers/perf: hisi: Define a symbol namespace for HiSilicon Uncore PMUs perf/marvell: Odyssey LLC-TAD performance monitor support perf/marvell: Refactor to extract platform data ...
2025-01-17	Merge branch kvm-arm64/coresight-6.14 into kvmarm-master/next	Marc Zyngier
	* kvm-arm64/coresight-6.14: : . : Trace filtering update from James Clark. From the cover letter: : : "The guest filtering rules from the Perf session are now honored for both : nVHE and VHE modes. This is done by either writing to TRFCR_EL12 at the : start of the Perf session and doing nothing else further, or caching the : guest value and writing it at guest switch for nVHE. In pKVM, trace is : now be disabled for both protected and unprotected guests." : . KVM: arm64: Fix selftests after sysreg field name update coresight: Pass guest TRFCR value to KVM KVM: arm64: Support trace filtering for guests KVM: arm64: coresight: Give TRBE enabled state to KVM coresight: trbe: Remove redundant disable call arm64/sysreg/tools: Move TRFCR definitions to sysreg tools: arm64: Update sysreg.h header files Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-16	bpf: selftests: verifier: Add nullness elision tests	Daniel Xu
	Test that nullness elision works for common use cases. For example, we want to check that both constant scalar spills and STACK_ZERO functions. As well as when there's both const and non-const values of R2 leading up to a lookup. And obviously some bound checks. Particularly tricky are spills both smaller or larger than key size. For smaller, we need to ensure verifier doesn't let through a potential read into unchecked bytes. For larger, endianness comes into play, as the native endian value tracked in the verifier may not be the bytes the kernel would have read out of the key pointer. So check that we disallow both. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/f1dacaa777d4516a5476162e0ea549f7c3354d73.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16	bpf: verifier: Support eliding map lookup nullness	Daniel Xu
	This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check. This is useful for two reasons: 1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ. 2. It forms a tighter contract between programmer and verifier. For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with large number of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyways, as we only use static key values. Eliding nullness probably results in less work for verifier as well. For (2), percpu scratch maps are often used as a larger stack, as the currrent stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic. Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notable, iters.c tests had to be changed to use a map type that still requires null checks, as it's exercising verifier tracking logic w.r.t iterators. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/68f3ea96ff3809a87e502a11a4bd30177fc5823e.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16	bpf: verifier: Refactor helper access type tracking	Daniel Xu
	Previously, the verifier was treating all PTR_TO_STACK registers passed to a helper call as potentially written to by the helper. However, all calls to check_stack_range_initialized() already have precise access type information available. Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass enum bpf_access_type to check_stack_range_initialized() to more precisely track helper arguments. One benefit from this precision is that registers tracked as valid spills and passed as a read-only helper argument remain tracked after the call. Rather than being marked STACK_MISC afterwards. An additional benefit is the verifier logs are also more precise. For this particular error, users will enjoy a slightly clearer message. See included selftest updates for examples. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/ff885c0e5859e0cd12077c3148ff0754cad4f7ed.1736886479.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-01-16	selftests/net: packetdrill: make tcp buf limited timing tests benign	Jakub Kicinski
	The following tests are failing on debug kernels: tcp_tcp_info_tcp-info-rwnd-limited.pkt tcp_tcp_info_tcp-info-sndbuf-limited.pkt with reports like: assert 19000 <= tcpi_sndbuf_limited <= 21000, tcpi_sndbuf_limited; \ AssertionError: 18000 and: assert 348000 <= tcpi_busy_time <= 360000, tcpi_busy_time AssertionError: 362000 Extend commit 912d6f669725 ("selftests/net: packetdrill: report benign debug flakes as xfail") to cover them. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250115232129.845884-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16	selftests: drv-net-hw: inject pp_alloc_fail errors in the right place	John Daley
	The tool pp_alloc_fail.py tested error recovery by injecting errors into the function page_pool_alloc_pages(). The page pool allocation function page_pool_dev_alloc() does not end up calling page_pool_alloc_pages(). page_pool_alloc_netmems() seems to be the function that is called by all of the page pool alloc functions in the API, so move error injection to that function instead. Signed-off-by: John Daley <johndale@cisco.com> Link: https://patch.msgid.link/20250115181312.3544-2-johndale@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16	selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED	Pu Lehui
	When redirecting the split BTF to the vmlinux base BTF, we need to mark the distilled base struct/union members of split BTF structs/unions in id_map with BTF_IS_EMBEDDED. This indicates that these types must match both name and size later. So if a needed composite type, which is the member of composite type in the split BTF, has a different size in the base BTF we wish to relocate with, btf__relocate() should error out. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-4-pulehui@huaweicloud.com
2025-01-16	libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED	Pu Lehui
	When redirecting the split BTF to the vmlinux base BTF, we need to mark the distilled base struct/union members of split BTF structs/unions in id_map with BTF_IS_EMBEDDED. This indicates that these types must match both name and size later. Therefore, we need to traverse the entire split BTF, which involves traversing type IDs from nr_dist_base_types to nr_types. However, the current implementation uses an incorrect traversal end type ID, so let's correct it. Fixes: 19e00c897d50 ("libbpf: Split BTF relocation") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-3-pulehui@huaweicloud.com
2025-01-16	libbpf: Fix return zero when elf_begin failed	Pu Lehui
	The error number of elf_begin is omitted when encapsulating the btf_find_elf_sections function. Fixes: c86f180ffc99 ("libbpf: Make btf_parse_elf process .BTF.base transparently") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-2-pulehui@huaweicloud.com
2025-01-16	selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test	Pu Lehui
	Fix btf leak on new btf alloc failure in btf_distill test. Fixes: affdeb50616b ("selftests/bpf: Extend distilled BTF tests to cover BTF relocation") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-1-pulehui@huaweicloud.com
2025-01-16	veristat: Load struct_ops programs only once	Eduard Zingerman
	libbpf automatically adjusts autoload for struct_ops programs, see libbpf.c:bpf_object_adjust_struct_ops_autoload. For example, if there is a map: SEC(".struct_ops.link") struct sched_ext_ops ops = { .enqueue = foo, .tick = bar, }; Both 'foo' and 'bar' would be loaded if 'ops' autocreate is true, both 'foo' and 'bar' would be skipped if 'ops' autocreate is false. This means that when veristat processes object file with 'ops', it would load 4 programs in total: two programs per each 'process_prog' call. The adjustment occurs at object load time, and libbpf remembers association between 'ops' and 'foo'/'bar' at object open time. The only way to persuade libbpf to load one of two is to adjust map initial value, such that only one program is referenced. This patch does exactly that, significantly reducing time to process object files with big number of struct_ops programs. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115223835.919989-1-eddyz87@gmail.com
2025-01-16	selftests/bpf: Fix undefined UINT_MAX in veristat.c	Tony Ambardar
	Include <limits.h> in 'veristat.c' to provide a UINT_MAX definition and avoid multiple compile errors against mips64el/musl-libc: veristat.c: In function 'max_verifier_log_size': veristat.c:1135:36: error: 'UINT_MAX' undeclared (first use in this function) 1135 \| const int SMALL_LOG_SIZE = UINT_MAX >> 8; \| ^~~~~~~~ veristat.c:24:1: note: 'UINT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'? 23 \| #include <math.h> +++ \|+#include <limits.h> 24 \| Fixes: 1f7c33630724 ("selftests/bpf: Increase verifier log limit in veristat") Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250116075036.3459898-1-tony.ambardar@gmail.com
2025-01-16	perf hist: Fix bogus profiles when filters are enabled	Dmitry Vyukov
	When a filtered column is not present in the sort order, profiles become arbitrary broken. Filtered and non-filtered entries are collapsed together, and the filtered-by field ends up with a random value (either from a filtered or non-filtered entry). If we end up with filtered entry/value, then the whole collapsed entry will be filtered out and will be missing in the profile. If we end up with non-filtered entry/value, then the overhead value will be wrongly larger (include some subset of filtered out samples). This leads to very confusing profiles. The problem is hard to notice, and if noticed hard to understand. If the filter is for a single value, then it can be fixed by adding the corresponding field to the sort order (provided user understood the problem). But if the filter is for multiple values, it's impossible to fix b/c there is no concept of binary sorting based on filter predicate (we want to group all non-filtered values in one bucket, and all filtered values in another). Examples of affected commands: perf report --tid=123 perf report --sort overhead,symbol --comm=foo,bar Fix this by considering filtered status as the highest priority sort/collapse predicate. As a side effect this effectively adds a new feature of showing profile where several lines are combined based on arbitrary filtering predicate. For example, showing symbols from binaries foo and bar combined together, but not from other binaries; or showing combined overhead of several particular threads. Signed-off-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/359dc444ce94d20e59d3a9e360c36fbeac833a04.1736927981.git.dvyukov@google.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>