path: root/tools
Age | Commit message | Author
2025-01-27 | tools/power turbostat: Add tcore clock PMT type | Patryk Wlazlyn
Some PMT counters, for example module c1e residency on Intel Clearwater Forest, are reported using tcore clock type. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 | tools/power turbostat: version 2025.01.14 | Len Brown
Fix checkpatch whitespace issues since 2024.11.30. Summary of Changes since 2024.11.30: Enable SysWatt by default. Add initial PTL, CWF platform support. Refuse to run on unsupported platforms without --force, to avoid not-so-useful measurements mistakenly made using obsolete versions. Harden initial PMT code in response to early use. Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 | tools/power turbostat: Allow adding PMT counters directly by sysfs path | Patryk Wlazlyn
Allow the user to add PMT counters by identifying the source either with guid=%u,seq=%u or, since this patch, with a direct sysfs path: path=%s, for example path=/sys/class/intel_pmt/telem5. In the latter case, the GUID and sequence number are inferred by turbostat. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
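The two ways of naming a counter's source described above can be told apart with a small parser. This is only a sketch: struct pmt_source and parse_pmt_source() are hypothetical names rather than turbostat's code, and accepting 0x-prefixed hex GUIDs through strtoul() base 0 is an assumption that goes beyond the guid=%u form in the message.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a PMT counter source; not turbostat's struct. */
struct pmt_source {
	int by_path;            /* 1: "path=%s" form, 0: "guid=...,seq=..." form */
	unsigned int guid;      /* only meaningful when by_path == 0 */
	unsigned int seq;       /* only meaningful when by_path == 0 */
	char path[256];         /* only meaningful when by_path == 1 */
};

/*
 * Parse "guid=<n>,seq=<n>" or "path=<sysfs dir>".
 * Returns 0 on success, -1 if the string matches neither form.
 */
static int parse_pmt_source(const char *spec, struct pmt_source *src)
{
	memset(src, 0, sizeof(*src));

	if (strncmp(spec, "path=", 5) == 0) {
		size_t len = strlen(spec + 5);

		if (len == 0 || len >= sizeof(src->path))
			return -1;
		memcpy(src->path, spec + 5, len + 1);  /* copies the NUL too */
		src->by_path = 1;
		return 0;
	}

	if (strncmp(spec, "guid=", 5) == 0) {
		const char *s = spec + 5;
		char *end;
		/* Base 0 accepts decimal as well as 0x-prefixed hex. */
		unsigned long guid = strtoul(s, &end, 0);
		unsigned long seq;

		if (end == s || guid > UINT_MAX || strncmp(end, ",seq=", 5) != 0)
			return -1;
		s = end + 5;
		seq = strtoul(s, &end, 10);
		if (end == s || *end != '\0' || seq > UINT_MAX)
			return -1;
		src->guid = (unsigned int)guid;
		src->seq = (unsigned int)seq;
		return 0;
	}

	return -1;
}
```

In the path= case the GUID and sequence number would still have to be inferred from sysfs afterwards, as the commit describes.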
2025-01-27 | tools/power turbostat: Allow mapping multiple PMT files with the same GUID | Patryk Wlazlyn
Some platforms may expose multiple telemetry files identified with the same GUID. Interpreting them correctly, to associate a given counter with a CPU, core, or package, requires more metadata from the user. Parse them and create an ordered linked list of those PMT aggregators, so that a specific aggregator can be identified by GUID + sequence number. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
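A minimal sketch of such an ordered list, assuming the telemetry directories are fed in PCIe discovery order. The types and functions (struct pmt_aggr, pmt_aggr_append(), pmt_aggr_find()) are hypothetical; here the sequence number is simply the count of earlier aggregators sharing the same GUID, which matches the "GUID + sequence number" identification described above.

```c
#include <stdlib.h>

/* Hypothetical node describing one telemetry directory (telemN). */
struct pmt_aggr {
	unsigned int guid;
	unsigned int seq;        /* index among aggregators sharing this GUID */
	unsigned int dir_index;  /* N from "telemN" */
	struct pmt_aggr *next;
};

struct pmt_aggr_list {
	struct pmt_aggr *head;
	struct pmt_aggr *tail;
};

/*
 * Append in discovery order. The caller must feed directories already
 * sorted by numeric suffix; the first aggregator with a GUID gets seq 0.
 */
static struct pmt_aggr *pmt_aggr_append(struct pmt_aggr_list *list,
					unsigned int guid, unsigned int dir_index)
{
	struct pmt_aggr *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->guid = guid;
	n->dir_index = dir_index;
	for (struct pmt_aggr *p = list->head; p; p = p->next)
		if (p->guid == guid)
			n->seq++;
	if (list->tail)
		list->tail->next = n;
	else
		list->head = n;
	list->tail = n;
	return n;
}

/* Look up an aggregator by GUID + sequence number; NULL if absent. */
static struct pmt_aggr *pmt_aggr_find(const struct pmt_aggr_list *list,
				      unsigned int guid, unsigned int seq)
{
	for (struct pmt_aggr *p = list->head; p; p = p->next)
		if (p->guid == guid && p->seq == seq)
			return p;
	return NULL;
}

static void pmt_aggr_free(struct pmt_aggr_list *list)
{
	struct pmt_aggr *p = list->head;

	while (p) {
		struct pmt_aggr *next = p->next;

		free(p);
		p = next;
	}
	list->head = list->tail = NULL;
}
```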
2025-01-27 | tools/power turbostat: Add PMT directory iterator helper | Patryk Wlazlyn
PMT directories exposed in sysfs use the following pattern: telem%u, for example: telem0, telem2, telem3, ..., telem15, telem16. This naming scheme preserves the ordering from the PCIe discovery, which is important to correctly map the telemetry directory to the specific domain (cpu, core, package, etc.). Because readdir() does not return the entries in numeric order (an alphabetical sort, for example, places "telem13" before "telem3"), it is necessary to use scandir() with a custom compare() callback to preserve the PCIe ordering. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
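A scandir() filter and compare callback that restore the numeric (discovery) order might look like the sketch below. telem_index(), telem_filter(), telem_compare() and print_telem_dirs() are illustrative names, not turbostat's helpers; the callback signatures are the ones POSIX scandir() expects.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose scandir() under strict -std= modes */
#endif
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Return N for a name of the form "telemN", or -1 otherwise. */
static long telem_index(const char *name)
{
	unsigned int idx;
	char extra;

	/* A successful %c means trailing junk, e.g. "telem3x". */
	if (sscanf(name, "telem%u%c", &idx, &extra) != 1)
		return -1;
	return (long)idx;
}

/* scandir() filter: keep only telemN entries (drops ".", "..", etc.). */
static int telem_filter(const struct dirent *e)
{
	return telem_index(e->d_name) >= 0;
}

/* scandir() compare: numeric order, so telem3 sorts before telem13. */
static int telem_compare(const struct dirent **a, const struct dirent **b)
{
	long ia = telem_index((*a)->d_name);
	long ib = telem_index((*b)->d_name);

	return (ia > ib) - (ia < ib);
}

/* Print telemetry directories under `dir` in discovery order. */
static int print_telem_dirs(const char *dir)
{
	struct dirent **names;
	int n = scandir(dir, &names, telem_filter, telem_compare);

	if (n < 0)
		return -1;
	for (int i = 0; i < n; i++) {
		printf("%s\n", names[i]->d_name);
		free(names[i]);
	}
	free(names);
	return n;
}
```

For example, print_telem_dirs("/sys/class/intel_pmt") would list telem2 before telem13, where a plain strcmp()-based sort would not.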
2025-01-27 | tools/power turbostat: Extend PMT identification with a sequence number | Patryk Wlazlyn
When platforms expose multiple PMT aggregators with the same GUID, the only way to identify them and map them to a specific domain is by reading them in the order they were exposed via PCIe. The Intel PMT kernel driver keeps the same order and numbers the telemetry directories accordingly. Use the GUID and sequence number (order) to uniquely identify PMT aggregators. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 | tools/power turbostat: Return default value for unmapped PMT domains | Patryk Wlazlyn
When requesting PMT counters with the --add command, the user may want to skip specifying values for all the domains (that is, cpu, core, package, etc.). For the domains for which the user did not specify how to read the counter, return the default value of zero. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 | tools/power turbostat: Check for non-zero value when MSR probing | Patryk Wlazlyn
For some MSRs, for example, the Platform Energy Counter (RAPL PSYS), it is required to additionally check for a non-zero value to confirm that it is present. From Intel SDM vol. 4: Platform Energy Counter (R/O) This MSR is valid only if both platform vendor hardware implementation and BIOS enablement support it. This MSR will read 0 if not valid. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
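A sketch of the "readable and non-zero" probe rule, assuming the Linux msr driver interface (/dev/cpu/N/msr, where pread() at file offset M reads MSR M). The helper names are hypothetical, not turbostat's; 0x64d is the MSR_PLATFORM_ENERGY_STATUS address as defined in the kernel's msr-index.h.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Platform Energy Counter (RAPL PSYS); address as in Linux msr-index.h. */
#define MSR_PLATFORM_ENERGY_STATUS 0x64d

/*
 * Probe one MSR through an open msr-device fd, where the file offset
 * selects the MSR address. Returns 1 only if the read succeeds AND the
 * value is non-zero.
 */
static int probe_msr_nonzero(int fd, off_t msr)
{
	uint64_t val;

	if (pread(fd, &val, sizeof(val), msr) != (ssize_t)sizeof(val))
		return 0;          /* read failed: MSR not implemented */
	return val != 0;           /* SDM: this MSR reads 0 when not valid */
}

/* Convenience wrapper: probe PSYS on one CPU via the msr driver. */
static int have_rapl_psys(int cpu)
{
	char path[64];
	int fd, present;

	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;          /* no msr driver or no permission */
	present = probe_msr_nonzero(fd, MSR_PLATFORM_ENERGY_STATUS);
	close(fd);
	return present;
}
```

Treating a successful read of zero as "absent" is what distinguishes this probe from a plain read-succeeds check.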
2025-01-27 | tools/power turbostat: Enhance turbostat self-performance visibility | Zhang Rui
Include procfs and sysfs data collection time in the system summary row of the "usec" column. This is useful for isolating where the time goes during turbostat data collection.
Background: Column "usec" shows
1. the number of microseconds elapsed during counter collection, including thread migration (if any), for each CPU row.
2. the total elapsed time to collect the counters on all CPUs, for the summary row.
This can be used to check the time cost of a given column. For example, run the commands below separately:
turbostat --show usec sleep 1
turbostat --show usec,CoreTmp sleep 1
and the delta in the usec column will tell the time cost of CoreTmp (Thermal MSR read).
Problem: Some of the kernel procfs/sysfs accesses are expensive, especially on high core count systems. The "usec" column cannot show this because it only includes the time cost of the counters.
Solution: Leave the per-CPU "usec" as it is and modify the summary "usec" to include the time cost of the procfs/sysfs snapshot. With this, the "usec" column can be used to get
1. the baseline, e.g. turbostat --show usec sleep 1
2. the baseline + some per-CPU counter cost, e.g. turbostat --show usec,CoreTmp sleep 1
3. the baseline + some per-CPU sysfs cost, e.g. turbostat --show usec,C1 sleep 1
4. the baseline + /proc/interrupts cost, e.g. turbostat --show usec,IRQ sleep 1
A man-page update is also included.
Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
2025-01-27 | tools/power turbostat: Add fixed RAPL PSYS divisor for SPR | Patryk Wlazlyn
Intel Sapphire Rapids is an exception and has a fixed divisor of 1.0 for the RAPL PSYS counter. Add a platform bit and enable it for SPR. Reported-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
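One way to picture the platform bit: a sketch that converts a raw PSYS count to joules, assuming the Intel SDM layout of MSR_RAPL_POWER_UNIT (energy status unit in bits 12:8) and reading "fixed divisor of 1.0" as one raw count per joule. The function names and the fixed_psys_divisor flag are hypothetical.

```c
#include <stdint.h>

/*
 * Joules per counter tick for RAPL energy counters, derived from
 * MSR_RAPL_POWER_UNIT (0x606): bits 12:8 hold the Energy Status Unit
 * (ESU), and one tick is 1 / 2^ESU joules.
 */
static double rapl_energy_unit(uint64_t power_unit_msr)
{
	unsigned int esu = (unsigned int)((power_unit_msr >> 8) & 0x1f);

	return 1.0 / (double)(1ULL << esu);
}

/*
 * PSYS unit: normally the ESU-derived value, but a platform with the
 * (hypothetical) fixed-divisor bit set uses a fixed 1.0 instead.
 */
static double rapl_psys_unit(uint64_t power_unit_msr, int fixed_psys_divisor)
{
	return fixed_psys_divisor ? 1.0 : rapl_energy_unit(power_unit_msr);
}

/* Convert a raw PSYS counter delta to joules. */
static double psys_joules(uint64_t delta, uint64_t power_unit_msr,
			  int fixed_psys_divisor)
{
	return (double)delta * rapl_psys_unit(power_unit_msr, fixed_psys_divisor);
}
```

With an ESU of 14, for instance, 16384 raw counts are 1 J on a regular platform but 16384 J when the fixed divisor applies, which is why the wrong unit would skew PSYS readings by orders of magnitude.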
2025-01-27 | selftests: livepatch: handle PRINTK_CALLER in check_result() | Madhavan Srinivasan
Some arch configs (like ppc64) enable CONFIG_PRINTK_CALLER, which adds the caller ID to each dmesg line. With util-linux's recent update 467a5b3192f16 ('dmesg: add caller_id support'), the standard "dmesg" has been enhanced to print PRINTK_CALLER fields. Because of this, the test case fails even though the expected and observed output are the same:
-% insmod test_modules/test_klp_livepatch.ko
-livepatch: enabling patch 'test_klp_livepatch'
-livepatch: 'test_klp_livepatch': initializing patching transition
-livepatch: 'test_klp_livepatch': starting patching transition
-livepatch: 'test_klp_livepatch': completing patching transition
-livepatch: 'test_klp_livepatch': patching complete
-% echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
-livepatch: 'test_klp_livepatch': initializing unpatching transition
-livepatch: 'test_klp_livepatch': starting unpatching transition
-livepatch: 'test_klp_livepatch': completing unpatching transition
-livepatch: 'test_klp_livepatch': unpatching complete
-% rmmod test_klp_livepatch
+[ T3659] % insmod test_modules/test_klp_livepatch.ko
+[ T3682] livepatch: enabling patch 'test_klp_livepatch'
+[ T3682] livepatch: 'test_klp_livepatch': initializing patching transition
+[ T3682] livepatch: 'test_klp_livepatch': starting patching transition
+[ T826] livepatch: 'test_klp_livepatch': completing patching transition
+[ T826] livepatch: 'test_klp_livepatch': patching complete
+[ T3659] % echo 0 > /sys/kernel/livepatch/test_klp_livepatch/enabled
+[ T3659] livepatch: 'test_klp_livepatch': initializing unpatching transition
+[ T3659] livepatch: 'test_klp_livepatch': starting unpatching transition
+[ T789] livepatch: 'test_klp_livepatch': completing unpatching transition
+[ T789] livepatch: 'test_klp_livepatch': unpatching complete
+[ T3659] % rmmod test_klp_livepatch
ERROR: livepatch kselftest(s) failed
not ok 1 selftests: livepatch: test-livepatch.sh # exit=1
Currently, check_result() handles the removal of "[time]" from the dmesg output. Enhance the check to also handle the removal of "[Thread Id]" or "[CPU Id]".
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250119163238.749847-1-maddy@linux.ibm.com Signed-off-by: Petr Mladek <pmladek@suse.com>
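check_result() itself is a shell function in the livepatch selftests; the C sketch below only illustrates the per-line matching rule, assuming an optional "[time]" prefix followed by an optional "[ T<n>]" (thread) or "[ C<n>]" (CPU) caller ID. The helper names are hypothetical.

```c
#include <ctype.h>

/* Skip a leading "[   12.345678] " timestamp; return s if there is none. */
static const char *skip_timestamp(const char *s)
{
	const char *p = s;

	if (*p++ != '[')
		return s;
	while (*p == ' ' || *p == '.' || isdigit((unsigned char)*p))
		p++;
	if (p[0] != ']' || p[1] != ' ')
		return s;
	return p + 2;
}

/* Skip a leading PRINTK_CALLER id such as "[ T3659] " or "[  C12] ". */
static const char *skip_caller_id(const char *s)
{
	const char *p = s;

	if (*p++ != '[')
		return s;
	while (*p == ' ')
		p++;
	if (*p != 'T' && *p != 'C')     /* T = thread id, C = CPU id */
		return s;
	p++;
	if (!isdigit((unsigned char)*p))
		return s;
	while (isdigit((unsigned char)*p))
		p++;
	if (p[0] != ']' || p[1] != ' ')
		return s;
	return p + 2;
}

/* Strip an optional timestamp, then an optional caller id. */
static const char *strip_dmesg_prefix(const char *line)
{
	return skip_caller_id(skip_timestamp(line));
}
```

Applied to each "+" line in the output above, this yields exactly the corresponding "-" line, which is the comparison the test needs.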
2025-01-26 | Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull MM updates from Andrew Morton:
"The various patchsets are summarized below. Plus of course many individual patches which are described in their changelogs.
- "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages, so that callers (i.e., slab) can avoid a refcount inc & dec
- "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones
- "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest
- "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code
- "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups
- "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code
- "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c
- "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator
- "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading
- "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED)
- "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled
- "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL
- "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests
- "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size
- "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic
- "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated
- "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleanups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated
- "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed
- "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic
- "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy
- "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic
- "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions
- "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed
- "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting
- "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface
- "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior
- "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors
- "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload, as was a 35% reduction in kernel build time with swap-on-zram
- "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal
- "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance
- "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation
- "Cleanup for memfd_create()" from Isaac Manjarres does that thing
- "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration
- "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag, which provides synchronous dropbehind for pagecache reading and writing, to permit userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices
- "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests"
* tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
  mm/compaction: fix UBSAN shift-out-of-bounds warning
  s390/mm: add missing ctor/dtor on page table upgrade
  kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags()
  tools: add VM_WARN_ON_VMG definition
  mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()
  seqlock: add missing parameter documentation for raw_seqcount_try_begin()
  mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh
  mm/page_alloc: remove the incorrect and misleading comment
  zram: remove zcomp_stream_put() from write_incompressible_page()
  mm: separate move/undo parts from migrate_pages_batch()
  mm/kfence: use str_write_read() helper in get_access_type()
  selftests/mm/mkdirty: fix memory leak in test_uffdio_copy()
  kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags()
  selftests/mm: virtual_address_range: avoid reading from VM_IO mappings
  selftests/mm: vm_util: split up /proc/self/smaps parsing
  selftests/mm: virtual_address_range: unmap chunks after validation
  selftests/mm: virtual_address_range: mmap() without PROT_WRITE
  selftests/memfd/memfd_test: fix possible NULL pointer dereference
  mm: add FGP_DONTCACHE folio creation flag
  mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue
  ...
2025-01-26 | Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm | Linus Torvalds
Pull non-MM updates from Andrew Morton:
"Mainly individually changelogged singleton patches. The patch series in this pull are:
- "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code
- "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein performs some cleanup and Rust preparation in the xarray library code
- "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments
- "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate
- "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API
- "Convert ocfs2 to use folios" from Matthew Wilcox does that
- "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places
- "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work
- "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work
- "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issue in nilfs when the fs is presented with a corrupted image
- "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc
- "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger
- "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code
- "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code"
* tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits)
  ocfs2: use str_yes_no() and str_no_yes() helper functions
  include/linux/lz4.h: add some missing macros
  Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent
  Xarray: remove repeat check in xas_squash_marks()
  Xarray: distinguish large entries correctly in xas_split_alloc()
  Xarray: move forward index correctly in xas_pause()
  Xarray: do not return sibling entries from xas_find_marked()
  ipc/util.c: complete the kernel-doc function descriptions
  gcov: clang: use correct function param names
  latencytop: use correct kernel-doc format for func params
  minmax.h: remove some #defines that are only expanded once
  minmax.h: simplify the variants of clamp()
  minmax.h: move all the clamp() definitions after the min/max() ones
  minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp()
  minmax.h: reduce the #define expansion of min(), max() and clamp()
  minmax.h: update some comments
  minmax.h: add whitespace around operators and after commas
  nilfs2: do not update mtime of renamed directory that is not moved
  nilfs2: handle errors that nilfs_prepare_chunk() may return
  CREDITS: fix spelling mistake
  ...
2025-01-26 | Merge tag 'trace-tools-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds
Pull rv and tools/rtla updates from Steven Rostedt:
- Add a test suite to test the tool
  Add a small test suite that can be used to test rtla's basic features, to at least have something to test when applying changes.
- Automate manual steps in monitor creation
  While creating a new monitor in RV, besides generating code from dot2k, there are a few manual steps which can be tedious and error-prone, like adding the tracepoints, makefile lines and kconfig, or selecting events that start the monitor in the initial state. Updates were made to automate as many of those steps as possible, to make creating a new RV monitor much quicker. It is still required to select the proper tracepoints; this step is harder to automate in a general way and, in several cases, would still need user intervention.
- Have rtla timerlat hist and top set OSNOISE_WORKLOAD flag
  Have both rtla-timerlat-hist and rtla-timerlat-top set OSNOISE_WORKLOAD to the proper value ("on" when running with -k, "off" when running with -u) every time the option is available, instead of setting it only when running with -u. This prevents rtla timerlat -k from giving no results when NO_OSNOISE_WORKLOAD is set, either manually or by an abnormally exited earlier run of rtla timerlat -u.
- Stop rtla timerlat on signal properly when overloaded
  There is an issue where, if rtla is run on machines with a high number of CPUs (100+), timerlat can generate more samples than rtla is able to process via tracefs_iterate_raw_events. This is especially common when the interval is set to 100us (the rteval and cyclictest default) as opposed to the rtla default of 1000us, but it also happens with the rtla default. Currently, this leads to rtla hanging and having to be terminated with SIGTERM. SIGINT setting stop_tracing is not enough, since more and more events keep coming and tracefs_iterate_raw_events never exits.
  To fix this: stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events are generated when rtla is supposed to exit. Also, on receiving SIGINT/SIGALRM twice, abort iteration immediately with tracefs_iterate_stop, making rtla exit right away instead of waiting for all events to be processed.
- Account for missed events
  Due to tracefs buffer overflow, rtla can miss events, making the tracing results inaccurate. Count both the number of missed events and the total number of processed events, and display missed events as well as their percentage. The numbers are displayed for both osnoise and timerlat, even though for the former, missed events are generally not expected. For hist, the number is displayed at the end of the run; for top, it is displayed on each printing of the top table.
- Changes to make osnoise more robust
  There was a dependency in the code that the first field of the osnoise_tool structure was the trace field. If that ever changed, the code would break. Change the code to encapsulate this dependency, so that the code that uses the structure no longer relies on it.
* tag 'trace-tools-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits)
  rtla: Report missed event count
  rtla: Add function to report missed events
  rtla: Count all processed events
  rtla: Count missed trace events
  tools/rtla: Add osnoise_trace_is_off()
  rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads
  rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads
  rtla/osnoise: Distinguish missing workload option
  rtla/timerlat_top: Abort event processing on second signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla: Add trace_instance_stop
  tools/rtla: Add basic test suite
  verification/dot2k: Implement event type detection
  verification/dot2k: Auto patch current kernel source
  verification/dot2k: Simplify manual steps in monitor creation
  rv: Simplify manual steps in monitor creation
  verification/dot2k: Add support for name and description options
  verification/dot2k: More robust template variables
  ...
2025-01-25 | tools: add VM_WARN_ON_VMG definition | Suren Baghdasaryan
vma tests compilation yields the following error: vma.c:732:9: error: implicit declaration of function ‘VM_WARN_ON_VMG’ Fix it by adding missing VM_WARN_ON_VMG() definition. Link: https://lkml.kernel.org/r/20250116181538.759469-1-surenb@google.com Fixes: e3a7ae85f87c ("mm/debug: prefer VM_WARN_ON_VMG() to report VMG debug warnings") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() | liuye
Release memory before the error branch returns, to prevent a memory leak: Checking tools/testing/selftests/mm/mkdirty.c ... tools/testing/selftests/mm/mkdirty.c:283:3: error: Memory leak: src [memleak] return; ^ Link: https://lkml.kernel.org/r/20250114023838.48589-1-liuye@kylinos.cn Signed-off-by: liuye <liuye@kylinos.cn> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm: virtual_address_range: avoid reading from VM_IO mappings | Thomas Weißschuh
The virtual_address_range selftest reads from the start of each mapping listed in /proc/self/maps. However not all mappings are valid to be arbitrarily accessed. For example the vvar data used for virtual clocks on x86 [vvar_vclock] can only be accessed if 1) the kernel configuration enables virtual clocks and 2) the hypervisor provided the data for it. Only the VDSO itself has the necessary information to know this. Since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") the virtual clock data was split out into its own mapping, leading to EFAULT from read() during the validation. Check for the VM_IO flag as a proxy. It is present for the VVAR mappings and MMIO ranges can be dangerous to access arbitrarily. Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-4-6fd7269934a5@linutronix.de Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202412271148.2656e485-lkp@intel.com Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Suggested-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/lkml/e97c2a5d-c815-4936-a767-ac42a3220a90@redhat.com/ Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
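The message does not show how VM_IO is detected. One mechanism visible from userspace is the io mnemonic that /proc/<pid>/smaps prints in its VmFlags field for VM_IO mappings; the sketch below checks for it. vmflags_has() is a hypothetical helper, not the selftest's function.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a /proc/<pid>/smaps "VmFlags:" line contains `flag`
 * (a two-letter mnemonic such as "io" for VM_IO) as a whole token.
 */
static bool vmflags_has(const char *line, const char *flag)
{
	static const char prefix[] = "VmFlags:";
	size_t flen = strlen(flag);
	const char *p;

	if (flen == 0 || strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return false;

	p = line + sizeof(prefix) - 1;
	while (*p && *p != '\n') {
		const char *end;

		while (*p == ' ')              /* skip separators */
			p++;
		end = p;
		while (*end && *end != ' ' && *end != '\n')
			end++;
		/* Whole-token match only, so "iox" does not count as "io". */
		if ((size_t)(end - p) == flen && strncmp(p, flag, flen) == 0)
			return true;
		p = end;
	}
	return false;
}
```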
2025-01-25 | selftests/mm: vm_util: split up /proc/self/smaps parsing | Thomas Weißschuh
Upcoming changes want to reuse the /proc/self/smaps parsing logic to parse the VmFlags field. As that works differently from the currently parsed HugePage counters, split up the logic so common functionality can be shared. While reworking this code, also use the correct sscanf placeholder for the "uint64_t thp" variable. Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-3-6fd7269934a5@linutronix.de Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
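A sketch of reading a uint64_t smaps counter with an sscanf() conversion that always matches the variable's type, using the SCNu64 macro from <inttypes.h>. parse_smaps_kb() is hypothetical and not the vm_util code.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a smaps counter line such as "AnonHugePages:   2048 kB" into *kb
 * if its name equals `field`. SCNu64 expands to the correct conversion
 * for uint64_t on each ABI ("lu" on LP64 glibc, "llu" on 32-bit), so the
 * format never disagrees with the variable's type.
 */
static int parse_smaps_kb(const char *line, const char *field, uint64_t *kb)
{
	char name[64];
	uint64_t val;

	if (sscanf(line, "%63[^:]: %" SCNu64 " kB", name, &val) != 2)
		return -1;
	if (strcmp(name, field) != 0)
		return -1;
	*kb = val;
	return 0;
}
```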
2025-01-25 | selftests/mm: virtual_address_range: unmap chunks after validation | Thomas Weißschuh
For each accessed chunk, a PTE is created; more than 1GiB of PTEs is used this way. Remove each PTE after validating a chunk to reduce peak memory usage. It is important to only unmap memory that was previously mmap()ed, as unmapping other mappings like the stack, heap or executable mappings will crash the process. The mappings read from /proc/self/maps and the return values from mmap() don't allow a simple correlation, due to merging and no guaranteed order. To correlate the pointers and mappings, use prctl(PR_SET_VMA_ANON_NAME). While it introduces a test dependency, other alternatives would introduce runtime or development overhead. Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-2-6fd7269934a5@linutronix.de Fixes: 010409649885 ("selftests/mm: confirm VA exhaustion without reliance on correctness of mmap()") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: David Hildenbrand <david@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: kernel test robot <oliver.sang@intel.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm: virtual_address_range: mmap() without PROT_WRITE | Thomas Weißschuh
Patch series "selftests/mm: virtual_address_range: Reduce memory", v4. The selftest has been failing since commit e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") was merged. While debugging I stumbled upon some memory usage optimizations. With these, the test now runs on a VM with only 60MiB of memory. This patch (of 4): When mapping a chunk larger than the available physical memory with PROT_WRITE while overcommit is disabled, the mapping will fail. This prevents the test from running on systems with less than ~1GiB of memory and triggers an inscrutable test failure. As the mappings are never written to anyway, the flag can be removed. Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-0-6fd7269934a5@linutronix.de Link: https://lkml.kernel.org/r/20250114-virtual_address_range-tests-v4-1-6fd7269934a5@linutronix.de Fixes: 4e5ce33ceb32 ("selftests/vm: add a test for virtual address range mapping") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Dev Jain <dev.jain@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: kernel test robot <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/memfd/memfd_test: fix possible NULL pointer dereference | liuye
If `name' is NULL, a NULL pointer may be accessed in printf. Link: https://lkml.kernel.org/r/20250114032115.58638-1-liuye@kylinos.cn Signed-off-by: liuye <liuye@kylinos.cn> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Greg Thelen <gthelen@google.com> Cc: "Isaac J. Manjarres" <isaacmanjarres@google.com> Cc: Jeff Xu <jeffxu@google.com> Cc: Saurav Shah <sauravshah.31@gmail.com> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm/cow: modify the incorrect checking parameters | Hao Ge
In run_with_memfd_hugetlb(), some error-handling paths pass the wrong parameter: it should be "smem", but it was mistakenly written as "mem". Let's fix it. [gehao@kylinos.cn: fix other errant sites, per Anshuman] Link: https://lkml.kernel.org/r/20250113050908.93638-1-hao.ge@linux.dev Link: https://lkml.kernel.org/r/20250113032858.63670-1-hao.ge@linux.dev Fixes: f8664f3c4a08 ("selftests/vm: cow: basic COW tests for non-anonymous pages") Signed-off-by: Hao Ge <gehao@kylinos.cn> Cc: SeongJae Park <sj@kernel.org> Cc: Shuah Khan (Samsung OSG) <shuah@kernel.org> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm: add tests for splitting pmd THPs to all lower orders | Zi Yan
The kernel already supports splitting a folio to any lower order. Test it. [ziy@nvidia.com: no need to test splitting to order-1] Link: https://lkml.kernel.org/r/DDA202EA-4664-4F50-A7FD-B00CBB7A624B@nvidia.com Link: https://lkml.kernel.org/r/20250110235028.96824-2-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Alexander Zhu <alexlzhu@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25 | selftests/mm: use selftests framework to print test result | Zi Yan
Otherwise, the number of reported tests does not match reality. Link: https://lkml.kernel.org/r/20250110235028.96824-1-ziy@nvidia.com Fixes: 391e86971161 ("mm: selftest to verify zero-filled pages are mapped to zeropage") Signed-off-by: Zi Yan <ziy@nvidia.com> Cc: Alexander Zhu <alexlzhu@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm: make mmap_region() internalLorenzo Stoakes
Now that we have removed the one user of mmap_region() outside of mm, make it internal and add it to vma.c so it can be userland tested. This ensures that all external memory mappings are performed using the appropriate interfaces and allows us to modify memory mapping logic as we see fit. Additionally expand test stubs to allow for the mmap_region() code to compile and be userland testable. Link: https://lkml.kernel.org/r/de5a3c574d35c26237edf20a1d8652d7305709c9.1735819274.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25selftests/mm: introduce uffd-wp-mremap regression testRyan Roberts
Introduce a test that registers a range of memory for UFFDIO_WRITEPROTECT_MODE_WP without UFFD_FEATURE_EVENT_REMAP. First check that the uffd-wp bit is set for every PTE in the range. Then mremap() the range to a new location and check that the uffd-wp bit is clear for every PTE in the range. Run the test for small folios, all supported THP sizes, all supported hugetlb sizes, and swapped-out memory, for both shared and private mappings. There was previously a bug in the kernel where the uffd-wp bits remained set in all PTEs for this case; after fixing the kernel, the tests all pass. Link: https://lkml.kernel.org/r/20250107144755.1871363-3-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
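The per-PTE check described above can be sketched with /proc/self/pagemap, where bit 57 of each 64-bit entry reports whether the PTE is uffd-wp write-protected (per the kernel's pagemap documentation). The helpers below are hypothetical, not the selftest's own code, and assume the caller has already opened /proc/self/pagemap read-only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 57 of a /proc/<pid>/pagemap entry: "pte is uffd-wp write-protected"
 * (see Documentation/admin-guide/mm/pagemap.rst). */
#define PM_UFFD_WP (1ULL << 57)

static bool entry_is_uffd_wp(uint64_t entry)
{
	return (entry & PM_UFFD_WP) != 0;
}

/* Read the pagemap entry for the page containing addr from an open
 * /proc/self/pagemap descriptor. Each page has one 8-byte entry. */
static bool read_pagemap_entry(int fd, uintptr_t addr, size_t page_size,
			       uint64_t *entry)
{
	assert(page_size > 0);
	off_t off = (off_t)(addr / page_size) * (off_t)sizeof(*entry);
	return pread(fd, entry, sizeof(*entry), off) == (ssize_t)sizeof(*entry);
}

/* Return true if every page in [start, start + len) has its uffd-wp bit
 * equal to expect_set (true before mremap(), false after it). */
static bool range_uffd_wp_is(int fd, uintptr_t start, size_t len,
			     size_t page_size, bool expect_set)
{
	for (uintptr_t a = start; a < start + len; a += page_size) {
		uint64_t e;

		if (!read_pagemap_entry(fd, a, page_size, &e) ||
		    entry_is_uffd_wp(e) != expect_set)
			return false;
	}
	return true;
}
```

A test following the commit's description would call range_uffd_wp_is() with expect_set = true on the original range after write-protecting it, then with expect_set = false on the new address returned by mremap().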
2025-01-25kunit: configs: remove configs for DAMON debugfs interface testsSeongJae Park
It's time to remove the DAMON debugfs interface, which has been deprecated since February 2023. Read the cover letter of this patch series for more details. Remove kernel configs for running DAMON debugfs interface kunit tests from the kunit all_tests configuration, to prevent unnecessary noise from the tests. Link: https://lkml.kernel.org/r/20250106191941.107070-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rae Moar <rmoar@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25selftests/damon: remove tests for DAMON debugfs interfaceSeongJae Park
It's time to remove the DAMON debugfs interface, which has been deprecated since February 2023. Read the cover letter of this patch series for more details. Remove selftests for the interface, to prevent unnecessary test failures. Link: https://lkml.kernel.org/r/20250106191941.107070-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rae Moar <rmoar@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25selftests/damon/config: remove configs for DAMON debugfs interface selftestsSeongJae Park
It's time to remove the DAMON debugfs interface, which has been deprecated since February 2023 [1]. Read the cover letter of this patch series for more details. Remove the configs for its selftests from the DAMON selftests config file, to prevent unnecessary noise from the tests. [1] https://lore.kernel.org/20230209192009.7885-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250106191941.107070-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Rae Moar <rmoar@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25selftests/mm: add new test cases to the migration testDonet Tom
Added three new test cases to the migration tests:

1. Shared anon THP migration test
   This test will mmap shared anon memory, madvise it to MADV_HUGEPAGE, then do migration entry testing. One thread will move pages back and forth between nodes whilst other threads try to access them.

2. Private anon hugetlb migration test
   This test will mmap private anon hugetlb memory and then do the migration entry testing.

3. Shared anon hugetlb migration test
   This test will mmap shared anon hugetlb memory and then do the migration entry testing.

Test results
============
# ./tools/testing/selftests/mm/migration
TAP version 13
1..6
# Starting 6 tests from 1 test cases.
# RUN migration.private_anon ...
# OK migration.private_anon
ok 1 migration.private_anon
# RUN migration.shared_anon ...
# OK migration.shared_anon
ok 2 migration.shared_anon
# RUN migration.private_anon_thp ...
# OK migration.private_anon_thp
ok 3 migration.private_anon_thp
# RUN migration.shared_anon_thp ...
# OK migration.shared_anon_thp
ok 4 migration.shared_anon_thp
# RUN migration.private_anon_htlb ...
# OK migration.private_anon_htlb
ok 5 migration.private_anon_htlb
# RUN migration.shared_anon_htlb ...
# OK migration.shared_anon_htlb
ok 6 migration.shared_anon_htlb
# PASSED: 6 / 6 tests passed.
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0

Link: https://lkml.kernel.org/r/20241219102720.4487-1-donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Dev Jain <dev.jain@arm.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25tools: testing: add simple __mmap_region() userland testLorenzo Stoakes
Introduce a basic, demonstrative __mmap_region() test upon which we can base further work going forward. This simply asserts that mappings can be made and merges occur as expected. As part of this change, fix the security_vm_enough_memory_mm() stub, which was previously implemented incorrectly. Link: https://lkml.kernel.org/r/20241213162409.41498-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25Merge tag 'pci-v6.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Batch sizing of multiple BARs while memory decoding is disabled instead of disabling/enabling decoding for each BAR individually; this optimizes virtualized environments where toggling decoding enable is expensive (Alex Williamson) - Add host bridge .enable_device() and .disable_device() hooks for bridges that need to configure things like Requester ID to StreamID mapping when enabling devices (Frank Li) - Extend struct pci_ecam_ops with .enable_device() and .disable_device() hooks so drivers that use pci_host_common_probe() instead of their own .probe() have a way to set the .enable_device() callbacks (Marc Zyngier) - Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas) - Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas) Driver binding: - Update resource request API documentation to encourage callers to supply a driver name when requesting resources (Philipp Stanner) - Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers of pci_intx() (which is sometimes managed) can explicitly choose the one they need (Philipp Stanner) - Convert drivers from pci_intx() to always-managed pcim_intx() or never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix, pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner) - Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and pcim_intx() is always managed (Philipp Stanner) Error handling: - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes 
always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes that don't configure it correctly (Takashi Iwai) ASPM: - Save parent L1 PM Substates config so when we restore it along with an endpoint's config, the parent info isn't junk (Jian-Hong Pan) Power management: - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because the system can't wake up from suspend (Werner Sembach) Endpoint framework: - Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't call devres_release() (Zijun Hu) - Finish virtual EP removal in pci_epf_remove_vepf(), which previously caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu) - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't depend on the BAR_MASK reset value being larger than the requested BAR size (Niklas Cassel) - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads from bypassing the iATU if we reduced the BAR size (Niklas Cassel) - Verify address alignment when programming iATU so we don't attempt to write bits that are read-only because of the BAR size, which could lead to directing accesses to the wrong address (Niklas Cassel) - Implement artpec6 pci_epc_features so we can rely on all drivers supporting it so we can use it in EPC core code (Niklas Cassel) - Check for BARs of fixed size to prevent endpoint drivers from trying to change their size (Niklas Cassel) - Verify that requested BAR size is a power of two when endpoint driver sets the BAR (Niklas Cassel) Endpoint framework tests: - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing dma_chan_rx (Mohamed Khalfella) - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports 
both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam) - Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas Cassel) - Add Endpoint test for consecutive BARs (Niklas Cassel) - Remove redundant comparison from Endpoint BAR test because a > 1MB BAR can always be exactly covered by iterating with a 1MB buffer (Hans Zhang) - Move and convert PCI Endpoint tests from tools/pci to Kselftests (Manivannan Sadhasivam) Apple PCIe controller driver: - Convert StreamID mapping configuration from a bus notifier to the .enable_device() and .disable_device() callbacks (Marc Zyngier) Freescale i.MX6 PCIe controller driver: - Add Requester ID to StreamID mapping configuration when enabling devices (Frank Li) - Use DWC core suspend/resume functions for imx6 (Frank Li) - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard Zhu) - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li) - Add DT binding for optional i.MX95 Refclk and driver support to enable it if the platform hasn't enabled it (Richard Zhu) - Configure PHY based on controller being in Root Complex or Endpoint mode (Frank Li) - Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources() instead of hardcoding them (Richard Zhu) - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is asserted in imx_pcie_assert_core_reset() (Richard Zhu) - Add missing reference clock enable or disable logic for IMX6SX, IMX7D, IMX8MM (Richard Zhu) - Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu) Freescale Layerscape PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_property_read_u32_array() (Krzysztof Kozlowski) Marvell MVEBU PCIe controller driver: - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen) MediaTek PCIe Gen3 controller 
driver: - Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi) - Rearrange reset assert/deassert so they're both done in the *_power_up() callbacks (Lorenzo Bianconi) - Document that Airoha EN7581 requires PHY init and power-on before PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi) - Move Airoha EN7581 post-reset delay from the en7581 clock .enable() method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi) - Sleep instead of delay during Airoha EN7581 power-up, since this is a non-atomic context (Lorenzo Bianconi) - Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to avoid a hardware defect (Lorenzo Bianconi) - Enable async probe to reduce system startup time (Douglas Anderson) Microchip PolarFlare PCIe controller driver: - Set up the inbound address translation based on whether the platform allows coherent or non-coherent DMA (Daire McNamara) - Update DT binding such that platforms are DMA-coherent by default and must specify 'dma-noncoherent' if needed (Conor Dooley) Mobiveil PCIe controller driver: - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names' and 'reg-names' (Frank Li) Qualcomm PCIe controller driver: - Add DT SM8550 and SM8650 optional 'global' interrupt for link events (Neil Armstrong) - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta Mylavarapu) - If 'global' IRQ is supported for detection of Link Up events, tell DWC core not to wait for link up (Krishna chaitanya chundru) Renesas R-Car PCIe controller driver: - Avoid passing stack buffer as resource name (King Dix) Rockchip PCIe controller driver: - Simplify clock and reset handling by using bulk interfaces (Anand Moon) - Pass typed rockchip_pcie (not void) pointer to rockchip_pcie_disable_clocks() (Anand Moon) - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan Carpenter) Rockchip DesignWare PCIe controller driver: - Use dll_link_up IRQ to 
detect Link Up and enumerate devices so users don't have to manually rescan (Niklas Cassel) - Tell DWC core not to wait for link up since the 'sys' interrupt is required and detects Link Up events (Niklas Cassel) Synopsys DesignWare PCIe controller driver: - Don't wait for link up in DWC core if driver can detect Link Up event (Krishna chaitanya chundru) - Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru) - Always stop link in dw_pcie_suspend_noirq(), which is required at least for i.MX8QM to re-establish link on resume (Richard Zhu) - Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu) - Add struct of_pci_range.parent_bus_addr for devices that need their immediate parent bus address, not the CPU address, e.g., to program an internal Address Translation Unit (iATU) (Frank Li) TI DRA7xx PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_parse_phandle_with_fixed_args() or of_property_read_u32_index() (Krzysztof Kozlowski) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy Havalige) MicroSemi Switchtec management driver: - Add Microchip PCI100X device IDs (Rakesh Babu Saladi) Miscellaneous: - Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen) - Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen) - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki)" * tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits) PCI: Batch BAR sizing operations dt-bindings: PCI: 
microchip,pcie-host: Allow dma-noncoherent PCI: microchip: Set inbound address translation for coherent or non-coherent mode Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly selftests: pci_endpoint: Migrate to Kselftest framework selftests: Move PCI Endpoint tests from tools/pci to Kselftests misc: pci_endpoint_test: Fix IOCTL return value dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML PCI: switchtec: Add Microchip PCI100X device IDs misc: pci_endpoint_test: Remove redundant 'remainder' test misc: pci_endpoint_test: Add consecutive BAR test misc: pci_endpoint_test: Add support for capabilities PCI: endpoint: pci-epf-test: Add support for capabilities PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error PCI: dwc: Simplify config resource lookup ...
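Two of the BAR items in the PCI pull above, batched BAR sizing and the power-of-two size check for endpoint BARs, rest on the standard PCI BAR sizing rule: after all ones are written to a BAR, the read-back value has zeros in the low address bits covered by the BAR's size. The sketch below decodes a 32-bit memory BAR that way and checks the power-of-two property. It illustrates the PCI rule only, is not the kernel's PCI core code, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAR_MEM_FLAGS_MASK 0xFu   /* low 4 bits of a memory BAR are flags */

/*
 * Decode the size of a 32-bit memory BAR from the value read back after
 * writing all ones to it. Returns 0 for an unimplemented BAR.
 */
static uint32_t mem_bar32_size(uint32_t readback)
{
	assert((readback & 0x1u) == 0);   /* bit 0 set would mean an I/O BAR */

	uint32_t mask = readback & ~BAR_MEM_FLAGS_MASK;

	if (mask == 0)
		return 0;
	return ~mask + 1;   /* lowest writable address bit gives the size */
}

/* A BAR size is valid only if it is a non-zero power of two. */
static bool bar_size_is_pow2(uint64_t size)
{
	return size != 0 && (size & (size - 1)) == 0;
}
```

Because a valid BAR size is a power of two, any BAR larger than 1 MB is an exact multiple of 1 MB, which is the reasoning given above for dropping the redundant remainder comparison from the endpoint BAR test.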
2025-01-25Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC updates from Stafford Horne: - Added support for restartable sequences (Stafford Horne) - Migration to Generic built-in DTB (Masahiro Yamada) * tag 'for-linus' of https://github.com/openrisc/linux: rseq/selftests: Add support for OpenRISC openrisc: Add support for restartable sequences openrisc: Add HAVE_REGS_AND_STACK_ACCESS_API support openrisc: migrate to the generic rule for built-in DTB
2025-01-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service support for usermode VMM x86: - Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a direct call to kvm_tdp_page_fault() when RETPOLINE is enabled - Ensure that all SEV code is compiled out when disabled in Kconfig, even if building with less brilliant compilers - Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes - Use str_enabled_disabled() to replace open coded strings - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache prior to every VM-Enter - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU - Make the completion of hypercalls go through the complete_hypercall function pointer argument, no matter if the hypercall exits to userspace or not. 
Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace, and all the others did not; the new code need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not - As part of enabling TDX virtual machines, support separation of private/shared EPT into separate roots. Once TDX is enabled, operations on private pages will need to go through the privileged TDX Module via SEAMCALLs; as a result, they are limited and relatively slow compared to reading a PTE. The patches included in 6.14 allow KVM to keep a mirror of the private EPT in host memory, and define entries in kvm_x86_ops to operate on external page tables such as the TDX private EPT - The recently introduced conversion of the NX-page reclamation kthread to vhost_task moved the task under the main process. The task is created as soon as KVM_CREATE_VM is invoked and this, of course, broke userspace that didn't expect to see any child task of the VM process until it started creating its own userspace threads. In particular crosvm refuses to fork() if procfs shows any child task, so unbreak it by creating the task lazily. This is arguably a userspace bug, as there can be other kinds of legitimate worker tasks and they wouldn't impede fork(); but it's not like userspace has a way to distinguish kernel worker tasks right now. Should they show as "Kthread: 1" in proc/.../status? x86 - Intel: - Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit while L2 is active, which ultimately results in a hardware-accelerated L1 EOI effectively being lost - Honor event priority when emulating Posted Interrupt delivery during nested VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the interrupt - Rework KVM's processing of the Page-Modification Logging buffer to reap entries in the same order they were created, i.e.
to mark gfns dirty in the same order that hardware marked the page/PTE dirty - Misc cleanups Generic: - Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when setting memory regions and add a dedicated API for setting KVM-internal memory regions. The API can then explicitly disallow all flags for KVM-internal memory regions - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw - Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined - Gracefully handle xarray insertion failures; even though such failures are impossible in practice after xa_reserve(), reserving an entry is always followed by xa_store() which does not know (or differentiate) whether there was an xa_reserve() before or not RISC-V: - Zabha, Svvptc, and Ziccrse extension support for guests. None of them require anything in KVM except for detecting them and marking them as supported; Zabha adds byte and halfword atomic operations, while the others are markers for specific operation of the TLB and of LL/SC instructions respectively - Virtualize SBI system suspend extension for Guest/VM - Support firmware counters which can be used by the guests to collect statistics about traps that occur in the host Selftests: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended - Play nice with treewide builds of unsupported architectures, e.g.
arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel - Ensure that format strings for logging statements are checked by the compiler even when the logging statement itself is disabled - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. It seems that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that events are counting correctly without actually knowing what the events count given the underlying hardware; this can happen if Intel reuses a formerly microarchitecture-specific event encoding as an architectural event, as was the case for Top-Down Slots" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits) kvm: defer huge page recovery vhost task to later KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() KVM: Disallow all flags for KVM-internal memslots KVM: x86: Drop double-underscores from __kvm_set_memory_region() KVM: Add a dedicated API for setting KVM-internal memslots KVM: Assert slots_lock is held when setting memory regions KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API) LoongArch: KVM: Add hypercall service support for usermode VMM LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup() KVM: VMX: read the PML log in the same order as it was written KVM: VMX: refactor PML terminology KVM: VMX: Fix comment of handle_vmx_instruction() KVM: VMX: Reinstate __exit attribute for vmx_exit() KVM: SVM: Use str_enabled_disabled() helper 
in sev_hardware_setup() KVM: x86: Avoid double RDPKRU when loading host/guest PKRU KVM: x86: Use LVT_TIMER instead of an open coded literal RISC-V: KVM: Add new exit statstics for redirected traps RISC-V: KVM: Update firmware counters for various events RISC-V: KVM: Redirect instruction access fault trap to guest ...
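The PML item in the KVM pull above ("reap entries in the same order they were created") can be illustrated with a toy buffer. The sketch assumes Intel's documented PML behaviour: a 512-entry log whose PML index starts at 511 and is decremented after each write, so the oldest entry sits at the highest index and an index outside 0..511 means the log filled up. pml_reap_in_order() is a hypothetical helper, not KVM's function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PML_LOG_ENTRIES 512   /* one 4 KiB page of 8-byte GPAs */

/*
 * Copy logged GPAs into out[] in the order hardware wrote them.
 * pml_idx is the index of the next free slot: PML_LOG_ENTRIES - 1 means
 * the log is empty, and a value >= PML_LOG_ENTRIES means the index
 * wrapped past 0 (every slot is valid). Returns the number copied.
 */
static size_t pml_reap_in_order(const uint64_t buf[PML_LOG_ENTRIES],
				uint16_t pml_idx, uint64_t *out)
{
	size_t n = 0;
	int first;   /* lowest valid index, i.e. the newest entry */

	assert(buf != NULL && out != NULL);
	if (pml_idx == PML_LOG_ENTRIES - 1)
		return 0;   /* nothing logged yet */
	first = pml_idx >= PML_LOG_ENTRIES ? 0 : pml_idx + 1;

	/* Walk from the highest index down: oldest entry first. */
	for (int i = PML_LOG_ENTRIES - 1; i >= first; i--)
		out[n++] = buf[i];
	return n;
}
```

Iterating from index 0 upward instead would visit the newest entries first, which is the ordering the rework is described as replacing.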
2025-01-24Xarray: do not return sibling entries from xas_find_marked()Kemeng Shi
Patch series "Fixes and cleanups to xarray", v5.

This series contains some random fixes and cleanups to xarray. Patches 1-2 are fixes and patches 3-6 are cleanups. More details can be found in the respective patches.

This patch (of 5):

Similar to the issue fixed in commit cbc02854331ed ("XArray: Do not return sibling entries from xa_load()"), we may return sibling entries from xas_find_marked() as follows:

    Thread A:                                   Thread B:
                                                xa_store_range(xa, entry, 6, 7, gfp);
                                                xa_set_mark(xa, 6, mark)
    XA_STATE(xas, xa, 6);
    xas_find_marked(&xas, 7, mark);
    offset = xas_find_chunk(xas, advance, mark);
    [offset is 6 which points to a valid entry]
                                                xa_store_range(xa, entry, 4, 7, gfp);
    entry = xa_entry(xa, node, 6);
    [entry is a sibling of 4]
    if (!xa_is_node(entry))
        return entry;

Skip sibling entries as xas_find() does, so that callers never see a sibling entry from xas_find_marked(); otherwise a caller may use the sibling entry as a valid entry and crash the kernel.

In addition, the load_race() test is modified to catch this issue; the modified load_race() passes only with this fix applied.

Here is an example of how this bug could be triggered in tmpfs, which enables large folios in its mapping. First, the racers involved:

1. How pages could be created and dirtied in a shmem file:
    write
    ksys_write
    vfs_write
    new_sync_write
    shmem_file_write_iter
    generic_perform_write
    shmem_write_begin
    shmem_get_folio
    shmem_allowable_huge_orders
    shmem_alloc_and_add_folios
    shmem_alloc_folio
    __folio_set_locked
    shmem_add_to_page_cache
    XA_STATE_ORDER(..., index, order)
    xas_store()
    shmem_write_end
    folio_mark_dirty()

2. How dirty pages could be deleted in a shmem file:
    ioctl
    do_vfs_ioctl
    file_ioctl
    ioctl_preallocate
    vfs_fallocate
    shmem_fallocate
    shmem_truncate_range
    shmem_undo_range
    truncate_inode_folio
    filemap_remove_folio
    page_cache_delete
    xas_store(&xas, NULL);

3. How dirty pages could be searched locklessly:
    sync_file_range
    ksys_sync_file_range
    __filemap_fdatawrite_range
    filemap_fdatawrite_wbc
    do_writepages
    writeback_use_writepage
    writeback_iter
    writeback_get_folio
    filemap_get_folios_tag
    find_get_entry
    folio = xas_find_marked()
    folio_try_get(folio)

The kernel will crash as follows:

    1.Create                            2.Search                        3.Delete
    /* write page 2,3 */
    write
    ...
    shmem_write_begin
    XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
    xa_store(&xas, folio)
    shmem_write_end
    folio_mark_dirty()
                                        /* sync page 2 and page 3 */
                                        sync_file_range
                                        ...
                                        find_get_entry
                                        folio = xas_find_marked()
                                        /* offset will be 2 */
                                        offset = xas_find_chunk()
                                                                        /* delete page 2 and page 3 */
                                                                        ioctl
                                                                        ...
                                                                        xas_store(&xas, NULL);
    /* write page 0-3 */
    write
    ...
    shmem_write_begin
    XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
    xa_store(&xas, folio)
    shmem_write_end
    folio_mark_dirty(folio)
                                        /* get sibling entry from offset 2 */
                                        entry = xa_entry(.., 2)
                                        /* use sibling entry as folio and crash kernel */
                                        folio_try_get(folio)

Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Matthew Wilcox <willy@infradead.org> [English fixes] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
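The fix can be illustrated with a toy model of a single node, in which a multi-index entry occupies a canonical slot plus sibling slots that point back to it. This is not the XArray's real internal encoding, and every name below is hypothetical; it only shows the idea of resolving a sibling found at a marked offset back to its canonical slot instead of returning the sibling itself.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8

/* Each toy slot is empty, holds a value, or is a sibling of a lower
 * canonical slot (how a multi-index entry spans several slots). */
enum slot_kind { SLOT_EMPTY, SLOT_VALUE, SLOT_SIBLING };

struct slot {
	enum slot_kind kind;
	const void *value;    /* valid when kind == SLOT_VALUE */
	unsigned int canon;   /* valid when kind == SLOT_SIBLING */
};

struct toy_node {
	struct slot slots[NSLOTS];
	unsigned int marks;   /* bit i set: slot i is marked */
};

/*
 * Find the first marked slot in [*index, max] and return its entry.
 * If that slot now holds a sibling, step back to the canonical slot
 * rather than handing the sibling to the caller. *index is updated to
 * the slot whose entry is returned.
 */
static const void *toy_find_marked(const struct toy_node *n,
				   unsigned int *index, unsigned int max)
{
	for (unsigned int i = *index; i <= max && i < NSLOTS; i++) {
		if (!(n->marks & (1u << i)))
			continue;
		const struct slot *s = &n->slots[i];
		if (s->kind == SLOT_SIBLING) {
			assert(s->canon < i);   /* siblings follow their canonical slot */
			i = s->canon;
			s = &n->slots[i];
		}
		*index = i;
		return s->kind == SLOT_VALUE ? s->value : NULL;
	}
	return NULL;
}
```

In the Thread A/Thread B scenario above, slot 6 is marked and later overwritten by a sibling of slot 4; toy_find_marked() then returns the entry in slot 4 and sets the index to 4 instead of returning the sibling.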
2025-01-24Merge tag 'efi-next-for-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI updates from Ard Biesheuvel: - Increase the headroom in the EFI memory map allocation created by the EFI stub. This is needed because event callbacks called during ExitBootServices() may cause fragmentation, and reallocation is not allowed after that. - Drop obsolete UGA graphics code and switch to a more ergonomic API to traverse handle buffers. Simplify some error paths using a __free() helper while at it. - Fix some W=1 warnings when CONFIG_EFI=n - Rely on the dentry cache to keep track of the contents of the efivarfs filesystem, rather than using a separate linked list. - Improve and extend efivarfs test cases. - Synchronize efivarfs with underlying variable store on resume from hibernation - this is needed because the firmware itself or another OS running on the same machine may have modified it. - Fix x86 EFI stub build with GCC 15. - Fix kexec/x86 false positive warning in EFI memory attributes table sanity check. * tag 'efi-next-for-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (23 commits) x86/efi: skip memattr table on kexec boot efivarfs: add variable resync after hibernation efivarfs: abstract initial variable creation routine efi: libstub: Use '-std=gnu11' to fix build with GCC 15 selftests/efivarfs: add concurrent update tests selftests/efivarfs: fix tests for failed write removal efivarfs: fix error on write to new variable leaving remnants efivarfs: remove unused efivarfs_list efivarfs: move variable lifetime management into the inodes selftests/efivarfs: add check for disallowing file truncation efivarfs: prevent setting of zero size on the inodes in the cache efi: sysfb_efi: fix W=1 warnings when EFI is not set efi/libstub: Use __free() helper for pool deallocations efi/libstub: Use cleanup helpers for freeing copies of the memory map efi/libstub: Simplify PCI I/O handle buffer traversal efi/libstub: Refactor and clean up GOP resolution picker code efi/libstub: 
Simplify GOP handling code efi/libstub: Use C99-style for loop to traverse handle buffer x86/efistub: Drop long obsolete UGA support efivarfs: make variable_is_present use dcache lookup ...
2025-01-24rtla: Report missed event countTomas Glozar
Print how many events were missed by trace buffer overflow in the main instance at the end of the run (for hist) or during the run (for top). Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250123142339.990300-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Tested-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla: Add function to report missed eventsTomas Glozar
Add osnoise_report_missed_events to be used to report the number of missed events either during or after an osnoise or timerlat run. Also, display the percentage of missed events compared to the total number of received events. If an unknown number of missed events was reported during the run, the entire number of missed events is reported as unknown. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250123142339.990300-4-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
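A possible shape for such a report, as a sketch only: struct event_counts and format_missed_events() are hypothetical, UINT64_MAX is taken as the "unknown" marker described in the collect_missed_events commit, and using processed + missed as the percentage denominator is an assumption, since the text only says "the total number of received events".

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters; rtla keeps its real ones in struct trace_instance. */
struct event_counts {
	unsigned long long processed;	/* events seen by the collector */
	unsigned long long missed;	/* UINT64_MAX means "unknown" */
};

/*
 * Write a one-line summary into buf and return snprintf()'s result, or 0
 * (leaving an empty string) when nothing was missed. The denominator
 * processed + missed is an assumption made for this sketch.
 */
static int format_missed_events(const struct event_counts *c,
				char *buf, size_t len)
{
	unsigned long long total;

	if (c->missed == UINT64_MAX)
		return snprintf(buf, len, "unknown number of events missed");
	if (c->missed == 0) {
		if (len > 0)
			buf[0] = '\0';
		return 0;
	}
	total = c->processed + c->missed;
	return snprintf(buf, len, "%llu (%.2f%%) events missed",
			c->missed, 100.0 * (double)c->missed / (double)total);
}
```

With 750 processed and 250 missed events, this sketch prints "250 (25.00%) events missed".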
2025-01-24rtla: Count all processed eventsTomas Glozar
Add a field processed_events to struct trace_instance and increment it in collect_registered_events, regardless of whether a handler is registered for the event. The purpose is to calculate the percentage of events that were missed due to tracefs buffer overflow. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250123142339.990300-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
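The counting rule (count every event first, then dispatch only if a handler exists) can be sketched as follows. The names struct collector, collect_event(), MAX_EVENT_IDS and demo_handler() are hypothetical stand-ins for rtla's collect_registered_events and its handler registration, which are not shown here.

```c
/* Hypothetical event and handler types for illustration. */
struct event {
	int id;
};

typedef int (*event_handler)(const struct event *ev);

#define MAX_EVENT_IDS 8

struct collector {
	event_handler handlers[MAX_EVENT_IDS];	/* NULL = no handler registered */
	unsigned long long processed_events;
};

static int collect_event(struct collector *c, const struct event *ev)
{
	/* Count before dispatching, so unhandled events still enter the total. */
	c->processed_events++;

	if (ev->id < 0 || ev->id >= MAX_EVENT_IDS || !c->handlers[ev->id])
		return 0;
	return c->handlers[ev->id](ev);
}

/* Demo handler used only to show that dispatch still happens. */
static int demo_calls;

static int demo_handler(const struct event *ev)
{
	(void)ev;
	demo_calls++;
	return 0;
}
```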
2025-01-24rtla: Count missed trace eventsTomas Glozar
Add function collect_missed_events to trace.c to act as a callback for tracefs_follow_missed_events, summing the total number of missed events into a new field missed_events of struct trace_instance. In case record->missed_events is negative, trace->missed_events is set to UINT64_MAX to signify that an unknown number of events was missed. The callback is activated on initialization of the trace instance. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250123142339.990300-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
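The accumulation logic can be sketched as a plain function. The names struct missed_counter and add_missed_events() are hypothetical, and the callback registration through tracefs_follow_missed_events is left out. Saturating just below UINT64_MAX is an extra safeguard chosen for this sketch (so a huge real count is never mistaken for "unknown"); the commit does not describe it.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of rtla's struct trace_instance. */
struct missed_counter {
	unsigned long long missed_events;	/* UINT64_MAX = unknown amount lost */
};

/*
 * Accumulate the missed-event count reported alongside one record.
 * A negative report means "events were lost, count unknown", which
 * permanently switches the total to the UINT64_MAX sentinel.
 */
static void add_missed_events(struct missed_counter *mc, long long reported)
{
	if (mc->missed_events == UINT64_MAX)
		return;				/* already unknown, stay unknown */
	if (reported < 0) {
		mc->missed_events = UINT64_MAX;
		return;
	}
	/* Sketch-only choice: saturate just below the sentinel. */
	if ((unsigned long long)reported >= UINT64_MAX - 1 - mc->missed_events)
		mc->missed_events = UINT64_MAX - 1;
	else
		mc->missed_events += (unsigned long long)reported;
}
```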
2025-01-24tools/rtla: Add osnoise_trace_is_off()Costa Shulyupin
All of the users of trace_is_off() pass in &record->trace as the second parameter, where record is a pointer to a struct osnoise_tool. This record could be NULL, and there is a hidden dependency on the trace field being the first field, which is what lets &record->trace work with a NULL record pointer. To make this code more robust, since record should not be dereferenced when it is NULL (even though the current code happens to work), create a new function called osnoise_trace_is_off() that takes a pointer to a struct osnoise_tool as its second parameter. This way it can properly test whether the pointer is NULL before dereferencing it. The old function trace_is_off() is removed, and osnoise_trace_is_off() is added to osnoise.c, the file that struct osnoise_tool is associated with. Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250115180055.2136815-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
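The difference comes down to where the NULL test happens: forming &record->trace from a NULL record is undefined behavior in C even when trace sits at offset 0, whereas taking the outer pointer lets the function check it first. In the sketch below, the trimmed-down structs and tool_trace_is_off() are hypothetical, and the tracing_on flag stands in for querying tracefs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for rtla's structures. */
struct trace_instance {
	bool tracing_on;	/* stands in for a tracefs "is tracing on?" query */
};

struct osnoise_tool {
	struct trace_instance trace;	/* happens to be the first member */
	void *params;
};

/*
 * Take the outer pointers, not &record->trace, so a NULL record can be
 * tested before anything is dereferenced and the result no longer
 * depends on trace being the first member.
 */
static bool tool_trace_is_off(const struct osnoise_tool *tool,
			      const struct osnoise_tool *record)
{
	/* The main tool instance is assumed to always exist. */
	if (!tool->trace.tracing_on)
		return true;
	/* The record instance is optional, so check it for NULL first. */
	if (record && !record->trace.tracing_on)
		return true;
	return false;
}
```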
2025-01-24rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threadsTomas Glozar
When using rtla timerlat with userspace threads (-u or -U), rtla disables the OSNOISE_WORKLOAD option in /sys/kernel/tracing/osnoise/options. This option is not re-enabled in a subsequent run with kernel-space threads, leading to rtla collecting no results if the previous run exited abnormally: $ rtla timerlat top -u ^\Quit (core dumped) $ rtla timerlat top -k -d 1s Timer Latency 0 00:00:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max The issue persists until OSNOISE_WORKLOAD is set manually by running: $ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if available to fix the issue. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250107144823.239782-4-tglozar@redhat.com Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threadsTomas Glozar
When using rtla timerlat with userspace threads (-u or -U), rtla disables the OSNOISE_WORKLOAD option in /sys/kernel/tracing/osnoise/options. This option is not re-enabled in a subsequent run with kernel-space threads, leading to rtla collecting no results if the previous run exited abnormally: $ rtla timerlat hist -u ^\Quit (core dumped) $ rtla timerlat hist -k -d 1s Index over: count: min: avg: max: ALL: IRQ Thr Usr count: 0 0 0 min: - - - avg: - - - max: - - - The issue persists until OSNOISE_WORKLOAD is set manually by running: $ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if available to fix the issue. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250107144823.239782-3-tglozar@redhat.com Fixes: ed774f7481fa ("rtla/timerlat_hist: Add timerlat user-space support") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/osnoise: Distinguish missing workload optionTomas Glozar
osnoise_set_workload returns -1 for both missing OSNOISE_WORKLOAD option and failure in setting the option. Return -1 for missing and -2 for failure to distinguish them. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250107144823.239782-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
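A sketch of the return-code convention and of how a kernel-thread run might use it. It assumes the options file lists each option name, with a NO_ prefix when the option is disabled, and that writing NO_OSNOISE_WORKLOAD clears the option; set_workload_option(), workload_option_available() and the WORKLOAD_OPT_* constants are hypothetical, not rtla's osnoise_set_workload.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical return codes mirroring the convention in the commit. */
#define WORKLOAD_OPT_MISSING	(-1)
#define WORKLOAD_OPT_FAILED	(-2)

/* Returns 1 if the options file mentions OSNOISE_WORKLOAD, else 0. */
static int workload_option_available(const char *options_path)
{
	char buf[4096];
	size_t n;
	FILE *f = fopen(options_path, "r");

	if (!f)
		return 0;	/* no readable options file: treat as missing */
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	/* strstr() also matches the disabled form "NO_OSNOISE_WORKLOAD". */
	return strstr(buf, "OSNOISE_WORKLOAD") != NULL;
}

static int set_workload_option(const char *options_path, int enable)
{
	const char *opt = enable ? "OSNOISE_WORKLOAD" : "NO_OSNOISE_WORKLOAD";
	FILE *f;
	int err;

	if (!workload_option_available(options_path))
		return WORKLOAD_OPT_MISSING;

	f = fopen(options_path, "w");
	if (!f)
		return WORKLOAD_OPT_FAILED;
	err = fputs(opt, f) == EOF;
	err |= fclose(f) != 0;
	return err ? WORKLOAD_OPT_FAILED : 0;
}
```

A kernel-thread run could call set_workload_option() on /sys/kernel/tracing/osnoise/options with enable = 1, treat WORKLOAD_OPT_MISSING as "older kernel without the option, nothing to do", and abort only on WORKLOAD_OPT_FAILED. That split is only possible once the two cases return different values.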
2025-01-24rtla/timerlat_top: Abort event processing on second signalTomas Glozar
If SIGINT is received either a second time or after a SIGALRM (that is, after timerlat was supposed to stop), abort processing the events currently left in the tracefs buffer and exit immediately. This allows the user to exit rtla without waiting for all events to be processed, should that take longer than wanted, at the cost of not processing all samples. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_hist: Abort event processing on second signalTomas Glozar
If SIGINT is received either a second time or after a SIGALRM (that is, after timerlat was supposed to stop), abort processing the events currently left in the tracefs buffer and exit immediately. This allows the user to exit rtla without waiting for all events to be processed, should that take longer than wanted, at the cost of not processing all samples. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
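The two-stage behavior can be sketched with a signal handler that escalates on its second delivery. The flag names, stop_handler() and drain_events() are hypothetical; in a real program the handler would be installed with signal() or sigaction() for SIGINT and SIGALRM, and the abort would have to interrupt the actual tracefs iteration rather than a counting loop.

```c
#include <signal.h>

/* Hypothetical flags written by the handler and read by the main loop. */
static volatile sig_atomic_t stop_tracing;	/* leave the main loop */
static volatile sig_atomic_t abort_processing;	/* also drop buffered events */

/*
 * First SIGINT or SIGALRM: stop collecting, but still drain the buffer.
 * Any signal after that (a second SIGINT, or SIGINT after SIGALRM):
 * abandon the queued events and exit as soon as possible.
 */
static void stop_handler(int sig)
{
	(void)sig;
	if (stop_tracing)
		abort_processing = 1;
	stop_tracing = 1;
}

/* Drain loop stand-in: returns how many queued events were processed. */
static int drain_events(int queued)
{
	int done = 0;

	while (done < queued && !abort_processing)
		done++;		/* stand-in for processing one event */
	return done;
}
```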
2025-01-24rtla/timerlat_top: Stop timerlat tracer on signalTomas Glozar
Currently, when either SIGINT from the user or SIGALRM from the duration timer is caught by rtla-timerlat, stop_tracing is set to break out of the main loop. This is not sufficient for cases where the timerlat tracer is producing more data than rtla can consume, since in that case rtla loops indefinitely inside tracefs_iterate_raw_events, never reaches the stop_tracing check, and hangs. In addition to setting stop_tracing, also stop the timerlat tracer when a signal (SIGINT or SIGALRM) is received. This stops new samples from being generated, so that the existing samples can be processed and tracefs_iterate_raw_events eventually exits. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-4-tglozar@redhat.com Fixes: a828cd18bc4a ("rtla: Add timerlat tool and timelart top mode") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-24rtla/timerlat_hist: Stop timerlat tracer on signalTomas Glozar
Currently, when either SIGINT from the user or SIGALRM from the duration timer is caught by rtla-timerlat, stop_tracing is set to break out of the main loop. This is not sufficient for cases where the timerlat tracer is producing more data than rtla can consume, since in that case rtla loops indefinitely inside tracefs_iterate_raw_events, never reaches the stop_tracing check, and hangs. In addition to setting stop_tracing, also stop the timerlat tracer when a signal (SIGINT or SIGALRM) is received. This stops new samples from being generated, so that the existing samples can be processed and tracefs_iterate_raw_events eventually exits. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-3-tglozar@redhat.com Fixes: 1eeb6328e8b3 ("rtla/timerlat: Add timerlat hist mode") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
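The hang can be reproduced with a small simulation in which the producer adds samples faster than the consumer removes them. Everything here (struct sim_tracer, produce(), iterate_events(), on_stop_signal()) is hypothetical; iterate_events() only mimics the relevant property of tracefs_iterate_raw_events, namely that it keeps going while data is available and never looks at stop_tracing. The max_steps bound exists only so the simulation cannot actually hang, and a real signal handler would reach the tracer through a global rather than a parameter.

```c
#include <stdbool.h>

/* Hypothetical simulation of a tracer feeding a buffer faster than we consume. */
struct sim_tracer {
	bool on;	/* when false, no new samples arrive */
	int buffered;	/* samples waiting in the buffer */
};

static int stop_tracing;

/* Producer step: an active tracer adds two samples per consumed one. */
static void produce(struct sim_tracer *t)
{
	if (t->on)
		t->buffered += 2;
}

/*
 * Inner iteration: runs while data is available and never checks
 * stop_tracing, so only an empty buffer (or max_steps) ends it.
 */
static int iterate_events(struct sim_tracer *t, int max_steps)
{
	int steps = 0;

	while (t->buffered > 0 && steps < max_steps) {
		t->buffered--;
		steps++;
		produce(t);
	}
	return steps;
}

/* Signal-time action: set the flag and also switch the tracer off. */
static void on_stop_signal(struct sim_tracer *t)
{
	stop_tracing = 1;
	t->on = false;
}
```

With only stop_tracing set, iteration runs until max_steps because the backlog keeps growing; once the tracer is off, the backlog is finite and iteration ends after draining it.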
2025-01-24rtla: Add trace_instance_stopTomas Glozar
Support not only turning trace on for the timerlat tracer, but also turning it off. This will be used in subsequent patches to stop the timerlat tracer without also wiping the trace buffer. Cc: stable@vger.kernel.org Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250116144931.649593-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
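Stopping a tracefs instance without discarding its data is normally done through the instance's tracing_on file: writing 0 pauses recording and leaves the ring buffer contents in place. The helper below is a hypothetical sketch of that mechanism. libtracefs also exposes tracefs_trace_on() and tracefs_trace_off() for this purpose, but the commit does not say whether trace_instance_stop uses them.

```c
#include <stdio.h>

/*
 * Hypothetical helper: toggle recording for one tracefs instance by
 * writing to its tracing_on file, for example
 * /sys/kernel/tracing/instances/<name>/tracing_on.
 * Writing "0" pauses recording but keeps the buffered data.
 * Returns 0 on success, -1 on failure.
 */
static int set_tracing_on(const char *tracing_on_path, int on)
{
	FILE *f = fopen(tracing_on_path, "w");
	int err;

	if (!f)
		return -1;
	err = fputs(on ? "1" : "0", f) == EOF;
	err |= fclose(f) != 0;
	return err ? -1 : 0;
}
```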
2025-01-24Merge tag 'platform-drivers-x86-v6.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: "acer-wmi: - Add support for PH14-51, PH16-72, and Nitro AN515-58 - Add proper hwmon support - Improve error handling when reading "gaming system info" - Replace direct EC reads for the current platform profile with WMI calls to handle EC address variations - Replace custom platform_profile cycling with the generic one ACPI: - platform_profile: Major refactoring and improvements - Support registering multiple platform_profile handlers concurrently to avoid the need to quirk which handler takes precedence - Support reporting "custom" profile for cases where the current profile is ambiguous or when settings tweaks are done outside the pre-defined profile - Abstract and layer platform_profile API better using the class_dev and drvdata - Various minor improvements - Add Documentation and kerneldoc amd/hsmp: - Add support for HSMP protocol v7 amd/pmc: - Support AMD 1Ah family 70h - Support STB with Ryzen desktop SoCs amd/pmf: - Support Custom BIOS inputs for PMF TA - Support passing SRA sensor data from AMD SFH (HID) to PMF TA dell-smo8800: - Move SMO88xx quirk away from the generic i2c-i801 driver - Add accelerometer support for Dell Latitude E6330/E6430 and XPS 9550 - Support probing accelerometer for models yet to be listed in the DMI mapping table because ACPI lacks i2c-address for the accelerometer (behind a module parameter because probing might be dangerous) HID: - amd_sfh: Add support for exporting SRA sensor data hp-wmi: - Add fan and thermal support for Victus 16-s1000 input: - Add key for phone linking - i8042: Add context for the i8042 filter to enable cleaning up the filter related global variables from pdx86 drivers lenovo-wmi-camera: - Use SW_CAMERA_LENS_COVER instead of KEY_CAMERA_ACCESS mellanox mlxbf-pmc: - Add support for monitoring cycle count - Add Documentation thinkpad_acpi: - Add support for phone link key 
tools/power/x86/intel-speed-select: - Fix Turbo Ratio Limit restore x86-android-tables: - Add support for Vexia EDU ATLA 10 Bluetooth and EC battery driver And miscellaneous cleanups / refactoring / improvements" * tag 'platform-drivers-x86-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (133 commits) platform/x86: acer-wmi: Fix initialization of last_non_turbo_profile platform/x86: acer-wmi: Ignore AC events platform/mellanox: mlxreg-io: use sysfs_emit() instead of sprintf() platform/mellanox: mlxreg-hotplug: use sysfs_emit() instead of sprintf() platform/mellanox: mlxbf-bootctl: use sysfs_emit() instead of sprintf() platform/x86: hp-wmi: Add fan and thermal profile support for Victus 16-s1000 ACPI: platform_profile: Add a prefix to log messages ACPI: platform_profile: Add documentation ACPI: platform_profile: Clean platform_profile_handler ACPI: platform_profile: Move platform_profile_handler ACPI: platform_profile: Remove platform_profile_handler from exported symbols platform/x86: thinkpad_acpi: Use devm_platform_profile_register() platform/x86: inspur_platform_profile: Use devm_platform_profile_register() platform/x86: hp-wmi: Use devm_platform_profile_register() platform/x86: ideapad-laptop: Use devm_platform_profile_register() platform/x86: dell-pc: Use devm_platform_profile_register() platform/x86: asus-wmi: Use devm_platform_profile_register() platform/x86: amd: pmf: sps: Use devm_platform_profile_register() platform/x86: acer-wmi: Use devm_platform_profile_register() platform/surface: surface_platform_profile: Use devm_platform_profile_register() ...