summaryrefslogtreecommitdiff
path: root/arch/x86/events/amd/brs.c
AgeCommit message (Collapse)Author
2025-05-02x86/msr: Convert __rdmsr() uses to native_rdmsrq() usesXin Li (Intel)
__rdmsr() is the lowest level MSR write API, with native_rdmsr() and native_rdmsrq() serving as higher-level wrappers around it. #define native_rdmsr(msr, val1, val2) \ do { \ u64 __val = __rdmsr((msr)); \ (void)((val1) = (u32)__val); \ (void)((val2) = (u32)(__val >> 32)); \ } while (0) static __always_inline u64 native_rdmsrq(u32 msr) { return __rdmsr(msr); } However, __rdmsr() continues to be utilized in various locations. MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations. To pave the way for improving MSR API names, convert __rdmsr() uses to native_rdmsrq() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-10-xin@zytor.com
2025-05-02x86/msr: Convert __wrmsr() uses to native_wrmsr{,q}() usesXin Li (Intel)
__wrmsr() is the lowest level MSR write API, with native_wrmsr() and native_wrmsrq() serving as higher-level wrappers around it: #define native_wrmsr(msr, low, high) \ __wrmsr(msr, low, high) #define native_wrmsrl(msr, val) \ __wrmsr((msr), (u32)((u64)(val)), \ (u32)((u64)(val) >> 32)) However, __wrmsr() continues to be utilized in various locations. MSR APIs are designed for different scenarios, such as native or pvops, with or without trace, and safe or non-safe. Unfortunately, the current MSR API names do not adequately reflect these factors, making it challenging to select the most appropriate API for various situations. To pave the way for improving MSR API names, convert __wrmsr() uses to native_wrmsr{,q}() to ensure consistent usage. Later, these APIs can be renamed to better reflect their implications, such as native or pvops, with or without trace, and safe or non-safe. No functional change intended. Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/20250427092027.1598740-8-xin@zytor.com
2025-04-10x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-17perf: Supply task information to sched_task()Kan Liang
To save/restore LBR call stack data in system-wide mode, the task_struct information is required. Extend the parameters of sched_task() to supply task_struct information. When schedule in, the LBR call stack data for new task will be restored. When schedule out, the LBR call stack data for old task will be saved. Only need to pass the required task_struct information. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20250314172700.438923-4-kan.liang@linux.intel.com
2024-01-03arch/x86: Fix typosBjorn Helgaas
Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2023-01-13x86/perf/amd: Remove tracing from perf_lopwr_cb()Peter Zijlstra
The perf_lopwr_cb() function is called from the idle routines; there is no RCU there, we must not enter tracing. Use __always_inline, noidle annotations and existing no-trace methods. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20230112195539.392862891@infradead.org
2022-10-27perf: Rewrite core context handlingPeter Zijlstra
There have been various issues and limitations with the way perf uses (task) contexts to track events. Most notable is the single hardware PMU task context, which has resulted in a number of yucky things (both proposed and merged). Notably: - HW breakpoint PMU - ARM big.little PMU / Intel ADL PMU - Intel Branch Monitoring PMU - AMD IBS PMU - S390 cpum_cf PMU - PowerPC trace_imc PMU *Current design:* Currently we have a per task and per cpu perf_event_contexts: task_struct::perf_events_ctxp[] <-> perf_event_context <-> perf_cpu_context ^ | ^ | ^ `---------------------------------' | `--> pmu ---' v ^ perf_event ------' Each task has an array of pointers to a perf_event_context. Each perf_event_context has a direct relation to a PMU and a group of events for that PMU. The task related perf_event_context's have a pointer back to that task. Each PMU has a per-cpu pointer to a per-cpu perf_cpu_context, which includes a perf_event_context, which again has a direct relation to that PMU, and a group of events for that PMU. The perf_cpu_context also tracks which task context is currently associated with that CPU and includes a few other things like the hrtimer for rotation etc. Each perf_event is then associated with its PMU and one perf_event_context. *Proposed design:* New design proposed by this patch reduce to a single task context and a single CPU context but adds some intermediate data-structures: task_struct::perf_event_ctxp -> perf_event_context <- perf_cpu_context ^ | ^ ^ `---------------------------' | | | | perf_cpu_pmu_context <--. | `----. ^ | | | | | | v v | | ,--> perf_event_pmu_context | | | | | | | v v | perf_event ---> pmu ----------------' With the new design, perf_event_context will hold all events for all pmus in the (respective pinned/flexible) rbtrees. This can be achieved by adding pmu to rbtree key: {cpu, pmu, cgroup, group_index} Each perf_event_context carries a list of perf_event_pmu_context which is used to hold per-pmu-per-context state. For example, it keeps track of currently active events for that pmu, a pmu specific task_ctx_data, a flag to tell whether rotation is required or not etc. Additionally, perf_cpu_pmu_context is used to hold per-pmu-per-cpu state like hrtimer details to drive the event rotation, a pointer to perf_event_pmu_context of currently running task and some other ancillary information. Each perf_event is associated to it's pmu, perf_event_context and perf_event_pmu_context. Further optimizations to current implementation are possible. For example, ctx_resched() can be optimized to reschedule only single pmu events. Much thanks to Ravi for picking this up and pushing it towards completion. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221008062424.313-1-ravi.bangoria@amd.com
2022-08-27perf/x86/amd/brs: Move feature-specific functionsSandipan Das
Move some of the Branch Sampling (BRS) specific functions out of the Core events sources and into the BRS sources in preparation for adding other mechanisms to record branches. Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/b60283b57179475d18ee242d117c335c16733693.1660211399.git.sandipan.das@amd.com
2022-05-18perf/x86/amd: Run AMD BRS code only on supported hwBorislav Petkov
This fires on a Fam16h machine here: unchecked MSR access error: WRMSR to 0xc000010f (tried to write 0x0000000000000018) \ at rIP: 0xffffffff81007db1 (amd_brs_reset+0x11/0x50) Call Trace: <TASK> amd_pmu_cpu_starting ? x86_pmu_dead_cpu x86_pmu_starting_cpu cpuhp_invoke_callback ? x86_pmu_starting_cpu ? x86_pmu_dead_cpu cpuhp_issue_call ? x86_pmu_starting_cpu __cpuhp_setup_state_cpuslocked ? x86_pmu_dead_cpu ? x86_pmu_starting_cpu __cpuhp_setup_state ? map_vsyscall init_hw_perf_events ? map_vsyscall do_one_initcall ? _raw_spin_unlock_irqrestore ? try_to_wake_up kernel_init_freeable ? rest_init kernel_init ret_from_fork because that CPU hotplug callback gets executed on any AMD CPU - not only on the BRS-enabled ones. Check the BRS feature bit properly. Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-By: Stephane Eranian <eranian@google.com> Link: https://lkml.kernel.org/r/20220516154838.7044-1-bp@alien8.de
2022-04-05perf/x86/amd: Add idle hooks for branch samplingStephane Eranian
On AMD Fam19h Zen3, the branch sampling (BRS) feature must be disabled before entering low power and re-enabled (if was active) when returning from low power. Otherwise, the NMI interrupt may be held up for too long and cause problems. Stopping BRS will cause the NMI to be delivered if it was held up. Define a perf_amd_brs_lopwr_cb() callback to stop/restart BRS. The callback is protected by a jump label which is enabled only when AMD BRS is detected. In all other cases, the callback is never called. Signed-off-by: Stephane Eranian <eranian@google.com> [peterz: static_call() and build fixes] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322221517.2510440-10-eranian@google.com
2022-04-05perf/x86/amd: Enable branch sampling priv level filteringStephane Eranian
The AMD Branch Sampling features does not provide hardware filtering by privilege level. The associated PMU counter does but not the branch sampling by itself. Given how BRS operates there is a possibility that BRS captures kernel level branches even though the event is programmed to count only at the user level. Implement a workaround in software by removing the branches which belong to the wrong privilege level. The privilege level is evaluated on the target of the branch and not the source so as to be compatible with other architectures. As a consequence of this patch, the number of entries in the PERF_RECORD_BRANCH_STACK buffer may be less than the maximum (16). It could even be zero. Another consequence is that consecutive entries in the branch stack may not reflect actual code path and may have discontinuities, in case kernel branches were suppressed. But this is no different than what happens on other architectures. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322221517.2510440-6-eranian@google.com
2022-04-05perf/x86/amd: Add AMD Fam19h Branch Sampling supportStephane Eranian
Add support for the AMD Fam19h 16-deep branch sampling feature as described in the AMD PPR Fam19h Model 01h Revision B1. This is a model specific extension. It is not an architected AMD feature. The Branch Sampling (BRS) operates with a 16-deep saturating buffer in MSR registers. There is no branch type filtering. All control flow changes are captured. BRS relies on specific programming of the core PMU of Fam19h. In particular, the following requirements must be met: - the sampling period be greater than 16 (BRS depth) - the sampling period must use a fixed and not frequency mode BRS interacts with the NMI interrupt as well. Because enabling BRS is expensive, it is only activated after P event occurrences, where P is the desired sampling period. At P occurrences of the event, the counter overflows, the CPU catches the interrupt, activates BRS for 16 branches until it saturates, and then delivers the NMI to the kernel. Between the overflow and the time BRS activates more branches may be executed skewing the period. All along, the sampling event keeps counting. The skid may be attenuated by reducing the sampling period by 16 (subsequent patch). BRS is integrated into perf_events seamlessly via the same PERF_RECORD_BRANCH_STACK sample format. BRS generates perf_branch_entry records in the sampling buffer. No prediction information is supported. The branches are stored in reverse order of execution. The most recent branch is the first entry in each record. No modification to the perf tool is necessary. BRS can be used with any sampling event. However, it is recommended to use the RETIRED_BRANCH_INSTRUCTIONS event because it matches what the BRS captures. $ perf record -b -c 1000037 -e cpu/event=0xc2,name=ret_br_instructions/ test $ perf report -D 56531696056126 0x193c000 [0x1a8]: PERF_RECORD_SAMPLE(IP, 0x2): 18122/18230: 0x401d24 period: 1000037 addr: 0 ... branch stack: nr:16 ..... 0: 0000000000401d24 -> 0000000000401d5a 0 cycles 0 ..... 1: 0000000000401d5c -> 0000000000401d24 0 cycles 0 ..... 2: 0000000000401d22 -> 0000000000401d5c 0 cycles 0 ..... 3: 0000000000401d5e -> 0000000000401d22 0 cycles 0 ..... 4: 0000000000401d20 -> 0000000000401d5e 0 cycles 0 ..... 5: 0000000000401d3e -> 0000000000401d20 0 cycles 0 ..... 6: 0000000000401d42 -> 0000000000401d3e 0 cycles 0 ..... 7: 0000000000401d3c -> 0000000000401d42 0 cycles 0 ..... 8: 0000000000401d44 -> 0000000000401d3c 0 cycles 0 ..... 9: 0000000000401d3a -> 0000000000401d44 0 cycles 0 ..... 10: 0000000000401d46 -> 0000000000401d3a 0 cycles 0 ..... 11: 0000000000401d38 -> 0000000000401d46 0 cycles 0 ..... 12: 0000000000401d48 -> 0000000000401d38 0 cycles 0 ..... 13: 0000000000401d36 -> 0000000000401d48 0 cycles 0 ..... 14: 0000000000401d4a -> 0000000000401d36 0 cycles 0 ..... 15: 0000000000401d34 -> 0000000000401d4a 0 cycles 0 ... thread: test:18230 ...... dso: test Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220322221517.2510440-4-eranian@google.com