summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2024-11-06x86/percpu: fix clang warning when dealing with unsigned typesAndy Shevchenko
Patch series "percpu: Add a test case and fix for clang", v2. Add a test case to percpu to check a corner case with the specific 64-bit unsigned value. This test case shows why the first patch is done in the way it's done. The before and after has been tested with binary comparison of the percpu_test module and runnig it on the real Intel system. This patch (of 2): When percpu_add_op() is used with an unsigned argument, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: net/ipv4/tcp_output.c:187:3: error: result of comparison of constant -1 with expression of type 'u8' (aka 'unsigned char') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 187 | NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPACKCOMPRESSED, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 188 | tp->compressed_ack); | ~~~~~~~~~~~~~~~~~~~ ... arch/x86/include/asm/percpu.h:238:31: note: expanded from macro 'percpu_add_op' 238 | ((val) == 1 || (val) == -1)) ? \ | ~~~~~ ^ ~~ Fix this by casting -1 to the type of the parameter and then compare. Link: https://lkml.kernel.org/r/20241016182635.1156168-1-andriy.shevchenko@linux.intel.com Link: https://lkml.kernel.org/r/20241016182635.1156168-2-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Christoph Lameter <cl@linux.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: remove unused hugepage for vma_alloc_folio()Kefeng Wang
The hugepage parameter was deprecated since commit ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"), for PMD-sized THP, it still tries only preferred node if possible in vma_alloc_folio() by checking the order of the folio allocation. Link: https://lkml.kernel.org/r/20241010061556.1846751-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Barry Song <baohua@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06kaslr: rename physmem_end and PHYSMEM_END to direct_map_physmem_endJohn Hubbard
For clarity. It's increasingly hard to reason about the code, when KASLR is moving around the boundaries. In this case where KASLR is randomizing the location of the kernel image within physical memory, the maximum number of address bits for physical memory has not changed. What has changed is the ending address of memory that is allowed to be directly mapped by the kernel. Let's name the variable, and the associated macro accordingly. Also, enhance the comment above the direct_map_physmem_end definition, to further clarify how this all works. Link: https://lkml.kernel.org/r/20241009025024.89813-1-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Will Deacon <will@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alistair Popple <apopple@nvidia.com> Cc: Jordan Niethe <jniethe@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: drop hugetlb_get_unmapped_area{_*} functionsOscar Salvador
Hugetlb mappings are now handled through normal channels just like any other mapping, so we no longer need hugetlb_get_unmapped_area* specific functions. Link: https://lkml.kernel.org/r/20241007075037.267650-8-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06arch/x86: teach arch_get_unmapped_area_vmflags to handle hugetlb mappingsOscar Salvador
We want to stop special casing hugetlb mappings and make them go through generic channels, so teach arch_get_unmapped_area_{topdown_}vmflags to handle those. x86 specific hugetlb function does not set either info.start_gap or info.align_offset so the same here for compatibility. Link: https://lkml.kernel.org/r/20241007075037.267650-4-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Xu <peterx@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06mm: move set_pxd_safe() helpers from generic to platformAnshuman Khandual
set_pxd_safe() helpers that serve a specific purpose for both x86 and riscv platforms, do not need to be in the common memory code. Otherwise they just unnecessarily make the common API more complicated. This moves the helpers from common code to platform instead. Link: https://lkml.kernel.org/r/20241003044842.246016-1-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Hildenbrand <david@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07KVM: x86/xen: Initialize hrtimer in kvm_xen_init_vcpu()Nam Cao
The hrtimer is initialized in the KVM_XEN_VCPU_SET_ATTR ioctl. That caused problem in the past, because the hrtimer can be initialized multiple times, which was fixed by commit af735db31285 ("KVM: x86/xen: Initialize Xen timer only once"). This commit avoids initializing the timer multiple times by checking the field 'function' of struct hrtimer to determine if it has already been initialized. This is not required and in the way to make the function field private. Move the hrtimer initialization into kvm_xen_init_vcpu() so that it will only be initialized once. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/all/9c33c7224d97d08f4fa30d3cc8687981c1d3e953.1730386209.git.namcao@linutronix.de
2024-11-06PCI: Detect and trust built-in Thunderbolt chipsEsther Shimanovich
Some computers with CPUs that lack Thunderbolt features use discrete Thunderbolt chips to add Thunderbolt functionality. These Thunderbolt chips are located within the chassis; between the Root Port labeled ExternalFacingPort and the USB-C port. These Thunderbolt PCIe devices should be labeled as fixed and trusted, as they are built into the computer. Otherwise, security policies that rely on those flags may have unintended results, such as preventing USB-C ports from enumerating. Detect the above scenario through the process of elimination. 1) Integrated Thunderbolt host controllers already have Thunderbolt implemented, so anything outside their external facing Root Port is removable and untrusted. Detect them using the following properties: - Most integrated host controllers have the "usb4-host-interface" ACPI property, as described here: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#mapping-native-protocols-pcie-displayport-tunneled-through-usb4-to-usb4-host-routers - Integrated Thunderbolt PCIe Root Ports before Alder Lake do not have the "usb4-host-interface" ACPI property. Identify those by their PCI IDs instead. 2) If a Root Port does not have integrated Thunderbolt capabilities, but has the "ExternalFacingPort" ACPI property, that means the manufacturer has opted to use a discrete Thunderbolt host controller that is built into the computer. This host controller can be identified by virtue of being located directly below an external-facing Root Port that lacks integrated Thunderbolt. Label it as trusted and fixed. Everything downstream from it is untrusted and removable. The "ExternalFacingPort" ACPI property is described here: https://learn.microsoft.com/en-us/windows-hardware/drivers/pci/dsd-for-pcie-root-ports#identifying-externally-exposed-pcie-root-ports Link: https://lore.kernel.org/r/20240910-trust-tbt-fix-v5-1-7a7a42a5f496@chromium.org Suggested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Esther Shimanovich <eshimanovich@chromium.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
2024-11-06ACPI: processor: Move arch_init_invariance_cppc() call laterMario Limonciello
arch_init_invariance_cppc() is called at the end of acpi_cppc_processor_probe() in order to configure frequency invariance based upon the values from _CPC. This however doesn't work on AMD CPPC shared memory designs that have AMD preferred cores enabled because _CPC needs to be analyzed from all cores to judge if preferred cores are enabled. This issue manifests to users as a warning since commit 21fb59ab4b97 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"): ``` Could not retrieve highest performance (-19) ``` However the warning isn't the cause of this, it was actually commit 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()") which exposed the issue. To fix this problem, change arch_init_invariance_cppc() into a new weak symbol that is called at the end of acpi_processor_driver_init(). Each architecture that supports it can declare the symbol to override the weak one. Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the architectures using the generic arch_topology.c code. Fixes: 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()") Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431 Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-06fs/xattr: add *at family syscallsChristian Göttsche
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_* and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-07x86/kprobes: Cleanup kprobes on ftrace codeMasami Hiramatsu (Google)
Cleanup kprobes on ftrace code for x86. - Set instruction pointer (ip + MCOUNT_INSN_SIZE) after pre_handler only when p->post_handler exists. - Use INT3_INSN_SIZE instead of 1. - Use instruction_pointer/instruction_pointer_set() functions instead of accessing regs->ip directly. Link: https://lore.kernel.org/all/172951436219.167263.18330240454389154327.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-11-06Merge tag 'perf-core-for-bpf-next' from tip treeAndrii Nakryiko
Stable tag for bpf-next's uprobe work. Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-06kbuild: Add AutoFDO support for Clang buildRong Xu
Add the build support for using Clang's AutoFDO. Building the kernel with AutoFDO does not reduce the optimization level from the compiler. AutoFDO uses hardware sampling to gather information about the frequency of execution of different code paths within a binary. This information is then used to guide the compiler's optimization decisions, resulting in a more efficient binary. Experiments showed that the kernel can improve up to 10% in latency. The support requires a Clang compiler after LLVM 17. This submission is limited to x86 platforms that support PMU features like LBR on Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1, and BRBE on ARM 1 is part of planned future work. Here is an example workflow for AutoFDO kernel: 1) Build the kernel on the host machine with LLVM enabled, for example, $ make menuconfig LLVM=1 Turn on AutoFDO build config: CONFIG_AUTOFDO_CLANG=y With a configuration that has LLVM enabled, use the following command: scripts/config -e AUTOFDO_CLANG After getting the config, build with $ make LLVM=1 2) Install the kernel on the test machine. 3) Run the load tests. The '-c' option in perf specifies the sample event period. We suggest using a suitable prime number, like 500009, for this purpose. For Intel platforms: $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \ -o <perf_file> -- <loadtest> For AMD platforms: The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2 For Zen3: $ cat proc/cpuinfo | grep " brs" For Zen4: $ cat proc/cpuinfo | grep amd_lbr_v2 $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \ -N -b -c <count> -o <perf_file> -- <loadtest> 4) (Optional) Download the raw perf file to the host machine. 5) To generate an AutoFDO profile, two offline tools are available: create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part of the AutoFDO project and can be found on GitHub (https://github.com/google/autofdo), version v0.30.1 or later. The llvm_profgen tool is included in the LLVM compiler itself. It's important to note that the version of llvm_profgen doesn't need to match the version of Clang. It needs to be the LLVM 19 release or later, or from the LLVM trunk. $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \ -o <profile_file> or $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \ --format=extbinary --out=<profile_file> Note that multiple AutoFDO profile files can be merged into one via: $ llvm-profdata merge -o <profile_file> <profile_1> ... <profile_n> 6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled): $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> Co-developed-by: Han Shen <shenhan@google.com> Signed-off-by: Han Shen <shenhan@google.com> Signed-off-by: Rong Xu <xur@google.com> Suggested-by: Sriraman Tallam <tmsriram@google.com> Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com> Suggested-by: Nick Desaulniers <ndesaulniers@google.com> Suggested-by: Stephane Eranian <eranian@google.com> Tested-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: Yabin Cui <yabinc@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <kees@kernel.org> Tested-by: Peter Jung <ptr1337@cachyos.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2024-11-06x86/resctrl: Support Sub-NUMA cluster mode SNC6Tony Luck
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some Intel platforms. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
2024-11-06Merge tag 'drm-intel-next-2024-11-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull #2 for v6.13: Features and functionality: - Pantherlake (PTL) Xe3 LPD display enabling for xe driver (Clint, Suraj, Dnyaneshwar, Matt, Gustavo, Radhakrishna, Chaitanya, Haridhar, Juha-Pekka, Ravi) - Enable dbuf overlap detection on Lunarlake and later (Stanislav, Vinod) - Allow fastset for HDR infoframe changes (Chaitanya) - Write DP source OUI also for non-eDP sinks (Imre) Refactoring and cleanups: - Independent platform identification for display (Jani) - Display tracepoint fixes and cleanups (Gustavo) - Share PCI ID headers between i915 and xe drivers (Jani) - Use x100 version for full version and release checks (Jani) - Conversions to struct intel_display (Jani, Ville) - Reuse DP DPCD and AUX macros in gvt instead of duplication (Jani) - Use string choice helpers (R Sundar, Sai Teja) - Remove unused underrun detection irq code (Sai Teja) - Color management debug improvements and other cleanups (Ville) - Refactor panel fitter code to a separate file (Ville) - Use try_cmpxchg() instead of open-coding (Uros Bizjak) Fixes: - PSR and Panel Replay fixes and workarounds (Jouni) - Fix panel power during connector detection (Imre) - Fix connector detection and modeset races (Imre) - Fix C20 PHY TX MISC configuration (Gustavo) - Improve panel fitter validity checks (Ville) - Fix eDP short HPD interrupt handling while runtime suspended (Imre) - Propagate DP MST DSC BW overhead/slice calculation errors (Imre) - Stop hotplug polling for eDP connectors (Imre) - Workaround panels reporting bad link status after PSR enable (Jouni) - Panel Replay VRR VSC SDP related workaround and refactor (Animesh, Mitul) - Fix memory leak on eDP init error path (Shuicheng) - Fix GVT KVMGT Kconfig dependencies (Arnd Bergmann) - Fix irq function documentation build warning (Rodrigo) - Add platform check to power management fuse bit read (Clint) - Revert kstrdup_const() and kfree_const() usage for clarity (Christophe JAILLET) - Workaround horizontal odd panning issues in display versions 20 and 30 (Nemesa) - Fix xe drive HDCP GSC firmware check (Suraj) Merges: - Backmerge drm-next to get some KVM changes (Rodrigo) - Fix a build failure originating from previous backmerge (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/display/intel_dp_mst.c From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87h68ni0wd.fsf@intel.com
2024-11-05x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 clientMario Limonciello
A number of Zen4 client SoCs advertise the ability to use virtualized VMLOAD/VMSAVE, but using these instructions is reported to be a cause of a random host reboot. These instructions aren't intended to be advertised on Zen4 client so clear the capability. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009
2024-11-05perf/x86/intel: Do not enable large PEBS for events with aux actions or aux ↵Adrian Hunter
sampling Events with aux actions or aux sampling expect the PMI to coincide with the event, which does not happen for large PEBS, so do not enable large PEBS in that case. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20241022155920.17511-5-adrian.hunter@intel.com
2024-11-05perf/x86/intel/pt: Add support for pause / resumeAdrian Hunter
Prevent tracing to start if aux_paused. Implement support for PERF_EF_PAUSE / PERF_EF_RESUME. When aux_paused, stop tracing. When not aux_paused, only start tracing if it isn't currently meant to be stopped. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Andi Kleen <ak@linux.intel.com> Link: https://lkml.kernel.org/r/20241022155920.17511-4-adrian.hunter@intel.com
2024-11-05perf/x86/intel/pt: Fix buffer full but size is 0 caseAdrian Hunter
If the trace data buffer becomes full, a truncated flag [T] is reported in PERF_RECORD_AUX. In some cases, the size reported is 0, even though data must have been added to make the buffer full. That happens when the buffer fills up from empty to full before the Intel PT driver has updated the buffer position. Then the driver calculates the new buffer position before calculating the data size. If the old and new positions are the same, the data size is reported as 0, even though it is really the whole buffer size. Fix by detecting when the buffer position is wrapped, and adjust the data size calculation accordingly. Example Use a very small buffer size (8K) and observe the size of truncated [T] data. Before the fix, it is possible to see records of 0 size. Before: $ perf record -m,8K -e intel_pt// uname Linux [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.105 MB perf.data ] $ perf script -D --no-itrace | grep AUX | grep -F '[T]' Warning: AUX data lost 2 times out of 3! 5 19462712368111 0x19710 [0x40]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T] 5 19462712700046 0x19ba8 [0x40]: PERF_RECORD_AUX offset: 0x170 size: 0xe90 flags: 0x1 [T] After: $ perf record -m,8K -e intel_pt// uname Linux [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.040 MB perf.data ] $ perf script -D --no-itrace | grep AUX | grep -F '[T]' Warning: AUX data lost 2 times out of 3! 1 113720802995 0x4948 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x2000 flags: 0x1 [T] 1 113720979812 0x6b10 [0x40]: PERF_RECORD_AUX offset: 0x2000 size: 0x2000 flags: 0x1 [T] Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20241022155920.17511-2-adrian.hunter@intel.com
2024-11-05sched, x86: Enable Lazy preemptionPeter Zijlstra
Add the TIF bit and select the Kconfig symbol to make it go. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lkml.kernel.org/r/20241007075055.555778919@infradead.org
2024-11-05seqlock, treewide: Switch to non-raw seqcount_latch interfaceMarco Elver
Switch all instrumentable users of the seqcount_latch interface over to the non-raw interface. Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
2024-11-05locking/atomic/x86: Use ALT_OUTPUT_SP() for __arch_{,try_}cmpxchg64_emu()Uros Bizjak
x86_32 __arch_{,try_}cmpxchg64_emu()() macros use CALL instruction inside asm statement. Use ALT_OUTPUT_SP() macro to add required dependence on %esp register. Fixes: 79e1dd05d1a2 ("x86: Provide an alternative() based cmpxchg64()") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241103160954.3329-2-ubizjak@gmail.com
2024-11-05locking/atomic/x86: Use ALT_OUTPUT_SP() for __alternative_atomic64()Uros Bizjak
CONFIG_X86_CMPXCHG64 variant of x86_32 __alternative_atomic64() macro uses CALL instruction inside asm statement. Use ALT_OUTPUT_SP() macro to add required dependence on %esp register. Fixes: 819165fb34b9 ("x86: Adjust asm constraints in atomic64 wrappers") Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20241103160954.3329-1-ubizjak@gmail.com
2024-11-04KVM: SVM: Propagate error from snp_guest_req_init() to userspaceSean Christopherson
If snp_guest_req_init() fails, return the provided error code up the stack to userspace, e.g. so that userspace can log that KVM_SEV_INIT2 failed, as opposed to some random operation later in VM setup failing because SNP wasn't actually enabled for the VM. Note, KVM itself doesn't consult the return value from __sev_guest_init(), i.e. the fallout is purely that userspace may be confused. Fixes: 88caf544c930 ("KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202410192220.MeTyHPxI-lkp@intel.com Link: https://lore.kernel.org/r/20241031203214.1585751-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabledSean Christopherson
When getting the current VPID, e.g. to emulate a guest TLB flush, return vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled in vmcs12. Architecturally, if VPID is disabled, then the guest and host effectively share VPID=0. KVM emulates this behavior by using vpid01 when running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so KVM must also treat vpid01 as the current VPID while L2 is active. Unconditionally treating vpid02 as the current VPID when L2 is active causes KVM to flush TLB entries for vpid02 instead of vpid01, which results in TLB entries from L1 being incorrectly preserved across nested VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after nested VM-Exit flushes vpid01). The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM incorrectly retains TLB entries for the APIC-access page across a nested VM-Enter. Opportunisticaly add comments at various touchpoints to explain the architectural requirements, and also why KVM uses vpid01 instead of vpid02. All credit goes to Chao, who root caused the issue and identified the fix. Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST") Cc: stable@vger.kernel.org Cc: Like Xu <like.xu.linux@gmail.com> Debugged-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Short-circuit all of kvm_apic_set_base() if MSR value is unchangedSean Christopherson
Do nothing in all of kvm_apic_set_base(), not just __kvm_apic_set_base(), if the incoming MSR value is the same as the current value. Validating the mode transitions is obviously unnecessary, and rejecting the write is pointless if the vCPU already has an invalid value, e.g. if userspace is doing weird things and modified guest CPUID after setting MSR_IA32_APICBASE. Bailing early avoids kvm_recalculate_apic_map()'s slow path in the rare scenario where the map is DIRTY due to some other vCPU dirtying the map, in which case it's the other vCPU/task's responsibility to recalculate the map. Note, kvm_lapic_reset() calls __kvm_apic_set_base() only when emulating RESET, in which case the old value is guaranteed to be zero, and the new value is guaranteed to be non-zero. I.e. all callers of __kvm_apic_set_base() effectively pre-check for the MSR value actually changing. Don't bother keeping the check in __kvm_apic_set_base(), as no additional callers are expected, and implying that the MSR might already be non-zero at the time of kvm_lapic_reset() could confuse readers. Link: https://lore.kernel.org/r/20241101183555.1794700-10-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Unpack msr_data structure prior to calling kvm_apic_set_base()Sean Christopherson
Pass in the new value and "host initiated" as separate parameters to kvm_apic_set_base(), as forcing the KVM_SET_SREGS path to declare and fill an msr_data structure is awkward and kludgy, e.g. __set_sregs_common() doesn't even bother to set the proper MSR index. No functional change intended. Suggested-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241101183555.1794700-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Make kvm_recalculate_apic_map() local to lapic.cSean Christopherson
Make kvm_recalculate_apic_map() local to lapic.c now that all external callers are gone. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-8-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Rename APIC base setters to better capture their relationshipSean Christopherson
Rename kvm_set_apic_base() and kvm_lapic_set_base() to kvm_apic_set_base() and __kvm_apic_set_base() respectively to capture that the underscores version is a "special" variant (it exists purely to avoid recalculating the optimized map multiple times when stuffing the RESET value). Opportunistically add a comment explaining why kvm_lapic_reset() uses the inner helper. Note, KVM deliberately invokes kvm_arch_vcpu_create() while kvm->lock is NOT held so that vCPU setup isn't serialized if userspace is creating multiple/all vCPUs in parallel. I.e. triggering an extra recalculation is not limited to theoretical/rare edge cases, and so is worth avoiding. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-7-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Move kvm_set_apic_base() implementation to lapic.c (from x86.c)Sean Christopherson
Move kvm_set_apic_base() to lapic.c so that the bulk of KVM's local APIC code resides in lapic.c, regardless of whether or not KVM is emulating the local APIC in-kernel. This will also allow making various helpers visible only to lapic.c. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-6-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Inline kvm_get_apic_mode() in lapic.hSean Christopherson
Inline kvm_get_apic_mode() in lapic.h to avoid a CALL+RET as well as an export. The underlying kvm_apic_mode() helper is public information, i.e. there is no state/information that needs to be hidden from vendor modules. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-5-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Get vcpu->arch.apic_base directly and drop kvm_get_apic_base()Sean Christopherson
Access KVM's emulated APIC base MSR value directly instead of bouncing through a helper, as there is no reason to add a layer of indirection, and there are other MSRs with a "set" but no "get", e.g. EFER. No functional change intended. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-4-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Drop superfluous kvm_lapic_set_base() call when setting APIC stateSean Christopherson
Now that kvm_lapic_set_base() does nothing if the "new" APIC base MSR is the same as the current value, drop the kvm_lapic_set_base() call in the KVM_SET_LAPIC flow that passes in the current value, as it too does nothing. Note, the purpose of invoking kvm_lapic_set_base() was purely to set apic->base_address (see commit 5dbc8f3fed0b ("KVM: use kvm_lapic_set_base() to change apic_base")). And there is no evidence that explicitly setting apic->base_address in KVM_SET_LAPIC ever had any functional impact; even in the original commit 96ad2cc61324 ("KVM: in-kernel LAPIC save and restore support"), all flows that set apic_base also set apic->base_address to the same address. E.g. svm_create_vcpu() did open code a write to apic_base, svm->vcpu.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE; but it also called kvm_create_lapic() when irqchip_in_kernel() is true. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-3-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86: Short-circuit all kvm_lapic_set_base() if MSR value isn't changingSean Christopherson
Do nothing in kvm_lapic_set_base() if the APIC base MSR value is the same as the current value. All flows except the handling of the base address explicitly take effect if and only if relevant bits are changing. For the base address, invoking kvm_lapic_set_base() before KVM initializes the base to APIC_DEFAULT_PHYS_BASE during vCPU RESET would be a KVM bug, i.e. KVM _must_ initialize apic->base_address before exposing the vCPU (to userspace or KVM at-large). Note, the inhibit is intended to be set if the base address is _changed_ from the default, i.e. is also covered by the RESET behavior. Reviewed-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241009181742.1128779-2-seanjc@google.com Link: https://lore.kernel.org/r/20241101183555.1794700-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Drop per-VM zapped_obsolete_pages listVipin Sharma
Drop the per-VM zapped_obsolete_pages list now that the usage from the defunct mmu_shrinker is gone, and instead use a local list to track pages in kvm_zap_obsolete_pages(), the sole remaining user of zapped_obsolete_pages. Opportunistically add an assertion to verify and document that slots_lock must be held, i.e. that there can only be one active instance of kvm_zap_obsolete_pages() at any given time, and by doing so also prove that using a local list instead of a per-VM list doesn't change any functionality (beyond trivialities like list initialization). Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20241101201437.1604321-2-vipinsh@google.com [sean: split to separate patch, write changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Remove KVM's MMU shrinkerVipin Sharma
Remove KVM's MMU shrinker and (almost) all of its related code, as the current implementation is very disruptive to VMs (if it ever runs), without providing any meaningful benefit[1]. Alternatively, KVM could repurpose its shrinker, e.g. to reclaim pages from the per-vCPU caches[2], but given that no one has complained about lack of TDP MMU support for the shrinker in the 3+ years since the TDP MMU was enabled by default, it's safe to say that there is likely no real use case for initiating reclaim of KVM's page tables from the shrinker. And while clever/cute, reclaiming the per-vCPU caches doesn't scale the same way that reclaiming in-use page table pages does. E.g. the amount of memory being used by a VM doesn't always directly correlate with the number vCPUs, and even when it does, reclaiming a few pages from per-vCPU caches likely won't make much of a dent in the VM's total memory usage, especially for VMs with huge amounts of memory. Lastly, if it turns out that there is a strong use case for dropping the per-vCPU caches, re-introducing the shrinker registration is trivial compared to the complexity of actually reclaiming pages from the caches. [1] https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com [2] https://lore.kernel.org/kvm/20241004195540.210396-3-vipinsh@google.com Suggested-by: Sean Christopherson <seanjc@google.com> Suggested-by: David Matlack <dmatlack@google.com> Signed-off-by: Vipin Sharma <vipinsh@google.com> Link: https://lore.kernel.org/r/20241101201437.1604321-2-vipinsh@google.com [sean: keep zapped_obsolete_pages for now, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: WARN if huge page recovery triggered during dirty loggingDavid Matlack
WARN and bail out of recover_huge_pages_range() if dirty logging is enabled. KVM shouldn't be recovering huge pages during dirty logging anyway, since KVM needs to track writes at 4KiB. However it's not out of the possibility that that changes in the future. If KVM wants to recover huge pages during dirty logging, make_huge_spte() must be updated to write-protect the new huge page mapping. Otherwise, writes through the newly recovered huge page mapping will not be tracked. Note that this potential risk did not exist back when KVM zapped to recover huge page mappings, since subsequent accesses would just be faulted in at PG_LEVEL_4K if dirty logging was enabled. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-7-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Rename make_huge_page_split_spte() to make_small_spte()David Matlack
Rename make_huge_page_split_spte() to make_small_spte(). This ensures that the usage of "small_spte" and "huge_spte" are consistent between make_huge_spte() and make_small_spte(). This should also reduce some confusion as make_huge_page_split_spte() almost reads like it will create a huge SPTE, when in fact it is creating a small SPTE to split the huge SPTE. No functional change intended. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-6-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Recover TDP MMU huge page mappings in-place instead of zappingDavid Matlack
Recover TDP MMU huge page mappings in-place instead of zapping them when dirty logging is disabled, and rename functions that recover huge page mappings when dirty logging is disabled to move away from the "zap collapsible spte" terminology. Before KVM flushes TLBs, guest accesses may be translated through either the (stale) small SPTE or the (new) huge SPTE. This is already possible when KVM is doing eager page splitting (where TLB flushes are also batched), and when vCPUs are faulting in huge mappings (where TLBs are flushed after the new huge SPTE is installed). Recovering huge pages reduces the number of page faults when dirty logging is disabled: $ perf stat -e kvm:kvm_page_fault -- ./dirty_log_perf_test -s anonymous_hugetlb_2mb -v 64 -e -b 4g Before: 393,599 kvm:kvm_page_fault After: 262,575 kvm:kvm_page_fault vCPU throughput and the latency of disabling dirty-logging are about equal compared to zapping, but avoiding faults can be beneficial to remove vCPU jitter in extreme scenarios. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-5-dmatlack@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Refactor TDP MMU iter need resched checkSean Christopherson
Refactor the TDP MMU iterator "need resched" checks into a helper function so they can be called from a different code path in a subsequent commit. No functional change intended. Signed-off-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20240823235648.3236880-4-dmatlack@google.com [sean: rebase on a swapped order of checks] Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Demote the WARN on yielded in xxx_cond_resched() to ↵Sean Christopherson
KVM_MMU_WARN_ON Convert the WARN in tdp_mmu_iter_cond_resched() that the iterator hasn't already yielded to a KVM_MMU_WARN_ON() so the code is compiled out for production kernels (assuming production kernels disable KVM_PROVE_MMU). Checking for a needed reschedule is a hot path, and KVM sanity checks iter->yielded in several other less-hot paths, i.e. the odds of KVM not flagging that something went sideways are quite low. Furthermore, the odds of KVM not noticing *and* the WARN detecting something worth investigating are even lower. Link: https://lore.kernel.org/r/20241031170633.1502783-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04KVM: x86/mmu: Check yielded_gfn for forward progress iff resched is neededSean Christopherson
Swap the order of the checks in tdp_mmu_iter_cond_resched() so that KVM checks to see if a resched is needed _before_ checking to see if yielding must be disallowed to guarantee forward progress. Iterating over TDP MMU SPTEs is a hot path, e.g. tearing down a root can touch millions of SPTEs, and not needing to reschedule is by far the common case. On the other hand, disallowing yielding because forward progress has not been made is a very rare case. Returning early for the common case (no resched), effectively reduces the number of checks from 2 to 1 for the common case, and should make the code slightly more predictable for the CPU. To resolve a weird conundrum where the forward progress check currently returns false, but the need resched check subtly returns iter->yielded, which _should_ be false (enforced by a WARN), return false unconditionally (which might also help make the sequence more predictable). If KVM has a bug where iter->yielded is left danging, continuing to yield is neither right nor wrong, it was simply an artifact of how the original code was written. Unconditionally returning false when yielding is unnecessary or unwanted will also allow extracting the "should resched" logic to a separate helper in a future patch. Cc: David Matlack <dmatlack@google.com> Reviewed-by: James Houghton <jthoughton@google.com> Link: https://lore.kernel.org/r/20241031170633.1502783-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-11-04jump_label: adjust inline asm to be consistentAlice Ryhl
To avoid duplication of inline asm between C and Rust, we need to import the inline asm from the relevant `jump_label.h` header into Rust. To make that easier, this patch updates the header files to expose the inline asm via a new ARCH_STATIC_BRANCH_ASM macro. The header files are all updated to define a ARCH_STATIC_BRANCH_ASM that takes the same arguments in a consistent order so that Rust can use the same logic for every architecture. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Jason Baron <jbaron@akamai.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Gary Guo <gary@garyguo.net> Cc: " =?utf-8?q?Bj=C3=B6rn_Roy_Baron?= " <bjorn3_gh@protonmail.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: Andreas Hindborg <a.hindborg@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Uros Bizjak <ubizjak@gmail.com> Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anup Patel <apatel@ventanamicro.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Conor Dooley <conor.dooley@microchip.com> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tianrui Zhao <zhaotianrui@loongson.cn> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/20241030-tracepoint-v12-4-eec7f0f8ad22@google.com Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Co-developed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # RISC-V Signed-off-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-03Merge tag 'x86-urgent-2024-11-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Thomas Gleixner: "A trivial compile test fix for x86: When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver fails due to a missing inline stub. Add the stub to cure it" * tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
2024-11-03fdget(), trivial conversionsAl Viro
fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-02x86/vdso: Add missing brackets in switch caseThomas Gleixner
0-day reported: arch/x86/entry/vdso/vma.c:199:3: warning: label followed by a declaration is a C23 extension [-Wc23-extensions] Add the missing brackets. Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Closes: https://lore.kernel.org/oe-kbuild-all/202411022359.fBPFTg2T-lkp@intel.com/
2024-11-02x86/vdso: Split virtual clock pages into dedicated mappingThomas Weißschuh
The generic vdso data storage cannot handle the special pvclock and hvclock pages. Split them into their own mapping, so the other vdso storage can be migrated to the generic code. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-20-b64f0842d512@linutronix.de
2024-11-02x86/vdso: Delete vvar.hThomas Weißschuh
All users have been removed. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-19-b64f0842d512@linutronix.de
2024-11-02x86/vdso: Access vdso data without vvar.hThomas Weißschuh
The vdso_data is at the start of the vvar page. Make use of this invariant to remove the usage of vvar.h. This also matches the logic for the timens data. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-18-b64f0842d512@linutronix.de
2024-11-02x86/vdso: Move the rng offset to vsyscall.hThomas Weißschuh
vvar.h will go away, so move the last useful bit into vsyscall.h. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-17-b64f0842d512@linutronix.de