summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-05-20KVM: x86: hyper-v: fix type of valid_bank_maskYury Norov
In kvm_hv_flush_tlb(), valid_bank_mask is declared as unsigned long, but is used as u64, which is wrong for i386, and has been spotted by LKP after applying "KVM: x86: hyper-v: replace bitmap_weight() with hweight64()" https://lore.kernel.org/lkml/20220510154750.212913-12-yury.norov@gmail.com/ But it's wrong even without that patch because now bitmap_weight() dereferences a word after valid_bank_mask on i386. >> include/asm-generic/bitops/const_hweight.h:21:76: warning: right shift count >= width of type +[-Wshift-count-overflow] 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~ include/asm-generic/bitops/const_hweight.h:10:16: note: in definition of macro '__const_hweight8' 10 | ((!!((w) & (1ULL << 0))) + \ | ^ include/asm-generic/bitops/const_hweight.h:20:31: note: in expansion of macro '__const_hweight16' 20 | #define __const_hweight32(w) (__const_hweight16(w) + __const_hweight16((w) >> 16)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:21:54: note: in expansion of macro '__const_hweight32' 21 | #define __const_hweight64(w) (__const_hweight32(w) + __const_hweight32((w) >> 32)) | ^~~~~~~~~~~~~~~~~ include/asm-generic/bitops/const_hweight.h:29:49: note: in expansion of macro '__const_hweight64' 29 | #define hweight64(w) (__builtin_constant_p(w) ? __const_hweight64(w) : __arch_hweight64(w)) | ^~~~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1983:36: note: in expansion of macro 'hweight64' 1983 | if (hc->var_cnt != hweight64(valid_bank_mask)) | ^~~~~~~~~ CC: Borislav Petkov <bp@alien8.de> CC: Dave Hansen <dave.hansen@linux.intel.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Ingo Molnar <mingo@redhat.com> CC: Jim Mattson <jmattson@google.com> CC: Joerg Roedel <joro@8bytes.org> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Sean Christopherson <seanjc@google.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: Wanpeng Li <wanpengli@tencent.com> CC: kvm@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: x86@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com> Message-Id: <20220519171504.1238724-1-yury.norov@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20KVM: Free new dirty bitmap if creating a new memslot failsSean Christopherson
Fix a goof in kvm_prepare_memory_region() where KVM fails to free the new memslot's dirty bitmap during a CREATE action if kvm_arch_prepare_memory_region() fails. The logic is supposed to detect if the bitmap was allocated and thus needs to be freed, versus if the bitmap was inherited from the old memslot and thus needs to be kept. If there is no old memslot, then obviously the bitmap can't have been inherited The bug was exposed by commit 86931ff7207b ("KVM: x86/mmu: Do not create SPTEs for GFNs that exceed host.MAXPHYADDR"), which made it trivally easy for syzkaller to trigger failure during kvm_arch_prepare_memory_region(), but the bug can be hit other ways too, e.g. due to -ENOMEM when allocating x86's memslot metadata. The backtrace from kmemleak: __vmalloc_node_range+0xb40/0xbd0 mm/vmalloc.c:3195 __vmalloc_node mm/vmalloc.c:3232 [inline] __vmalloc+0x49/0x50 mm/vmalloc.c:3246 __vmalloc_array mm/util.c:671 [inline] __vcalloc+0x49/0x70 mm/util.c:694 kvm_alloc_dirty_bitmap virt/kvm/kvm_main.c:1319 kvm_prepare_memory_region virt/kvm/kvm_main.c:1551 kvm_set_memslot+0x1bd/0x690 virt/kvm/kvm_main.c:1782 __kvm_set_memory_region+0x689/0x750 virt/kvm/kvm_main.c:1949 kvm_set_memory_region virt/kvm/kvm_main.c:1962 kvm_vm_ioctl_set_memory_region virt/kvm/kvm_main.c:1974 kvm_vm_ioctl+0x377/0x13a0 virt/kvm/kvm_main.c:4528 vfs_ioctl fs/ioctl.c:51 __do_sys_ioctl fs/ioctl.c:870 __se_sys_ioctl fs/ioctl.c:856 __x64_sys_ioctl+0xfc/0x140 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae And the relevant sequence of KVM events: ioctl(3, KVM_CREATE_VM, 0) = 4 ioctl(4, KVM_SET_USER_MEMORY_REGION, {slot=0, flags=KVM_MEM_LOG_DIRTY_PAGES, guest_phys_addr=0x10000000000000, memory_size=4096, userspace_addr=0x20fe8000} ) = -1 EINVAL (Invalid argument) Fixes: 244893fa2859 ("KVM: Dynamically allocate "new" memslots from the get-go") Cc: stable@vger.kernel.org Reported-by: syzbot+8606b8a9cc97a63f1c87@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220518003842.1341782-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20Merge tag 'irqchip-5.19' of ↵Thomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip updates from Marc Zyngier: - Add new infrastructure to stop gpiolib from rewriting irq_chip structures behind our back. Convert a few of them, but this will obviously be a long effort. - A bunch of GICv3 improvements, such as using MMIO-based invalidations when possible, and reducing the amount of polling we perform when reconfiguring interrupts. - Another set of GICv3 improvements for the Pseudo-NMI functionality, with a nice cleanup making it easy to reason about the various states we can be in when an NMI fires. - The usual bunch of misc fixes and minor improvements. Link: https://lore.kernel.org/all/20220519165308.998315-1-maz@kernel.org
2022-05-20random: wire up fops->splice_{read,write}_iter()Jens Axboe
Now that random/urandom is using {read,write}_iter, we can wire it up to using the generic splice handlers. Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops") Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: added the splice_write path. Note that sendfile() and such still does not work for read, though it does for write, because of a file type restriction in splice_direct_to_actor(), which I'll address separately.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-20random: convert to using fops->write_iter()Jens Axboe
Now that the read side has been converted to fix a regression with splice, convert the write side as well to have some symmetry in the interface used (and help deprecate ->write()). Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: cleaned up random_ioctl a bit, require full writes in RNDADDENTROPY since it's crediting entropy, simplify control flow of write_pool(), and incorporate suggestions from Al.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-20random: convert to using fops->read_iter()Jens Axboe
This is a pre-requisite to wiring up splice() again for the random and urandom drivers. It also allows us to remove the INT_MAX check in getrandom(), because import_single_range() applies capping internally. Signed-off-by: Jens Axboe <axboe@kernel.dk> [Jason: rewrote get_random_bytes_user() to simplify and also incorporate additional suggestions from Al.] Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-05-20gpio: mvebu/pwm: Refuse requests with inverted polarityUwe Kleine-König
The driver doesn't take struct pwm_state::polarity into account when configuring the hardware, so refuse requests for inverted polarity. Fixes: 757642f9a584 ("gpio: mvebu: Add limited PWM support") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-20gpio: gpio-vf610: do not touch other bits when set the target bitHaibo Chen
For gpio controller contain register PDDR, when set one target bit, current logic will clear all other bits, this is wrong. Use operator '|=' to fix it. Fixes: 659d8a62311f ("gpio: vf610: add imx7ulp support") Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Haibo Chen <haibo.chen@nxp.com> Signed-off-by: Bartosz Golaszewski <brgl@bgdev.pl>
2022-05-20perf stat: Fix and validate CPU map inputs in synthetic PERF_RECORD_STAT eventsIan Rogers
Stat events can come from disk and so need a degree of validation. They contain a CPU which needs looking up via CPU map to access a counter. Add the CPU to index translation, alongside validity checking. Discussion thread: https://lore.kernel.org/linux-perf-users/CAP-5=fWQR=sCuiSMktvUtcbOLidEpUJLCybVF6=BRvORcDOq+g@mail.gmail.com/ Fixes: 7ac0089d138f80dc ("perf evsel: Pass cpu not cpu map index to synthesize") Reported-by: Michael Petlan <mpetlan@redhat.com> Suggested-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lv Ruyi <lv.ruyi@zte.com.cn> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: netdev@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Cc: Yonghong Song <yhs@fb.com> Link: http://lore.kernel.org/lkml/20220519032005.1273691-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20KVM: eventfd: Fix false positive RCU usage warningWanpeng Li
The splat below can be seen when running kvm-unit-test: ============================= WARNING: suspicious RCU usage 5.18.0-rc7 #5 Tainted: G IOE ----------------------------- /home/kernel/linux/arch/x86/kvm/../../../virt/kvm/eventfd.c:80 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by qemu-system-x86/35124: #0: ffff9725391d80b8 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x710 [kvm] #1: ffffbd25cfb2a0b8 (&kvm->srcu){....}-{0:0}, at: vcpu_enter_guest+0xdeb/0x1900 [kvm] #2: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: kvm_hv_notify_acked_sint+0x79/0x1e0 [kvm] #3: ffffbd25cfb2b920 (&kvm->irq_srcu){....}-{0:0}, at: irqfd_resampler_ack+0x5/0x110 [kvm] stack backtrace: CPU: 2 PID: 35124 Comm: qemu-system-x86 Tainted: G IOE 5.18.0-rc7 #5 Call Trace: <TASK> dump_stack_lvl+0x6c/0x9b irqfd_resampler_ack+0xfd/0x110 [kvm] kvm_notify_acked_gsi+0x32/0x90 [kvm] kvm_hv_notify_acked_sint+0xc5/0x1e0 [kvm] kvm_hv_set_msr_common+0xec1/0x1160 [kvm] kvm_set_msr_common+0x7c3/0xf60 [kvm] vmx_set_msr+0x394/0x1240 [kvm_intel] kvm_set_msr_ignored_check+0x86/0x200 [kvm] kvm_emulate_wrmsr+0x4f/0x1f0 [kvm] vmx_handle_exit+0x6fb/0x7e0 [kvm_intel] vcpu_enter_guest+0xe5a/0x1900 [kvm] kvm_arch_vcpu_ioctl_run+0x16e/0xac0 [kvm] kvm_vcpu_ioctl+0x279/0x710 [kvm] __x64_sys_ioctl+0x83/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae resampler-list is protected by irq_srcu (see kvm_irqfd_assign), so fix the false positive by using list_for_each_entry_srcu(). Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1652950153-12489-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20Merge tag 'nvme-5.19-2022-05-19' of git://git.infradead.org/nvme into ↵Jens Axboe
for-5.19/drivers Pull NVMe updates from Christoph: "nvme updates for Linux 5.19 - set non-mdts limits in nvme_scan_work (Chaitanya Kulkarni) - add support for TP4084 - Time-to-Ready Enhancements (me)" * tag 'nvme-5.19-2022-05-19' of git://git.infradead.org/nvme: nvme: set non-mdts limits in nvme_scan_work nvme: add support for TP4084 - Time-to-Ready Enhancements
2022-05-20perf build: Fix check for btf__load_from_kernel_by_id() in libbpfArnaldo Carvalho de Melo
Avi Kivity reported a problem where the __weak btf__load_from_kernel_by_id() in tools/perf/util/bpf-event.c was being used and it called btf__get_from_id() in tools/lib/bpf/btf.c that in turn called back to btf__load_from_kernel_by_id(), resulting in an endless loop. Fix this by adding a feature test to check if btf__load_from_kernel_by_id() is available when building perf with LIBBPF_DYNAMIC=1, and if not then provide the fallback to the old btf__get_from_id(), that doesn't call back to btf__load_from_kernel_by_id() since at that time it didn't exist at all. Tested on Fedora 35 where we have libbpf-devel 0.4.0 with LIBBPF_DYNAMIC where we don't have btf__load_from_kernel_by_id() and thus its feature test fail, not defining HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID: $ cat /tmp/build/perf-urgent/feature/test-libbpf-btf__load_from_kernel_by_id.make.output test-libbpf-btf__load_from_kernel_by_id.c: In function ‘main’: test-libbpf-btf__load_from_kernel_by_id.c:6:16: error: implicit declaration of function ‘btf__load_from_kernel_by_id’ [-Werror=implicit-function-declaration] 6 | return btf__load_from_kernel_by_id(20151128, NULL); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors $ $ nm /tmp/build/perf-urgent/perf | grep btf__load_from_kernel_by_id 00000000005ba180 T btf__load_from_kernel_by_id $ $ objdump --disassemble=btf__load_from_kernel_by_id -S /tmp/build/perf-urgent/perf /tmp/build/perf-urgent/perf: file format elf64-x86-64 <SNIP> 00000000005ba180 <btf__load_from_kernel_by_id>: #include "record.h" #include "util/synthetic-events.h" #ifndef HAVE_LIBBPF_BTF__LOAD_FROM_KERNEL_BY_ID struct btf *btf__load_from_kernel_by_id(__u32 id) { 5ba180: 55 push %rbp 5ba181: 48 89 e5 mov %rsp,%rbp 5ba184: 48 83 ec 10 sub $0x10,%rsp 5ba188: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax 5ba18f: 00 00 5ba191: 48 89 45 f8 mov %rax,-0x8(%rbp) 5ba195: 31 c0 xor %eax,%eax struct btf *btf; #pragma GCC diagnostic push #pragma GCC diagnostic ignored "-Wdeprecated-declarations" int err = btf__get_from_id(id, &btf); 5ba197: 48 8d 75 f0 lea -0x10(%rbp),%rsi 5ba19b: e8 a0 57 e5 ff call 40f940 <btf__get_from_id@plt> 5ba1a0: 89 c2 mov %eax,%edx #pragma GCC diagnostic pop return err ? ERR_PTR(err) : btf; 5ba1a2: 48 98 cltq 5ba1a4: 85 d2 test %edx,%edx 5ba1a6: 48 0f 44 45 f0 cmove -0x10(%rbp),%rax } <SNIP> Fixes: 218e7b775d368f38 ("perf bpf: Provide a weak btf__load_from_kernel_by_id() for older libbpf versions") Reported-by: Avi Kivity <avi@scylladb.com> Link: https://lore.kernel.org/linux-perf-users/f0add43b-3de5-20c5-22c4-70aff4af959f@scylladb.com Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/linux-perf-users/YobjjFOblY4Xvwo7@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-05-20nvme: enable uring-passthrough for admin commandsKanchan Joshi
Add two new opcodes that userspace can use for admin commands: NVME_URING_CMD_ADMIN : non-vectroed NVME_URING_CMD_ADMIN_VEC : vectored variant Wire up support when these are issued on controller node(/dev/nvmeX). Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220520090630.70394-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-20nvme: helper for uring-passthrough checksKanchan Joshi
Factor out a helper consolidating the error checks, and fix typo in a comment too. This is in preparation to support admin commands on this path. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220520090630.70394-2-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-20Merge tag 'nand/for-5.19' into mtd/nextMiquel Raynal
NAND core: * Print offset instead of page number for bad blocks Raw NAND controller drivers: * Cadence: Fix possible null-ptr-deref in cadence_nand_dt_probe() * CS553X: simplify the return expression of cs553x_write_ctrl_byte() * Davinci: Remove redundant unsigned comparison to zero * Denali: Use managed device resources * GPMI: - Add large oob bch setting support - Rename the variable ecc_chunk_size - Uninline the gpmi_check_ecc function - Add strict ecc strength check - Refactor BCH geometry settings function * Intel: Fix possible null-ptr-deref in ebu_nand_probe() * MPC5121: Check before clk_disable_unprepare() not needed * Mtk: - MTD_NAND_ECC_MEDIATEK should depend on ARCH_MEDIATEK - Also parse the default nand-ecc-engine property if available - Make mtk_ecc.c a separated module * OMAP ELM: - Convert the bindings to yaml - Describe the bindings for AM64 ELM - Add support for its compatible * Renesas: Use runtime PM instead of the raw clock API and update the bindings accordingly * Rockchip: Check before clk_disable_unprepare() not needed * TMIO: Check return value after calling platform_get_resource() Raw NAND chip driver: * Kioxia: Add support for TH58NVG3S0HBAI4 and TC58NVG0S3HTA00 SPI-NAND chip drivers: * Gigadevice: - Add support for: - GD5FxGM7xExxG - GD5F{2,4}GQ5xExxG - GD5F1GQ5RExxG - GD5FxGQ4xExxG - Fix Quad IO for GD5F1GQ5UExxG * XTX: Add support for XT26G0xA Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-05-20Merge tag 'spi-nor/for-5.19' into mtd/nextMiquel Raynal
SPI NOR core changes: - Read back written SR value to make sure the write was done correctly. - Introduce a common function for Read ID that manufacturer drivers can use to verify the Octal DTR switch worked correctly. - Add helpers for read/write any register commands so manufacturer drivers don't open code it every time. - Clarify rdsr dummy cycles documentation. - Add debugfs entry to expose internal flash parameters and state. SPI NOR manufacturer drivers changes: - Add support for Winbond W25Q512NW-IM, and Eon EN25QH256A. - Move spi_nor_write_ear() to Winbond module since only Winbond flashes use it. - Rework Micron and Cypress Octal DTR enable methods to improve readability. - Use the common Read ID function to verify switch to Octal DTR mode for Micron and Cypress flashes. - Skip polling status on volatile register writes for Micron and Cypress flashes since the operation is instant. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-05-20ARM: 9204/2: module: Add all unwind tables when load moduleChen Zhongjin
For EABI stack unwinding, when loading .ko module the EXIDX sections will be added to a unwind_table list. However not all EXIDX sections are added because EXIDX sections are searched by hardcoded section names. For functions in other sections such as .ref.text or .kprobes.text, gcc generates seprated EXIDX sections (such as .ARM.exidx.ref.text or .ARM.exidx.kprobes.text). These extra EXIDX sections are not loaded, so when unwinding functions in these sections, we will failed with: unwind: Index not found xxx To fix that, I refactor the code for searching and adding EXIDX sections: - Check section type to search EXIDX tables (0x70000001) instead of strcmp() the hardcoded names. Then find the corresponding text sections by their section names. - Add a unwind_table list in module->arch to save their own unwind_table instead of the fixed-lenth array. - Save .ARM.exidx.init.text section ptr, because it should be cleaned after module init. Now all EXIDX sections of .ko can be added correctly. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9206/1: A9: Add ARM ERRATA 764319 workaround (Updated)Nick Hawkins
Enable the workaround for the 764319 Cortex A-9 erratum. CP14 read accesses to the DBGPRSR and DBGOSLSR registers generate an unexpected Undefined Instruction exception when the DBGSWENABLE external pin is set to 0, even when the CP14 accesses are performed from a privileged mode. The work around catches the exception in a way the kernel does not stop execution with the use of undef_hook. This has been found to effect the HPE GXP SoC. Signed-off-by: Nick Hawkins <nick.hawkins@hpe.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9201/1: spectre-bhb: rely on linker to emit cross-section literal loadsArd Biesheuvel
The assembler does not permit 'LDR PC, <sym>' when the symbol lives in a different section, which is why we have been relying on rather fragile open-coded arithmetic to load the address of the vector_swi routine into the program counter using a single LDR instruction in the SWI slot in the vector table. The literal was moved to a different section to in commit 19accfd373847 ("ARM: move vector stubs") to ensure that the vector stubs page does not need to be mapped readable for user space, which is the case for the vector page itself, as it carries the kuser helpers as well. So the cross-section literal load is open-coded, and this relies on the address of vector_swi to be at the very start of the vector stubs page, and we won't notice if we got it wrong until booting the kernel and see it break. Fortunately, it was guaranteed to break, so this was fragile but not problematic. Now that we have added two other variants of the vector table, we have 3 occurrences of the same trick, and so the size of our ISA/compiler/CPU validation space has tripled, in a way that may cause regressions to only be observed once booting the image in question on a CPU that exercises a particular vector table. So let's switch to true cross section references, and let the linker fix them up like it fixes up all the other cross section references in the vector page. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9200/1: spectre-bhb: avoid cross-subsection jump using a numbered labelArd Biesheuvel
In order to minimize potential confusion regarding numbered labels appearing in a different order in the assembler output due to the use of subsections, use a named local label to jump back into the vector handler code from the associated loop8 mitigation sequence. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9199/1: spectre-bhb: use local DSB and elide ISB in loop8 sequenceArd Biesheuvel
The loop8 mitigation for Spectre-BHB only requires a CPU local DSB rather than a systemwide one, which is much more costly. And by the same reasoning as why it is justified to omit the ISB after BPIALL, we can also elide the ISB and rely on the exception return for the context synchronization. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9198/1: spectre-bhb: simplify BPIALL vector macroArd Biesheuvel
The BPIALL mitigation for Spectre-BHB adds a single instruction to the handler sequence that doesn't clobber any registers. Given that these sequences are 10 instructions long, they don't fit neatly into a cacheline anyway, so we can simply move that single instruction to the start of the unmitigated one, and rearrange the symbol names accordingly. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9195/1: entry: avoid explicit literal loadsArd Biesheuvel
ARMv7 has MOVW/MOVT instruction pairs to load symbol addresses into registers without having to rely on literal loads that go via the D-cache. For older cores, we now support a similar arrangement, based on PC-relative group relocations. This means we can elide most literal loads entirely from the entry path, by switching to the ldr_va macro to emit the appropriate sequence depending on the target architecture revision. While at it, switch to the bl_r macro for invoking the right PABT/DABT helpers instead of setting the LR register explicitly, which does not play well with cores that speculate across function returns. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9194/1: assembler: simplify ldr_this_cpu for !SMP buildsArd Biesheuvel
When CONFIG_SMP is not defined, the CPU offset is always zero, and so we can simplify the sequence to load a per-CPU variable. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9192/1: amba: fix memory leak in amba_device_try_add()Wang Kefeng
If amba_device_try_add() return error code (not EPROBE_DEFER), memory leak occurred when amba device fails to read periphid. unreferenced object 0xc1c60800 (size 1024): comm "swapper/0", pid 1, jiffies 4294937333 (age 75.200s) hex dump (first 32 bytes): 40 40 db c1 04 08 c6 c1 04 08 c6 c1 00 00 00 00 @@.............. 00 d9 c1 c1 84 6f 38 c1 00 00 00 00 01 00 00 00 .....o8......... backtrace: [<(ptrval)>] kmem_cache_alloc_trace+0x168/0x2b4 [<(ptrval)>] amba_device_alloc+0x38/0x7c [<(ptrval)>] of_platform_bus_create+0x2f4/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_populate+0x70/0xc4 [<(ptrval)>] of_platform_default_populate_init+0xb4/0xcc [<(ptrval)>] do_one_initcall+0x58/0x218 [<(ptrval)>] kernel_init_freeable+0x250/0x29c [<(ptrval)>] kernel_init+0x24/0x148 [<(ptrval)>] ret_from_fork+0x14/0x1c [<00000000>] 0x0 unreferenced object 0xc1db4040 (size 64): comm "swapper/0", pid 1, jiffies 4294937333 (age 75.200s) hex dump (first 32 bytes): 31 63 30 66 30 30 30 30 2e 77 64 74 00 00 00 00 1c0f0000.wdt.... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<(ptrval)>] __kmalloc_track_caller+0x19c/0x2f8 [<(ptrval)>] kvasprintf+0x60/0xcc [<(ptrval)>] kvasprintf_const+0x54/0x78 [<(ptrval)>] kobject_set_name_vargs+0x34/0xa8 [<(ptrval)>] dev_set_name+0x40/0x5c [<(ptrval)>] of_device_make_bus_id+0x128/0x1f8 [<(ptrval)>] of_platform_bus_create+0x4dc/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_bus_create+0x380/0x4e8 [<(ptrval)>] of_platform_populate+0x70/0xc4 [<(ptrval)>] of_platform_default_populate_init+0xb4/0xcc [<(ptrval)>] do_one_initcall+0x58/0x218 [<(ptrval)>] kernel_init_freeable+0x250/0x29c [<(ptrval)>] kernel_init+0x24/0x148 [<(ptrval)>] ret_from_fork+0x14/0x1c Fix them by adding amba_device_put() to release device name and amba device. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20ARM: 9193/1: amba: Add amba_read_periphid() helperWang Kefeng
Add new amba_read_periphid() helper to simplify error handling. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2022-05-20selftests: kvm/x86: Verify the pmu event filter matches the correct eventAaron Lewis
Add a test to demonstrate that when the guest programs an event select it is matched correctly in the pmu event filter and not inadvertently filtered. This could happen on AMD if the high nybble[1] in the event select gets truncated away only leaving the bottom byte[2] left for matching. This is a contrived example used for the convenience of demonstrating this issue, however, this can be applied to event selects 0x28A (OC Mode Switch) and 0x08A (L1 BTB Correction), where 0x08A could end up being denied when the event select was only set up to deny 0x28A. [1] bits 35:32 in the event select register and bits 11:8 in the event select. [2] bits 7:0 in the event select register and bits 7:0 in the event select. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-3-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20selftests: kvm/x86: Add the helper function create_pmu_event_filterAaron Lewis
Add a helper function that creates a pmu event filter given an event list. Currently, a pmu event filter can only be created with the same hard coded event list. Add a way to create one given a different event list. Also, rename make_pmu_event_filter to alloc_pmu_event_filter to clarify it's purpose given the introduction of create_pmu_event_filter. No functional changes intended. Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220517051238.2566934-2-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20kvm: x86/pmu: Fix the compare function used by the pmu event filterAaron Lewis
When returning from the compare function the u64 is truncated to an int. This results in a loss of the high nybble[1] in the event select and its sign if that nybble is in use. Switch from using a result that can end up being truncated to a result that can only be: 1, 0, -1. [1] bits 35:32 in the event select register and bits 11:8 in the event select. Fixes: 7ff775aca48ad ("KVM: x86/pmu: Use binary search to check filtered events") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220517051238.2566934-1-aaronlewis@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-20x86/tdx: Fix RETs in TDX asmPeter Zijlstra
Because build-testing is over-rated, fix a few trivial objtool complaints: vmlinux.o: warning: objtool: __tdx_module_call+0x3e: missing int3 after ret vmlinux.o: warning: objtool: __tdx_hypercall+0x6e: missing int3 after ret Fixes: eb94f1b6a70a ("x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520083839.GR2578@worktop.programming.kicks-ass.net
2022-05-20objtool: Fix objtool regression on x32 systemsMikulas Patocka
Commit c087c6e7b551 ("objtool: Fix type of reloc::addend") failed to appreciate cross building from ILP32 hosts, where 'int' == 'long' and the issue persists. As such, use s64/int64_t/Elf64_Sxword for this field and suffer the pain that is ISO C99 printf formats for it. Fixes: c087c6e7b551 ("objtool: Fix type of reloc::addend") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> [peterz: reword changelog, s/long long/s64/] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/alpine.LRH.2.02.2205161041260.11556@file01.intranet.prod.int.rdu2.redhat.com
2022-05-20objtool: Fix symbol creationPeter Zijlstra
Nathan reported objtool failing with the following messages: warning: objtool: no non-local symbols !? warning: objtool: gelf_update_symshndx: invalid section index The problem is due to commit 4abff6d48dbc ("objtool: Fix code relocs vs weak symbols") failing to consider the case where an object would have no non-local symbols. The problem that commit tries to address is adding a STB_LOCAL symbol to the symbol table in light of the ELF spec's requirement that: In each symbol table, all symbols with STB_LOCAL binding preced the weak and global symbols. As ``Sections'' above describes, a symbol table section's sh_info section header member holds the symbol table index for the first non-local symbol. The approach taken is to find this first non-local symbol, move that to the end and then re-use the freed spot to insert a new local symbol and increment sh_info. Except it never considered the case of object files without global symbols and got a whole bunch of details wrong -- so many in fact that it is a wonder it ever worked :/ Specifically: - It failed to re-hash the symbol on the new index, so a subsequent find_symbol_by_index() would not find it at the new location and a query for the old location would now return a non-deterministic choice between the old and new symbol. - It failed to appreciate that the GElf wrappers are not a valid disk format (it works because GElf is basically Elf64 and we only support x86_64 atm.) - It failed to fully appreciate how horrible the libelf API really is and got the gelf_update_symshndx() call pretty much completely wrong; with the direct consequence that if inserting a second STB_LOCAL symbol would require moving the same STB_GLOBAL symbol again it would completely come unstuck. Write a new elf_update_symbol() function that wraps all the magic required to update or create a new symbol at a given index. Specifically, gelf_update_sym*() require an @ndx argument that is relative to the @data argument; this means you have to manually iterate the section data descriptor list and update @ndx. Fixes: 4abff6d48dbc ("objtool: Fix code relocs vs weak symbols") Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/YoPCTEYjoPqE4ZxB@hirez.programming.kicks-ass.net
2022-05-20x86: Remove empty filesBorislav Petkov
Remove empty files which were supposed to get removed with the respective commits removing the functionality in them: $ find arch/x86/ -empty arch/x86/lib/mmx_32.c arch/x86/include/asm/fpu/internal.h arch/x86/include/asm/mmx.h Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520101723.12006-1-bp@alien8.de
2022-05-20x86/entry: Fixup objtool/ibt validationPeter Zijlstra
Commit 47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap") added a bunch of text references without annotating them, resulting in a spree of objtool complaints: vmlinux.o: warning: objtool: vc_switch_off_ist+0x77: relocation to !ENDBR: entry_SYSCALL_64+0x15c vmlinux.o: warning: objtool: vc_switch_off_ist+0x8f: relocation to !ENDBR: entry_SYSCALL_compat+0xa5 vmlinux.o: warning: objtool: vc_switch_off_ist+0x97: relocation to !ENDBR: .entry.text+0x21ea vmlinux.o: warning: objtool: vc_switch_off_ist+0xef: relocation to !ENDBR: .entry.text+0x162 vmlinux.o: warning: objtool: __sev_es_ist_enter+0x60: relocation to !ENDBR: entry_SYSCALL_64+0x15c vmlinux.o: warning: objtool: __sev_es_ist_enter+0x6c: relocation to !ENDBR: .entry.text+0x162 vmlinux.o: warning: objtool: __sev_es_ist_enter+0x8a: relocation to !ENDBR: entry_SYSCALL_compat+0xa5 vmlinux.o: warning: objtool: __sev_es_ist_enter+0xc1: relocation to !ENDBR: .entry.text+0x21ea Since these text references are used to compare against IP, and are not an indirect call target, they don't need ENDBR so annotate them away. Fixes: 47f33de4aafb ("x86/sev: Mark the code returning to user space as syscall gap") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220520082604.GQ2578@worktop.programming.kicks-ass.net
2022-05-20x86/microcode: Add explicit CPU vendor dependencyBorislav Petkov
Add an explicit dependency to the respective CPU vendor so that the respective microcode support for it gets built only when that support is enabled. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/8ead0da9-9545-b10d-e3db-7df1a1f219e4@infradead.org
2022-05-19Merge tag 'v5.18-p2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix a regression in a recent fix to qcom-rng" * tag 'v5.18-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ
2022-05-19scsi: ufs: core: Fix referencing invalid rsp fieldDaejun Park
Fix referencing sense data when it is invalid. When the length of the data segment is 0, there is no valid information in the rsp field, so ufshpb_rsp_upiu() is returned without additional operation. Link: https://lore.kernel.org/r/252651381.41652940482659.JavaMail.epsvc@epcpadp4 Fixes: 4b5f49079c52 ("scsi: ufs: ufshpb: L2P map management for HPB read") Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Daejun Park <daejun7.park@samsung.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-05-19riscv: dts: microchip: fix gpio1 reg property typoConor Paxton
Fix reg address typo in the gpio1 stanza. Signed-off-by: Conor Paxton <conor.paxton@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Fixes: 528a5b1f2556 ("riscv: dts: microchip: add new peripherals to icicle kit device tree") Link: https://lore.kernel.org/r/20220517104058.2004734-1-conor.paxton@microchip.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-19perf/x86/amd/core: Fix reloading events for SVMSandipan Das
Commit 1018faa6cf23 ("perf/x86/kvm: Fix Host-Only/Guest-Only counting with SVM disabled") addresses an issue in which the Host-Only bit in the counter control registers needs to be masked off when SVM is not enabled. The events need to be reloaded whenever SVM is enabled or disabled for a CPU and this requires the PERF_CTL registers to be reprogrammed using {enable,disable}_all(). However, PerfMonV2 variants of these functions do not reprogram the PERF_CTL registers. Hence, the legacy enable_all() function should also be called. Fixes: 9622e67e3980 ("perf/x86/amd/core: Add PerfMonV2 counter control") Reported-by: Like Xu <likexu@tencent.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220518084327.464005-1-sandipan.das@amd.com
2022-05-19topology: Remove unused cpu_cluster_mask()Dietmar Eggemann
default_topology[] uses cpu_clustergroup_mask() for the CLS level (guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86 (arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c). Fixes: 778c558f49a2c ("sched: Add cluster scheduler level in core and related Kconfig for ARM64") Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Barry Song <baohua@kernel.org> Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com
2022-05-19sched: Reverse sched_class layoutPeter Zijlstra
Because GCC-12 is fully stupid about array bounds and it's just really hard to get a solid array definition from a linker script, flip the array order to avoid needing negative offsets :-/ This makes the whole relational pointer magic a little less obvious, but alas. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/YoOLLmLG7HRTXeEm@hirez.programming.kicks-ass.net
2022-05-19bug: Use normal relative pointers in 'struct bug_entry'Josh Poimboeuf
With CONFIG_GENERIC_BUG_RELATIVE_POINTERS, the addr/file relative pointers are calculated weirdly: based on the beginning of the bug_entry struct address, rather than their respective pointer addresses. Make the relative pointers less surprising to both humans and tools by calculating them the normal way. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390 Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Link: https://lkml.kernel.org/r/f0e05be797a16f4fc2401eeb88c8450dcbe61df6.1652362951.git.jpoimboe@kernel.org
2022-05-19sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}Uros Bizjak
Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in sched_clock_{local,remote}. x86 cmpxchg returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220518184953.3446778-1-ubizjak@gmail.com
2022-05-19x86/entry: Fix register corruption in compat syscallJosh Poimboeuf
A panic was reported in the init process on AMD: Run /sbin/init as init process init[1]: segfault at f7fd5ca0 ip 00000000f7f5bbc7 sp 00000000ffa06aa0 error 7 in libc.so[f7f51000+4e000] Code: 8a 44 24 10 88 41 ff 8b 44 24 10 83 c4 2c 5b 5e 5f 5d c3 53 83 ec 08 8b 5c 24 10 81 fb 00 f0 ff ff 76 0c e8 ba dc ff ff f7 db <89> 18 83 cb ff 83 c4 08 89 d8 5b c3 e8 81 60 ff ff 05 28 84 07 00 Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b CPU: 1 PID: 1 Comm: init Tainted: G W 5.18.0-rc7-next-20220519 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x7d panic+0x10f/0x28d do_exit.cold+0x18/0x48 do_group_exit+0x2e/0xb0 get_signal+0xb6d/0xb80 arch_do_signal_or_restart+0x31/0x760 ? show_opcodes.cold+0x1c/0x21 ? force_sig_fault+0x49/0x70 exit_to_user_mode_prepare+0x131/0x1a0 irqentry_exit_to_user_mode+0x5/0x30 asm_exc_page_fault+0x27/0x30 RIP: 0023:0xf7f5bbc7 Code: 8a 44 24 10 88 41 ff 8b 44 24 10 83 c4 2c 5b 5e 5f 5d c3 53 83 ec 08 8b 5c 24 10 81 fb 00 f0 ff ff 76 0c e8 ba dc ff ff f7 db <89> 18 83 cb ff 83 c4 08 89 d8 5b c3 e8 81 60 ff ff 05 28 84 07 00 RSP: 002b:00000000ffa06aa0 EFLAGS: 00000217 RAX: 00000000f7fd5ca0 RBX: 000000000000000c RCX: 0000000000001000 RDX: 0000000000000001 RSI: 00000000f7fd5b60 RDI: 00000000f7fd5b60 RBP: 00000000f7fd1c1c R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The task's CX register got corrupted by commit 8c42819b61b8 ("x86/entry: Use PUSH_AND_CLEAR_REGS for compat"), which overlooked the fact that compat SYSCALL apparently stores the user's CX value in BP. Before that commit, CX was saved from its stashed value in BP: pushq %rbp /* pt_regs->cx (stashed in bp) */ But then it got changed to: pushq %rcx /* pt_regs->cx */ So the wrong value got saved and later restored back to the user. Fix it by pushing the correct value again (BP) for regs->cx. Fixes: 8c42819b61b8 ("x86/entry: Use PUSH_AND_CLEAR_REGS for compat") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/b5a26592c9dd60bbacdf97974a7433fd802a5593.1652985970.git.jpoimboe@kernel.org
2022-05-19nvme: set non-mdts limits in nvme_scan_workChaitanya Kulkarni
In current implementation we set the non-mdts limits by calling nvme_init_non_mdts_limits() from nvme_init_ctrl_finish(). This also tries to set the limits for the discovery controller which has no I/O queues resulting in the warning message reported by the nvme_log_error() when running blktest nvme/002: - [ 2005.155946] run blktests nvme/002 at 2022-04-09 16:57:47 [ 2005.192223] loop: module loaded [ 2005.196429] nvmet: adding nsid 1 to subsystem blktests-subsystem-0 [ 2005.200334] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 <------------------------------SNIP----------------------------------> [ 2008.958108] nvmet: adding nsid 1 to subsystem blktests-subsystem-997 [ 2008.962082] nvmet: adding nsid 1 to subsystem blktests-subsystem-998 [ 2008.966102] nvmet: adding nsid 1 to subsystem blktests-subsystem-999 [ 2008.973132] nvmet: creating discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN testhostnqn. *[ 2008.973196] nvme1: Identify(0x6), Invalid Field in Command (sct 0x0 / sc 0x2) MORE DNR* [ 2008.974595] nvme nvme1: new ctrl: "nqn.2014-08.org.nvmexpress.discovery" [ 2009.103248] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery" Move the call of nvme_init_non_mdts_limits() to nvme_scan_work() after we verify that I/O queues are created since that is a converging point for each transport where these limits are actually used. 1. FC : nvme_fc_create_association() ... nvme_fc_create_io_queues(ctrl); ... nvme_start_ctrl() nvme_scan_queue() nvme_scan_work() 2. PCIe:- nvme_reset_work() ... nvme_setup_io_queues() nvme_create_io_queues() nvme_alloc_queue() ... nvme_start_ctrl() nvme_scan_queue() nvme_scan_work() 3. RDMA :- nvme_rdma_setup_ctrl ... nvme_rdma_configure_io_queues ... nvme_start_ctrl() nvme_scan_queue() nvme_scan_work() 4. TCP :- nvme_tcp_setup_ctrl ... nvme_tcp_configure_io_queues ... nvme_start_ctrl() nvme_scan_queue() nvme_scan_work() * nvme_scan_work() ... nvme_validate_or_alloc_ns() nvme_alloc_ns() nvme_update_ns_info() nvme_update_disk_info() nvme_config_discard() <--- blk_queue_max_write_zeroes_sectors() <--- Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-05-19riscv: dts: sifive: fu540-c000: align dma node name with dtschemaKrzysztof Kozlowski
Fixes dtbs_check warnings like: dma@3000000: $nodename:0: 'dma@3000000' does not match '^dma-controller(@.*)?$' Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Link: https://lore.kernel.org/r/20220407193856.18223-1-krzysztof.kozlowski@linaro.org Fixes: c5ab54e9945b ("riscv: dts: add support for PDMA device of HiFive Unleashed Rev A00") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-05-19io_uring: fix incorrect __kernel_rwf_t castVasily Averin
Currently 'make C=1 fs/io_uring.o' generates sparse warning: CHECK fs/io_uring.c fs/io_uring.c: note: in included file (through include/trace/trace_events.h, include/trace/define_trace.h, i nclude/trace/events/io_uring.h): ./include/trace/events/io_uring.h:488:1: warning: incorrect type in assignment (different base types) expected unsigned int [usertype] op_flags got restricted __kernel_rwf_t const [usertype] rw_flags This happen on cast of sqe->rw_flags which is defined as __kernel_rwf_t, this type is bitwise and requires __force attribute for any casts. However rw_flags is a member of the union, and its access can be safely replaced by using of its neighbours, so let's use poll32_events to fix the sparse warning. Signed-off-by: Vasily Averin <vvs@openvz.org> Link: https://lore.kernel.org/r/6f009241-a63f-ae43-a04b-62841aaef293@openvz.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-19PM: domains: Trust domain-idle-states from DT to be correct by genpdUlf Hansson
If genpd has parsed the domain-idle-states from DT, it's reasonable to believe that the parsed data should be correct for the HW in question. Based upon this, it seem superfluous to let genpd measure the corresponding power-on/off latencies for these states. Therefore, let's improve the behaviour in genpd by avoiding the measurements for the domain-idle-states that have been parsed from DT. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19PM: domains: Measure power-on/off latencies in genpd based on a governorUlf Hansson
The measurements of the power-on|off latencies in genpd for a PM domain are superfluous, unless the corresponding genpd has a governor assigned to it, which would make use of the data. Therefore, let's improve the behaviour in genpd by making the measurements conditional, based upon if there's a governor assigned. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19PM: domains: Allocate governor data dynamically based on a genpd governorUlf Hansson
If a genpd doesn't have an associated governor assigned, several variables in the struct generic_pm_domain becomes superfluous. Rather than wasting memory in allocated genpds, let's move the variables from the struct generic_pm_domain into a new separate struct. In this way, we can instead dynamically decide when we need to allocate the corresponding data for it. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>