summaryrefslogtreecommitdiff
path: root/arch/riscv
AgeCommit message (Collapse)Author
2021-06-19riscv: dts: fu740: fix cache-controller interruptsDavid Abdurachmanov
The order of interrupt numbers is incorrect. The order for FU740 is: DirError, DataError, DataFail, DirFail From SiFive FU740-C000 Manual: 19 - L2 Cache DirError 20 - L2 Cache DirFail 21 - L2 Cache DataError 22 - L2 Cache DataFail Signed-off-by: David Abdurachmanov <david.abdurachmanov@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-18riscv: Ensure BPF_JIT_REGION_START aligned with PMD sizeJisheng Zhang
Andreas reported commit fc8504765ec5 ("riscv: bpf: Avoid breaking W^X") breaks booting with one kind of defconfig, I reproduced a kernel panic with the defconfig: [ 0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220 [ 0.139159] Oops [#1] [ 0.139303] Modules linked in: [ 0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1 [ 0.139934] Hardware name: riscv-virtio,qemu (DT) [ 0.140193] epc : __memset+0xc4/0xfc [ 0.140416] ra : skb_flow_dissector_init+0x1e/0x82 [ 0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0 [ 0.140878] gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158 [ 0.141156] t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0 [ 0.141424] s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000 [ 0.141654] a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064 [ 0.141893] a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff [ 0.142126] s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088 [ 0.142353] s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438 [ 0.142584] s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac [ 0.142810] s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000 [ 0.143042] t5 : 0000000000000155 t6 : 00000000000003ff [ 0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f [ 0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc [ 0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60 [ 0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168 [ 0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224 [ 0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110 [ 0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc [ 0.145124] ---[ end trace f1e9643daa46d591 ]--- After some investigation, I think I found the root cause: commit 2bfc6cd81bd ("move kernel mapping outside of linear mapping") moves BPF JIT region after the kernel: | #define BPF_JIT_REGION_START PFN_ALIGN((unsigned long)&_end) The &_end is unlikely aligned with PMD size, so the front bpf jit region sits with part of kernel .data section in one PMD size mapping. But kernel is mapped in PMD SIZE, when bpf_jit_binary_lock_ro() is called to make the first bpf jit prog ROX, we will make part of kernel .data section RO too, so when we write to, for example memset the .data section, MMU will trigger a store page fault. To fix the issue, we need to ensure the BPF JIT region is PMD size aligned. This patch acchieve this goal by restoring the BPF JIT region to original position, I.E the 128MB before kernel .text section. The modification to kasan_init.c is inspired by Alexandre. Fixes: fc8504765ec5 ("riscv: bpf: Avoid breaking W^X") Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-18riscv: kasan: Fix MODULES_VADDR evaluation due to local variables' nameJisheng Zhang
commit 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") makes use of MODULES_VADDR to populate kernel, BPF, modules mapping. Currently, MODULES_VADDR is defined as below for RV64: | #define MODULES_VADDR (PFN_ALIGN((unsigned long)&_end) - SZ_2G) But kasan_init() has two local variables which are also named as _start, _end, so MODULES_VADDR is evaluated with the local variable _end rather than the global "_end" as we expected. Fix this issue by renaming the two local variables. Fixes: 2bfc6cd81bd1 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-12riscv: sifive: fix Kconfig errata warningRandy Dunlap
The SOC_SIFIVE Kconfig entry unconditionally selects ERRATA_SIFIVE. However, ERRATA_SIFIVE depends on RISCV_ERRATA_ALTERNATIVE, which is not set, so SOC_SIFIVE should either depend on or select RISCV_ERRATA_ALTERNATIVE. Use 'select' here to quieten the Kconfig warning. WARNING: unmet direct dependencies detected for ERRATA_SIFIVE Depends on [n]: RISCV_ERRATA_ALTERNATIVE [=n] Selected by [y]: - SOC_SIFIVE [=y] Fixes: 1a0e5dbd3723 ("riscv: sifive: Add SiFive alternative ports") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: linux-riscv@lists.infradead.org Cc: Vincent Chen <vincent.chen@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-12riscv32: Use medany C model for modulesKhem Raj
When CONFIG_CMODEL_MEDLOW is used it ends up generating riscv_hi20_rela relocations in modules which are not resolved during runtime and following errors would be seen [ 4.802714] virtio_input: target 00000000c1539090 can not be addressed by the 32-bit offset from PC = 39148b7b [ 4.854800] virtio_input: target 00000000c1539090 can not be addressed by the 32-bit offset from PC = 9774456d Signed-off-by: Khem Raj <raj.khem@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-11riscv: Fix BUILTIN_DTB for sifive and microchip socAlexandre Ghiti
Fix BUILTIN_DTB config which resulted in a dtb that was actually not built into the Linux image: in the same manner as Canaan soc does, create an object file from the dtb file that will get linked into the Linux image. Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-10riscv: alternative: fix typo in macro nameVitaly Wool
alternative-macros.h defines ALT_NEW_CONTENT in its assembly part and ALT_NEW_CONSTENT in the C part. Most likely it is the latter that is wrong. Fixes: 6f4eea90465ad (riscv: Introduce alternative mechanism to apply errata solution) Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-10riscv: code patching only works on !XIP_KERNELJisheng Zhang
Some features which need code patching such as KPROBES, DYNAMIC_FTRACE KGDB can only work on !XIP_KERNEL. Add dependencies for these features that rely on code patching. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-10riscv: xip: support runtime trap patchingVitaly Wool
RISCV_ERRATA_ALTERNATIVE patches text at runtime which is currently not possible when the kernel is executed from the flash in XIP mode. Since runtime patching concerns only traps at the moment, let's just have all the traps reside in RAM anyway if RISCV_ERRATA_ALTERNATIVE is set. Thus, these functions will be patch-able even when the .text section is in flash. Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-01Merge remote-tracking branch 'riscv/riscv-wx-mappings' into fixesPalmer Dabbelt
This single commit is shared between fixes and for-next, as it fixes a concrete bug while likely conflicting with a more invasive cleanup to avoid these oddball mappings entirely. * riscv/riscv-wx-mappings: riscv: mm: Fix W+X mappings at boot
2021-06-01RISC-V: Fix memblock_free() usages in init_resources()Wende Tan
`memblock_free()` takes a physical address as its first argument. Fix the wrong usages in `init_resources()`. Fixes: ffe0e526126884cf036a6f724220f1f9b4094fd2 ("RISC-V: Improve init_resources()") Fixes: 797f0375dd2ef5cdc68ac23450cbae9a5c67a74e ("RISC-V: Do not allocate memblock while iterating reserved memblocks") Signed-off-by: Wende Tan <twd2.me@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-01riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabledVincent
The errata_cip_453.o should be built only when the Kconfig CONFIG_ERRATA_SIFIVE_CIP_453 is enabled. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent <vincent.chen@sifive.com> Fixes: 0e0d4992517f ("riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=y") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-06-01riscv: mm: Fix W+X mappings at bootJisheng Zhang
When the kernel mapping was moved the last 2GB of the address space, (__va(PFN_PHYS(max_low_pfn))) is much smaller than the .data section start address, the last set_memory_nx() in protect_kernel_text_data() will fail, thus the .data section is still mapped as W+X. This results in below W+X mapping waring at boot. Fix it by passing the correct .data section page num to the set_memory_nx(). [ 0.396516] ------------[ cut here ]------------ [ 0.396889] riscv/mm: Found insecure W+X mapping at address (____ptrval____)/0xffffffff80c00000 [ 0.398347] WARNING: CPU: 0 PID: 1 at arch/riscv/mm/ptdump.c:258 note_page+0x244/0x24a [ 0.398964] Modules linked in: [ 0.399459] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc1+ #14 [ 0.400003] Hardware name: riscv-virtio,qemu (DT) [ 0.400591] epc : note_page+0x244/0x24a [ 0.401368] ra : note_page+0x244/0x24a [ 0.401772] epc : ffffffff80007c86 ra : ffffffff80007c86 sp : ffffffe000e7bc30 [ 0.402304] gp : ffffffff80caae88 tp : ffffffe000e70000 t0 : ffffffff80cb80cf [ 0.402800] t1 : ffffffff80cb80c0 t2 : 0000000000000000 s0 : ffffffe000e7bc80 [ 0.403310] s1 : ffffffe000e7bde8 a0 : 0000000000000053 a1 : ffffffff80c83ff0 [ 0.403805] a2 : 0000000000000010 a3 : 0000000000000000 a4 : 6c7e7a5137233100 [ 0.404298] a5 : 6c7e7a5137233100 a6 : 0000000000000030 a7 : ffffffffffffffff [ 0.404849] s2 : ffffffff80e00000 s3 : 0000000040000000 s4 : 0000000000000000 [ 0.405393] s5 : 0000000000000000 s6 : 0000000000000003 s7 : ffffffe000e7bd48 [ 0.405935] s8 : ffffffff81000000 s9 : ffffffffc0000000 s10: ffffffe000e7bd48 [ 0.406476] s11: 0000000000001000 t3 : 0000000000000072 t4 : ffffffffffffffff [ 0.407016] t5 : 0000000000000002 t6 : ffffffe000e7b978 [ 0.407435] status: 0000000000000120 badaddr: 0000000000000000 cause: 0000000000000003 [ 0.408052] Call Trace: [ 0.408343] [<ffffffff80007c86>] note_page+0x244/0x24a [ 0.408855] [<ffffffff8010c5a6>] ptdump_hole+0x14/0x1e [ 0.409263] [<ffffffff800f65c6>] walk_pgd_range+0x2a0/0x376 [ 0.409690] [<ffffffff800f6828>] walk_page_range_novma+0x4e/0x6e [ 0.410146] [<ffffffff8010c5f8>] ptdump_walk_pgd+0x48/0x78 [ 0.410570] [<ffffffff80007d66>] ptdump_check_wx+0xb4/0xf8 [ 0.410990] [<ffffffff80006738>] mark_rodata_ro+0x26/0x2e [ 0.411407] [<ffffffff8031961e>] kernel_init+0x44/0x108 [ 0.411814] [<ffffffff80002312>] ret_from_exception+0x0/0xc [ 0.412309] ---[ end trace 7ec3459f2547ea83 ]--- [ 0.413141] Checked W+X mappings: failed, 512 W+X pages found Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-29riscv: Use -mno-relax when using lld linkerKhem Raj
lld does not implement the RISCV relaxation optimizations like GNU ld therefore disable it when building with lld, Also pass it to assembler when using external GNU assembler ( LLVM_IAS != 1 ), this ensures that relevant assembler option is also enabled along. if these options are not used then we see following relocations in objects 0000000000000000 R_RISCV_ALIGN *ABS*+0x0000000000000002 These are then rejected by lld ld.lld: error: capability.c:(.fixup+0x0): relocation R_RISCV_ALIGN requires unimplemented linker relaxation; recompile with -mno-relax but the .o is already compiled with -mno-relax Signed-off-by: Khem Raj <raj.khem@gmail.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-22riscv: kexec: Fix W=1 build warningsJisheng Zhang
Fixes the following W=1 build warning(s): In file included from include/linux/kexec.h:28, from arch/riscv/kernel/machine_kexec.c:7: arch/riscv/include/asm/kexec.h:45:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration] 45 | const extern unsigned char riscv_kexec_relocate[]; | ^~~~~ arch/riscv/include/asm/kexec.h:46:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration] 46 | const extern unsigned int riscv_kexec_relocate_size; | ^~~~~ arch/riscv/kernel/machine_kexec.c:125:6: warning: no previous prototype for ‘machine_shutdown’ [-Wmissing-prototypes] 125 | void machine_shutdown(void) | ^~~~~~~~~~~~~~~~ arch/riscv/kernel/machine_kexec.c:147:1: warning: no previous prototype for ‘machine_crash_shutdown’ [-Wmissing-prototypes] 147 | machine_crash_shutdown(struct pt_regs *regs) | ^~~~~~~~~~~~~~~~~~~~~~ arch/riscv/kernel/machine_kexec.c:23: warning: Function parameter or member 'image' not described in 'kexec_image_info' arch/riscv/kernel/machine_kexec.c:53: warning: Function parameter or member 'image' not described in 'machine_kexec_prepare' arch/riscv/kernel/machine_kexec.c:114: warning: Function parameter or member 'image' not described in 'machine_kexec_cleanup' arch/riscv/kernel/machine_kexec.c:148: warning: Function parameter or member 'regs' not described in 'machine_crash_shutdown' arch/riscv/kernel/machine_kexec.c:167: warning: Function parameter or member 'image' not described in 'machine_kexec' Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-22riscv: kprobes: Fix build error when MMU=nJisheng Zhang
lkp reported a randconfig failure: arch/riscv/kernel/probes/kprobes.c:90:22: error: use of undeclared identifier 'PAGE_KERNEL_READ_EXEC' We implemented the alloc_insn_page() to allocate PAGE_KERNEL_READ_EXEC page for kprobes insn page for STRICT_MODULE_RWX. But if MMU=n, we should fall back to the generic weak alloc_insn_page() by generic kprobe subsystem. Fixes: cdd1b2bd358f ("riscv: kprobes: Implement alloc_insn_page()") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-22riscv: Select ARCH_USE_MEMTESTKefeng Wang
As of commit dce44566192e ("mm/memtest: add ARCH_USE_MEMTEST"), architectures must select ARCH_USE_MEMTESET to enable CONFIG_MEMTEST. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Fixes: f6e5aedf470b ("riscv: Add support for memtest") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-22riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabledChen Huang
As [1] and [2] said, the arch_stack_walk should not to trace itself, or it will leave the trace unexpectedly when called. The example is when we do "cat /sys/kernel/debug/page_owner", all pages' stack is the same. arch_stack_walk+0x18/0x20 stack_trace_save+0x40/0x60 register_dummy_stack+0x24/0x5e init_page_owner+0x2e So we use __builtin_frame_address(1) as the first frame to be walked. And mark the arch_stack_walk() noinline. We found that pr_cont will affact pages' stack whose task state is RUNNING when testing "echo t > /proc/sysrq-trigger". So move the place of pr_cont and mark the function dump_backtrace() noinline. Also we move the case when task == NULL into else branch, and test for it in "echo c > /proc/sysrq-trigger". [1] https://lore.kernel.org/lkml/20210319184106.5688-1-mark.rutland@arm.com/ [2] https://lore.kernel.org/lkml/20210317142050.57712-1-chenjun102@huawei.com/ Signed-off-by: Chen Huang <chenhuang5@huawei.com> Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: remove unused handle_exception symbolRouven Czerwinski
Since commit 79b1feba5455 ("RISC-V: Setup exception vector early") exception vectors are setup early and the handle_exception symbol from the asm files is no longer referenced in traps.c. Remove it. Signed-off-by: Rouven Czerwinski <rouven@czerwinskis.de> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: Consistify protect_kernel_linear_mapping_text_rodata() useGeert Uytterhoeven
The various uses of protect_kernel_linear_mapping_text_rodata() are not consistent: - Its definition depends on "64BIT && !XIP_KERNEL", - Its forward declaration depends on MMU, - Its single caller depends on "STRICT_KERNEL_RWX && 64BIT && MMU && !XIP_KERNEL". Fix this by settling on the dependencies of the caller, which can be simplified as STRICT_KERNEL_RWX depends on "MMU && !XIP_KERNEL". Provide a dummy definition, as the caller is protected by "IS_ENABLED(CONFIG_STRICT_KERNEL_RWX)" instead of "#ifdef CONFIG_STRICT_KERNEL_RWX". Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: enable SiFive errata CIP-453 and CIP-1200 Kconfig only if CONFIG_64BIT=yVincent Chen
The corresponding hardware issues of CONFIG_ERRATA_SIFIVE_CIP_453 and CONFIG_ERRATA_SIFIVE_CIP_1200 only exist in the SiFive 64bit CPU cores. Therefore, these two errata are required only if CONFIG_64BIT=y Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Fixes: bff3ff525460 ("riscv: sifive: Apply errata "cip-1200" patch") Fixes: 800149a77c2c ("riscv: sifive: Apply errata "cip-453" patch") Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06riscv: Only extend kernel reservation if mapped read-onlyGeert Uytterhoeven
When the kernel mapping was moved outside of the linear mapping, the kernel memory reservation was increased, to take into account mapping granularity. However, this is done unconditionally, regardless of whether the kernel memory is mapped read-only or not. If this extension is not needed, up to 2 MiB may be lost, which has a big impact on e.g. Canaan K210 (64-bit nommu) platforms with only 8 MiB of RAM. Reclaim the lost memory by only extending the reserved region when needed, i.e. depending on a simplified version of the conditional logic around the call to protect_kernel_linear_mapping_text_rodata(). Fixes: 2bfc6cd81bd17e43 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-06Merge tag 'riscv-for-linus-5.13-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the memtest= kernel command-line argument. - Support for building the kernel with FORTIFY_SOURCE. - Support for generic clockevent broadcasts. - Support for the buildtar build target. - Some build system cleanups to pass more LLVM-friendly arguments. - Support for kprobes. - A rearranged kernel memory map, the first part of supporting sv48 systems. - Improvements to kexec, along with support for kdump and crash kernels. - An alternatives-based errata framework, along with support for handling a pair of errata that manifest on some SiFive designs (including the HiFive Unmatched). - Support for XIP. - A device tree for the Microchip PolarFire ICICLE SoC and associated dev board. ... along with a bunch of cleanups. There are already a handful of fixes on the list so there will likely be a part 2. * tag 'riscv-for-linus-5.13-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (45 commits) RISC-V: Always define XIP_FIXUP riscv: Remove 32b kernel mapping from page table dump riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=y RISC-V: Fix error code returned by riscv_hartid_to_cpuid() RISC-V: Enable Microchip PolarFire ICICLE SoC RISC-V: Initial DTS for Microchip ICICLE board dt-bindings: riscv: microchip: Add YAML documentation for the PolarFire SoC RISC-V: Add Microchip PolarFire SoC kconfig option RISC-V: enable XIP RISC-V: Add crash kernel support RISC-V: Add kdump support RISC-V: Improve init_resources() RISC-V: Add kexec support RISC-V: Add EM_RISCV to kexec UAPI header riscv: vdso: fix and clean-up Makefile riscv/mm: Use BUG_ON instead of if condition followed by BUG. riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobe riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMU riscv: module: Create module allocations without exec permissions riscv: bpf: Avoid breaking W^X ...
2021-05-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...
2021-05-05Merge tag 'pci-v5.13-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Release OF node when pci_scan_device() fails (Dmitry Baryshkov) - Add pci_disable_parity() (Bjorn Helgaas) - Disable Mellanox Tavor parity reporting (Heiner Kallweit) - Disable N2100 r8169 parity reporting (Heiner Kallweit) - Fix RCiEP device to RCEC association (Qiuxu Zhuo) - Convert sysfs "config", "rom", "reset", "label", "index", "acpi_index" to static attributes to help fix races in device enumeration (Krzysztof Wilczyński) - Convert sysfs "vpd" to static attribute (Heiner Kallweit, Krzysztof Wilczyński) - Use sysfs_emit() in "show" functions (Krzysztof Wilczyński) - Remove unused alloc_pci_root_info() return value (Krzysztof Wilczyński) PCI device hotplug: - Fix acpiphp reference count leak (Feilong Lin) Power management: - Fix acpi_pci_set_power_state() debug message (Rafael J. Wysocki) - Fix runtime PM imbalance (Dinghao Liu) Virtualization: - Increase delay after FLR to work around Intel DC P4510 NVMe erratum (Raphael Norwitz) MSI: - Convert rcar, tegra, xilinx to MSI domains (Marc Zyngier) - For rcar, xilinx, use controller address as MSI doorbell (Marc Zyngier) - Remove unused hv msi_controller struct (Marc Zyngier) - Remove unused PCI core msi_controller support (Marc Zyngier) - Remove struct msi_controller altogether (Marc Zyngier) - Remove unused default_teardown_msi_irqs() (Marc Zyngier) - Let host bridges declare their reliance on MSI domains (Marc Zyngier) - Make pci_host_common_probe() declare its reliance on MSI domains (Marc Zyngier) - Advertise mediatek lack of built-in MSI handling (Thomas Gleixner) - Document ways of ending up with NO_MSI (Marc Zyngier) - Refactor HT advertising of NO_MSI flag (Marc Zyngier) VPD: - Remove obsolete Broadcom NIC VPD length-limiting quirk (Heiner Kallweit) - Remove sysfs VPD size checking dead code (Heiner Kallweit) - Convert VPF sysfs file to static attribute (Heiner Kallweit) - Remove unnecessary pci_set_vpd_size() (Heiner Kallweit) - Tone down "missing VPD" message (Heiner Kallweit) Endpoint framework: - Fix NULL pointer dereference when epc_features not implemented (Shradha Todi) - Add missing destroy_workqueue() in endpoint test (Yang Yingliang) Amazon Annapurna Labs PCIe controller driver: - Fix compile testing without CONFIG_PCI_ECAM (Arnd Bergmann) - Fix "no symbols" warnings when compile testing with CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann) APM X-Gene PCIe controller driver: - Fix cfg resource mapping regression (Dejin Zheng) Broadcom iProc PCIe controller driver: - Return zero for success of iproc_msi_irq_domain_alloc() (Pali Rohár) Broadcom STB PCIe controller driver: - Add reset_control_rearm() stub for !CONFIG_RESET_CONTROLLER (Jim Quinlan) - Fix use of BCM7216 reset controller (Jim Quinlan) - Use reset/rearm for Broadcom STB pulse reset instead of deassert/assert (Jim Quinlan) - Fix brcm_pcie_probe() error return for unsupported revision (Wei Yongjun) Cavium ThunderX PCIe controller driver: - Fix compile testing (Arnd Bergmann) - Fix "no symbols" warnings when compile testing with CONFIG_TRIM_UNUSED_KSYMS (Arnd Bergmann) Freescale Layerscape PCIe controller driver: - Fix ls_pcie_ep_probe() syntax error (comma for semicolon) (Krzysztof Wilczyński) - Remove layerscape-gen4 dependencies on OF and ARM64, add dependency on ARCH_LAYERSCAPE (Geert Uytterhoeven) HiSilicon HIP PCIe controller driver: - Remove obsolete HiSilicon PCIe DT description (Dongdong Liu) Intel Gateway PCIe controller driver: - Remove unused pcie_app_rd() (Jiapeng Chong) Intel VMD host bridge driver: - Program IRTE with Requester ID of VMD endpoint, not child device (Jon Derrick) - Disable VMD MSI-X remapping when possible so children can use more MSI-X vectors (Jon Derrick) MediaTek PCIe controller driver: - Configure FC and FTS for functions other than 0 (Ryder Lee) - Add YAML schema for MediaTek (Jianjun Wang) - Export pci_pio_to_address() for module use (Jianjun Wang) - Add MediaTek MT8192 PCIe controller driver (Jianjun Wang) - Add MediaTek MT8192 INTx support (Jianjun Wang) - Add MediaTek MT8192 MSI support (Jianjun Wang) - Add MediaTek MT8192 system power management support (Jianjun Wang) - Add missing MODULE_DEVICE_TABLE (Qiheng Lin) Microchip PolarFlare PCIe controller driver: - Make several symbols static (Wei Yongjun) NVIDIA Tegra PCIe controller driver: - Add MCFG quirks for Tegra194 ECAM errata (Vidya Sagar) - Make several symbols const (Rikard Falkeborn) - Fix Kconfig host/endpoint typo (Wesley Sheng) SiFive FU740 PCIe controller driver: - Add pcie_aux clock to prci driver (Greentime Hu) - Use reset-simple in prci driver for PCIe (Greentime Hu) - Add SiFive FU740 PCIe host controller driver and DT binding (Paul Walmsley, Greentime Hu) Synopsys DesignWare PCIe controller driver: - Move MSI Receiver init to dw_pcie_host_init() so it is re-initialized along with the RC in resume (Jisheng Zhang) - Move iATU detection earlier to fix regression (Hou Zhiqiang) TI J721E PCIe driver: - Add DT binding and TI j721e support for refclk to PCIe connector (Kishon Vijay Abraham I) - Add host mode and endpoint mode DT bindings for TI AM64 SoC (Kishon Vijay Abraham I) TI Keystone PCIe controller driver: - Use generic config accessors for TI AM65x (K3) to fix regression (Kishon Vijay Abraham I) Xilinx NWL PCIe controller driver: - Add support for coherent PCIe DMA traffic using CCI (Bharat Kumar Gogada) - Add optional "dma-coherent" DT property (Bharat Kumar Gogada) Miscellaneous: - Fix kernel-doc warnings (Krzysztof Wilczyński) - Remove unused MicroGate SyncLink device IDs (Jiri Slaby) - Remove redundant dev_err() for devm_ioremap_resource() failure (Chen Hui) - Remove redundant initialization (Colin Ian King) - Drop redundant dev_err() for platform_get_irq() errors (Krzysztof Wilczyński)" * tag 'pci-v5.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (98 commits) riscv: dts: Add PCIe support for the SiFive FU740-C000 SoC PCI: fu740: Add SiFive FU740 PCIe host controller driver dt-bindings: PCI: Add SiFive FU740 PCIe host controller MAINTAINERS: Add maintainers for SiFive FU740 PCIe driver clk: sifive: Use reset-simple in prci driver for PCIe driver clk: sifive: Add pcie_aux clock in prci driver for PCIe driver PCI: brcmstb: Use reset/rearm instead of deassert/assert ata: ahci_brcm: Fix use of BCM7216 reset controller reset: add missing empty function reset_control_rearm() PCI: Allow VPD access for QLogic ISP2722 PCI/VPD: Add helper pci_get_func0_dev() PCI/VPD: Remove pci_vpd_find_tag() SRDT handling PCI/VPD: Remove pci_vpd_find_tag() 'offset' argument PCI/VPD: Change pci_vpd_init() return type to void PCI/VPD: Make missing VPD message less alarming PCI/VPD: Remove pci_set_vpd_size() x86/PCI: Remove unused alloc_pci_root_info() return value MAINTAINERS: Add Jianjun Wang as MediaTek PCI co-maintainer PCI: mediatek-gen3: Add system PM support PCI: mediatek-gen3: Add MSI support ...
2021-05-05mm: generalize SYS_SUPPORTS_HUGETLBFS (rename as ARCH_SUPPORTS_HUGETLBFS)Anshuman Khandual
SYS_SUPPORTS_HUGETLBFS config has duplicate definitions on platforms that subscribe it. Instead, just make it a generic option which can be selected on applicable platforms. Also rename it as ARCH_SUPPORTS_HUGETLBFS instead. This reduces code duplication and makes it cleaner. Link: https://lkml.kernel.org/r/1617259448-22529-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Palmer Dabbelt <palmerdabbelt@google.com> [riscv] Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-04Merge tag 'm68knommu-for-v5.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: - a fix for interrupt number range checking for the ColdFire SIMR interrupt controller. - changes for the binfmt_flat binary loader to allow RISC-V nommu support it needs to be able to accept flat binaries that have no gap between the text and data sections. * tag 'm68knommu-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: coldfire: fix irq ranges riscv: Disable data start offset in flat binaries binfmt_flat: allow not offsetting data start
2021-05-04riscv: dts: Add PCIe support for the SiFive FU740-C000 SoCGreentime Hu
Link: https://lore.kernel.org/r/20210504105940.100004-7-greentime.hu@sifive.com Signed-off-by: Greentime Hu <greentime.hu@sifive.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-02Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: useful constants: struct qstr for ".." hostfs_open(): don't open-code file_dentry() whack-a-mole: kill strlen_user() (again) autofs: should_expire() argument is guaranteed to be positive apparmor:match_mn() - constify devpath argument buffer: a small optimization in grow_buffers get rid of autofs_getpath() constify dentry argument of dentry_path()/dentry_path_raw()
2021-05-01RISC-V: Always define XIP_FIXUPPalmer Dabbelt
XIP depends on MMU, but XIP_FIXUP is used throughout the kernel in order to avoid excessive ifdefs. This just makes sure to always define XIP_FIXUP, which will fix MMU=n builds. XIP_OFFSET is used by assembly but XIP_FIXUP is C-only, so they're split. Fixes: 44c922572952 ("RISC-V: enable XIP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Tested-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-01riscv: Remove 32b kernel mapping from page table dumpAlexandre Ghiti
The 32b kernel mapping lies in the linear mapping, there is no point in printing its address in page table dump, so remove this leftover that comes from moving the kernel mapping outside the linear mapping for 64b kernel. Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses") Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-01riscv: Fix 32b kernel build with CONFIG_DEBUG_VIRTUAL=yAlexandre Ghiti
Declare kernel_virt_addr for 32b kernel since it is used in __phys_addr_symbol defined when CONFIG_DEBUG_VIRTUAL is set. Fixes: 2bfc6cd81bd17 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-05-01RISC-V: Fix error code returned by riscv_hartid_to_cpuid()Anup Patel
We should return a negative error code upon failure in riscv_hartid_to_cpuid() instead of NR_CPUS. This is also aligned with all uses of riscv_hartid_to_cpuid() which expect negative error code upon failure. Fixes: 6825c7a80f18 ("RISC-V: Add logical CPU indexing for RISC-V") Fixes: f99fb607fb2b ("RISC-V: Use Linux logical CPU number instead of hartid") Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-30mm: move mem_init_print_info() into mm_init()Kefeng Wang
mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: https://lkml.kernel.org/r/20210317015210.33641-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> [x86] Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> [powerpc] Acked-by: David Hildenbrand <david@redhat.com> Tested-by: Anatoly Pugachev <matorola@gmail.com> [sparc64] Acked-by: Russell King <rmk+kernel@armlinux.org.uk> [arm] Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Guo Ren <guoren@kernel.org> Cc: Yoshinori Sato <ysato@users.osdn.me> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jonas Bonn <jonas@southpole.se> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: "Peter Zijlstra" <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-26RISC-V: Enable Microchip PolarFire ICICLE SoCAtish Patra
Enable Microchip PolarFire ICICLE soc config in defconfig. It allows the default upstream kernel to boot on PolarFire ICICLE board. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Bin Meng <bin.meng@windriver.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Initial DTS for Microchip ICICLE boardAtish Patra
Add initial DTS for Microchip ICICLE board having only essential devices (clocks, sdhci, ethernet, serial, etc). The device tree is based on the U-Boot patch. https://patchwork.ozlabs.org/project/uboot/patch/20201110103414.10142-6-padmarao.begari@microchip.com/ Signed-off-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Add Microchip PolarFire SoC kconfig optionAtish Patra
Add Microchip PolarFire kconfig option which selects SoC specific and common drivers that is required for this SoC. Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Bin Meng <bin.meng@windriver.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: enable XIPVitaly Wool
Introduce XIP (eXecute In Place) support for RISC-V platforms. It allows code to be executed directly from non-volatile storage directly addressable by the CPU, such as QSPI NOR flash which can be found on many RISC-V platforms. This makes way for significant optimization of RAM footprint. The XIP kernel is not compressed since it has to run directly from flash, so it will occupy more space on the non-volatile storage. The physical flash address used to link the kernel object files and for storing it has to be known at compile time and is represented by a Kconfig option. XIP on RISC-V will for the time being only work on MMU-enabled kernels. Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.com> [Alex: Rebase on top of "Move kernel mapping outside the linear mapping" ] Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> [Palmer: disable XIP for allyesconfig] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Add crash kernel supportNick Kossifidis
This patch allows Linux to act as a crash kernel for use with kdump. Userspace will let the crash kernel know about the memory region it can use through linux,usable-memory property on the /memory node (overriding its reg property), and about the memory region where the elf core header of the previous kernel is saved, through a reserved-memory node with a compatible string of "linux,elfcorehdr". This approach is the least invasive and re-uses functionality already present. I tested this on riscv64 qemu and it works as expected, you may test it by retrieving the dmesg of the previous kernel through /proc/vmcore, using the vmcore-dmesg utility from kexec-tools. Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Add kdump supportNick Kossifidis
This patch adds support for kdump, the kernel will reserve a region for the crash kernel and jump there on panic. In order for userspace tools (kexec-tools) to prepare the crash kernel kexec image, we also need to expose some information on /proc/iomem for the memory regions used by the kernel and for the region reserved for crash kernel. Note that on userspace the device tree is used to determine the system's memory layout so the "System RAM" on /proc/iomem is ignored. I tested this on riscv64 qemu and works as expected, you may test it by triggering a crash through /proc/sysrq_trigger: echo c > /proc/sysrq_trigger Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Improve init_resources()Nick Kossifidis
The kernel region is always present and we know where it is, no need to look for it inside the loop, just ignore it like the rest of the reserved regions within system's memory. Additionally, we don't need to call memblock_free inside the loop, as if called it'll split the region of pre-allocated resources in two parts, messing things up, just re-use the previous pre-allocated resource and free any unused resources after both loops finish. Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> [Palmer: commit text] Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26RISC-V: Add kexec supportNick Kossifidis
This patch adds support for kexec on RISC-V. On SMP systems it depends on HOTPLUG_CPU in order to be able to bring up all harts after kexec. It also needs a recent OpenSBI version that supports the HSM extension. I tested it on riscv64 QEMU on both an smp and a non-smp system. Signed-off-by: Nick Kossifidis <mick@ics.forth.gr> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: vdso: fix and clean-up MakefileJisheng Zhang
Running "make" on an already compiled kernel tree will rebuild the kernel even without any modifications: CALL linux/scripts/checksyscalls.sh CALL linux/scripts/atomic/check-atomics.sh CHK include/generated/compile.h SO2S arch/riscv/kernel/vdso/vdso-syms.S AS arch/riscv/kernel/vdso/vdso-syms.o AR arch/riscv/kernel/vdso/built-in.a AR arch/riscv/kernel/built-in.a AR arch/riscv/built-in.a GEN .version CHK include/generated/compile.h UPD include/generated/compile.h CC init/version.o AR init/built-in.a LD vmlinux.o The reason is "Any target that utilizes if_changed must be listed in $(targets), otherwise the command line check will fail, and the target will always be built" as explained by Documentation/kbuild/makefiles.rst Fix this build bug by adding vdso-syms.S to $(targets) At the same time, there are two trivial clean up modifications: - the vdso-dummy.o is not needed any more after so remove it. - vdso.lds is a generated file, so it should be prefixed with $(obj)/ instead of $(src)/ Fixes: c2c81bb2f691 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+") Cc: stable@vger.kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv/mm: Use BUG_ON instead of if condition followed by BUG.zhouchuangao
BUG_ON() uses unlikely in if(), which can be optimized at compile time. Signed-off-by: zhouchuangao <zhouchuangao@vivo.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv/kprobe: fix kernel panic when invoking sys_read traced by kprobeLiao Chang
The execution of sys_read end up hitting a BUG_ON() in __find_get_block after installing kprobe at sys_read, the BUG message like the following: [ 65.708663] ------------[ cut here ]------------ [ 65.709987] kernel BUG at fs/buffer.c:1251! [ 65.711283] Kernel BUG [#1] [ 65.712032] Modules linked in: [ 65.712925] CPU: 0 PID: 51 Comm: sh Not tainted 5.12.0-rc4 #1 [ 65.714407] Hardware name: riscv-virtio,qemu (DT) [ 65.715696] epc : __find_get_block+0x218/0x2c8 [ 65.716835] ra : __getblk_gfp+0x1c/0x4a [ 65.717831] epc : ffffffe00019f11e ra : ffffffe00019f56a sp : ffffffe002437930 [ 65.719553] gp : ffffffe000f06030 tp : ffffffe0015abc00 t0 : ffffffe00191e038 [ 65.721290] t1 : ffffffe00191e038 t2 : 000000000000000a s0 : ffffffe002437960 [ 65.723051] s1 : ffffffe00160ad00 a0 : ffffffe00160ad00 a1 : 000000000000012a [ 65.724772] a2 : 0000000000000400 a3 : 0000000000000008 a4 : 0000000000000040 [ 65.726545] a5 : 0000000000000000 a6 : ffffffe00191e000 a7 : 0000000000000000 [ 65.728308] s2 : 000000000000012a s3 : 0000000000000400 s4 : 0000000000000008 [ 65.730049] s5 : 000000000000006c s6 : ffffffe00240f800 s7 : ffffffe000f080a8 [ 65.731802] s8 : 0000000000000001 s9 : 000000000000012a s10: 0000000000000008 [ 65.733516] s11: 0000000000000008 t3 : 00000000000003ff t4 : 000000000000000f [ 65.734434] t5 : 00000000000003ff t6 : 0000000000040000 [ 65.734613] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003 [ 65.734901] Call Trace: [ 65.735076] [<ffffffe00019f11e>] __find_get_block+0x218/0x2c8 [ 65.735417] [<ffffffe00020017a>] __ext4_get_inode_loc+0xb2/0x2f6 [ 65.735618] [<ffffffe000201b6c>] ext4_get_inode_loc+0x3a/0x8a [ 65.735802] [<ffffffe000203380>] ext4_reserve_inode_write+0x2e/0x8c [ 65.735999] [<ffffffe00020357a>] __ext4_mark_inode_dirty+0x4c/0x18e [ 65.736208] [<ffffffe000206bb0>] ext4_dirty_inode+0x46/0x66 [ 65.736387] [<ffffffe000192914>] __mark_inode_dirty+0x12c/0x3da [ 65.736576] [<ffffffe000180dd2>] touch_atime+0x146/0x150 [ 65.736748] [<ffffffe00010d762>] filemap_read+0x234/0x246 [ 65.736920] [<ffffffe00010d834>] generic_file_read_iter+0xc0/0x114 [ 65.737114] [<ffffffe0001f5d7a>] ext4_file_read_iter+0x42/0xea [ 65.737310] [<ffffffe000163f2c>] new_sync_read+0xe2/0x15a [ 65.737483] [<ffffffe000165814>] vfs_read+0xca/0xf2 [ 65.737641] [<ffffffe000165bae>] ksys_read+0x5e/0xc8 [ 65.737816] [<ffffffe000165c26>] sys_read+0xe/0x16 [ 65.737973] [<ffffffe000003972>] ret_from_syscall+0x0/0x2 [ 65.738858] ---[ end trace fe93f985456c935d ]--- A simple reproducer looks like: echo 'p:myprobe sys_read fd=%a0 buf=%a1 count=%a2' > /sys/kernel/debug/tracing/kprobe_events echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable cat /sys/kernel/debug/tracing/trace Here's what happens to hit that BUG_ON(): 1) After installing kprobe at entry of sys_read, the first instruction is replaced by 'ebreak' instruction on riscv64 platform. 2) Once kernel reach the 'ebreak' instruction at the entry of sys_read, it trap into the riscv breakpoint handler, where it do something to setup for coming single-step of origin instruction, including backup the 'sstatus' in pt_regs, followed by disable interrupt during single stepping via clear 'SIE' bit of 'sstatus' in pt_regs. 3) Then kernel restore to the instruction slot contains two instructions, one is original instruction at entry of sys_read, the other is 'ebreak'. Here it trigger a 'Instruction page fault' exception (value at 'scause' is '0xc'), if PF is not filled into PageTabe for that slot yet. 4) Again kernel trap into page fault exception handler, where it choose different policy according to the state of running kprobe. Because afte 2) the state is KPROBE_HIT_SS, so kernel reset the current kprobe and 'pc' points back to the probe address. 5) Because 'epc' point back to 'ebreak' instrution at sys_read probe, kernel trap into breakpoint handler again, and repeat the operations at 2), however 'sstatus' without 'SIE' is keep at 4), it cause the real 'sstatus' saved at 2) is overwritten by the one withou 'SIE'. 6) When kernel cross the probe the 'sstatus' CSR restore with value without 'SIE', and reach __find_get_block where it requires the interrupt must be enabled. Fix this is very trivial, just restore the value of 'sstatus' in pt_regs with backup one at 2) when the instruction being single stepped cause a page fault. Fixes: c22b0bcb1dd02 ("riscv: Add kprobes supported") Signed-off-by: Liao Chang <liaochang1@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: Set ARCH_HAS_STRICT_MODULE_RWX if MMUJisheng Zhang
Now we can set ARCH_HAS_STRICT_MODULE_RWX for MMU riscv platforms, this is good from security perspective. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: module: Create module allocations without exec permissionsJisheng Zhang
The core code manages the executable permissions of code regions of modules explicitly, it is not necessary to create the module vmalloc regions with RWX permissions. Create them with RW- permissions instead. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: bpf: Avoid breaking W^XJisheng Zhang
We allocate Non-executable pages, then call bpf_jit_binary_lock_ro() to enable executable permission after mapping them read-only. This is to prepare for STRICT_MODULE_RWX in following patch. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: bpf: Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to coreJisheng Zhang
We will drop the executable permissions of the code pages from the mapping at allocation time soon. Move bpf_jit_alloc_exec() and bpf_jit_free_exec() to bpf_jit_core.c so that they can be shared by both RV64I and RV32I. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-04-26riscv: kprobes: Implement alloc_insn_page()Jisheng Zhang
Allocate PAGE_KERNEL_READ_EXEC(read only, executable) page for kprobes insn page. This is to prepare for STRICT_MODULE_RWX. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>