Age | Commit message (Collapse) | Author |
|
Add a __SYSCALL_COMMON() macro to the syscall table, which simplifies syscalltbl.sh.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-12-brgerst@gmail.com
|
|
Syscall qualifier support is no longer needed.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-11-brgerst@gmail.com
|
|
Now that the fast syscall path is removed, the ptregs qualifier is unused.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-10-brgerst@gmail.com
|
|
Instead of using an array in asm-offsets to calculate the max syscall
number, calculate it when writing out the syscall headers.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-9-brgerst@gmail.com
|
|
Since X32 has its own syscall table now, move it to a separate file.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/20200313195144.164260-8-brgerst@gmail.com
|
|
so it can be available to multiple syscall tables. Also directly return
-ENOSYS instead of bouncing to the generic sys_ni_syscall().
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200313195144.164260-7-brgerst@gmail.com
|
|
Add missing syscall wrapper for x32_rt_sigreturn().
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-6-brgerst@gmail.com
|
|
Pull the common code out from the SYS_NI macros into a new __SYS_NI macro.
Also conditionalize the X64 version in preparation for enabling syscall
wrappers on 32-bit native kernels.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-5-brgerst@gmail.com
|
|
Pull the common code out from the COND_SYSCALL macros into a new
__COND_SYSCALL macro. Also conditionalize the X64 version in preparation
for enabling syscall wrappers on 32-bit native kernels.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-4-brgerst@gmail.com
|
|
Pull the common code out from the SYSCALL_DEFINE0 macros into a new
__SYS_STUB0 macro. Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-3-brgerst@gmail.com
|
|
Pull the common code out from the SYSCALL_DEFINEx macros into a new
__SYS_STUBx macro. Also conditionalize the X64 version in preparation for
enabling syscall wrappers on 32-bit native kernels.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200313195144.164260-2-brgerst@gmail.com
|
|
User space cannot disable interrupts any longer so trace return to user space
unconditionally as IRQS_ON.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200308222609.314596327@linutronix.de
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/20200308222609.219366430@linutronix.de
|
|
Nothing cares about the -1 "mark as interrupt" in the errorcode of
exception entries. It's only used to fill the error code when a signal is
delivered, but this is already inconsistent vs. 64 bit as there all
exceptions which do not have an error code set it to 0. So if 32 bit
applications would care about this, then they would have noticed more than
a decade ago.
Just use 0 for all excpetions which do not have an errorcode consistently.
This does neither break /proc/$PID/syscall because this interface examines
the error code / syscall number which is on the stack and that is set to -1
(no syscall) in common_exception unconditionally for all exceptions. The
push in the entry stub is just there to fill the hardware error code slot
on the stack for consistency of the stack layout.
A transient observation of 0 is possible, but that's true for the other
exceptions which use 0 already as well and that interface is an unreliable
snapshot of dubious correctness anyway.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/87mu94m7ky.fsf@nanos.tec.linutronix.de
|
|
#BP is not longer using IST and using ist_enter() and ist_exit() makes it
harder to change ist_enter() and ist_exit()'s behavior. Instead open-code
the very small amount of required logic.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220217.150607679@linutronix.de
|
|
int3 is not using the common_exception path for purely historical reasons,
but there is no reason to keep it the only exception which is different.
Make it use common_exception so the upcoming changes to autogenerate the
entry stubs do not have to special case int3.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220217.042369808@linutronix.de
|
|
Nothing is using it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.826870369@linutronix.de
|
|
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.720335354@linutronix.de
|
|
Add a comment which explains why this empty handler for a reserved vector
exists.
Requested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.624165786@linutronix.de
|
|
That function returns immediately after conditionally reenabling interrupts which
is more than pointless and requires the ASM code to disable interrupts again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20191023123117.871608831@linutronix.de
Link: https://lkml.kernel.org/r/20200225220216.518575042@linutronix.de
|
|
Remove the pointless difference between 32 and 64 bit to make further
unifications simpler.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.428188397@linutronix.de
|
|
do_machine_check() can be raised in almost any context including the most
fragile ones. Prevent kprobes and tracing.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200225220216.315548935@linutronix.de
|
|
All exception entry points must have ASM_CLAC right at the
beginning. The general_protection entry is missing one.
Fixes: e59d1b0a2419 ("x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20200225220216.219537887@linutronix.de
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"These are fixes that were found during testing with help of error
injection, plus some other stable material.
There's a fixup to patch added to rc1 causing locking in wrong context
warnings, tests found one more deadlock scenario. The patches are
tagged for stable, two of them now in the queue but we'd like all
three released at the same time.
I'm not happy about fixes to fixes in such a fast succession during
rcs, but I hope we found all the fallouts of commit 28553fa992cb
('Btrfs: fix race between shrinking truncate and fiemap')"
* tag 'for-5.6-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
Btrfs: fix deadlock during fast fsync when logging prealloc extents beyond eof
Btrfs: fix btrfs_wait_ordered_range() so that it waits for all ordered extents
btrfs: fix bytes_may_use underflow in prealloc error condtition
btrfs: handle logged extent failure properly
btrfs: do not check delayed items are empty for single transaction cleanup
btrfs: reset fs_root to NULL on error in open_ctree
btrfs: destroy qgroup extent records on transaction abort
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"More miscellaneous ext4 bug fixes (all stable fodder)"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix mount failure with quota configured as module
jbd2: fix ocfs2 corrupt when clearing block group bits
ext4: fix race between writepages and enabling EXT4_EXTENTS_FL
ext4: rename s_journal_flag_rwsem to s_writepages_rwsem
ext4: fix potential race between s_flex_groups online resizing and access
ext4: fix potential race between s_group_info online resizing and access
ext4: fix potential race between online resizing and write operations
ext4: add cond_resched() to __ext4_find_entry()
ext4: fix a data race in EXT4_I(inode)->i_disksize
|
|
Pull csky updates from Guo Ren:
"Sorry, I missed 5.6-rc1 merge window, but in this pull request the
most are the fixes and the rests are between fixes and features. The
only outside modification is the MAINTAINERS file update with our
mailing list.
- cache flush implementation fixes
- ftrace modify panic fix
- CONFIG_SMP boot problem fix
- fix pt_regs saving for atomic.S
- fix fixaddr_init without highmem.
- fix stack protector support
- fix fake Tightly-Coupled Memory code compile and use
- fix some typos and coding convention"
* tag 'csky-for-linus-5.6-rc3' of git://github.com/c-sky/csky-linux: (23 commits)
csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>
csky: Implement copy_thread_tls
csky: Add PCI support
csky: Minimize defconfig to support buildroot config.fragment
csky: Add setup_initrd check code
csky: Cleanup old Kconfig options
arch/csky: fix some Kconfig typos
csky: Fixup compile warning for three unimplemented syscalls
csky: Remove unused cache implementation
csky: Fixup ftrace modify panic
csky: Add flush_icache_mm to defer flush icache all
csky: Optimize abiv2 copy_to_user_page with VM_EXEC
csky: Enable defer flush_dcache_page for abiv2 cpus (807/810/860)
csky: Remove unnecessary flush_icache_* implementation
csky: Support icache flush without specific instructions
csky/Kconfig: Add Kconfig.platforms to support some drivers
csky/smp: Fixup boot failed when CONFIG_SMP
csky: Set regs->usp to kernel sp, when the exception is from kernel
csky/mm: Fixup export invalid_pte_table symbol
csky: Separate fixaddr_init from highmem
...
|
|
The C-Sky platform code is not a clock provider, and just needs to call
of_clk_init().
Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS fixes from Thomas Gleixner:
"Two fixes for the AMD MCE driver:
- Populate the per CPU MCA bank descriptor pointer only after it has
been completely set up to prevent a use-after-free in case that one
of the subsequent initialization step fails
- Implement a proper release function for the sysfs entries of MCA
threshold controls instead of freeing the memory right in the CPU
teardown code, which leads to another use-after-free when the
associated sysfs file is opened and accessed"
* tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce/amd: Fix kobject lifetime
x86/mce/amd: Publish the bank pointer only after setup has succeeded
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two fixes for the irq core code which are follow ups to the recent MSI
fixes:
- The WARN_ON which was put into the MSI setaffinity callback for
paranoia reasons actually triggered via a callchain which escaped
when all the possible ways to reach that code were analyzed.
The proc/irq/$N/*affinity interfaces have a quirk which came in
when ALPHA moved to the generic interface: In case that the written
affinity mask does not contain any online CPU it calls into ALPHAs
magic auto affinity setting code.
A few years later this mechanism was also made available to x86 for
no good reasons and in a way which circumvents all sanity checks
for interrupts which cannot have their affinity set from process
context on X86 due to the way the X86 interrupt delivery works.
It would be possible to make this work properly, but there is no
point in doing so. If the interrupt is not yet started then the
affinity setting has no effect and if it is started already then it
is already assigned to an online CPU so there is no point to
randomly move it to some other CPU. Just return EINVAL as the code
has done before that change forever.
- The new MSI quirk bit in the irq domain flags turned out to be
already occupied, which escaped the author and the reviewers
because the already in use bits were 0,6,2,3,4,5 listed in that
order.
That bit 6 was simply overlooked because the ordering was straight
forward linear otherwise. So the new bit ended up being a
duplicate.
Fix it up by switching the oddball 6 to the obvious 1"
* tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/irqdomain: Make sure all irq domain flags are distinct
genirq/proc: Reject invalid affinity masks (again)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two fixes for x86:
- Remove the __force_oder definiton from the kaslr boot code as it is
already defined in the page table code which makes GCC 10 builds
fail because it changed the default to -fno-common.
- Address the AMD erratum 1054 concerning the IRPERF capability and
enable the Instructions Retired fixed counter on machines which are
not affected by the erratum"
* tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF
x86/boot/compressed: Don't declare __force_order in kaslr_64.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs fix from Damien Le Moal:
"A single patch fixing typos in the documentation file"
* tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: fix documentation typos etc.
|
|
Pull io_uring fixes from Jens Axboe:
"Here's a small collection of fixes that were queued up:
- Remove unnecessary NULL check (Dan)
- Missing io_req_cancelled() call in fallocate (Pavel)
- Put the cleanup check for aux data in the right spot (Pavel)
- Two fixes for SQPOLL (Stefano, Xiaoguang)"
* tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
io_uring: fix __io_iopoll_check deadlock in io_sq_thread
io_uring: prevent sq_thread from spinning when it should stop
io_uring: fix use-after-free by io_cleanup_req()
io_uring: remove unnecessary NULL checks
io_uring: add missing io_req_cancelled()
|
|
Pull block fixes from Jens Axboe:
"Just a set of NVMe fixes via Keith"
* tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block:
nvme-multipath: Fix memory leak with ana_log_buf
nvme: Fix uninitialized-variable warning
nvme-pci: Use single IRQ vector for old Apple models
nvme/pci: Add sleep quirk for Samsung and Toshiba drives
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four non-core fixes.
Two are reverts of target fixes which turned out to have unwanted side
effects, one is a revert of an RDMA fix with the same problem and the
final one fixes an incorrect warning about memory allocation failures
in megaraid_sas (the driver actually reduces the allocation size until
it succeeds)"
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session"
scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout"
scsi: megaraid_sas: silence a warning
scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix crash in w83627ehf driver seen with W83627DHG-P
- Fix lockdep splat in acpi_power_meter driver
- Fix xdpe12284 documentation Sphinx warnings
* tag 'hwmon-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (w83627ehf) Fix crash seen with W83627DHG-P
hwmon: (acpi_power_meter) Fix lockdep splat
Documentation/hwmon: fix xdpe12284 Sphinx warnings
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree fixes deom Rob Herring:
"A handful of fixes in DT bindings for MDIO bus, Allwinner CSI, OMAP
HSMMC, and Tegra124 EMC"
* tag 'devicetree-fixes-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: media: csi: Fix clocks description
dt-bindings: media: csi: Add interconnects properties
dt-bindings: net: mdio: remove compatible string from example
dt-bindings: memory-controller: Update example for Tegra124 EMC
dt-bindings: mmc: omap-hsmmc: Fix SDIO interrupt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Remove ieee_emulation_warnings sysctl which is a dead code.
- Avoid triggering rebuild of the kernel during make install.
- Enable protected virtualization guest support in default configs.
- Fix cio_ignore seq_file .next function to increase position index.
And use kobj_to_dev instead of container_of in cio code.
- Fix storage block address lists to contain absolute addresses in qdio
code.
- Few clang warnings and spelling fixes.
* tag 's390-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/qdio: fill SBALEs with absolute addresses
s390/qdio: fill SL with absolute addresses
s390: remove obsolete ieee_emulation_warnings
s390: make 'install' not depend on vmlinux
s390/kaslr: Fix casts in get_random
s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range
s390/pkey/zcrypt: spelling s/crytp/crypt/
s390/cio: use kobj_to_dev() API
s390/defconfig: enable CONFIG_PROTECTED_VIRTUALIZATION_GUEST
s390/cio: cio_ignore_proc_seq_next should increase position index
|
|
Since commit a3a0e43fd770 ("io_uring: don't enter poll loop if we have
CQEs pending"), if we already events pending, we won't enter poll loop.
In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if app has
been terminated and don't reap pending events which are already in cq
ring, and there are some reqs in poll_list, io_sq_thread will enter
__io_iopoll_check(), and find pending events, then return, this loop
will never have a chance to exit.
I have seen this issue in fio stress tests, to fix this issue, let
io_sq_thread call io_iopoll_getevents() with argument 'min' being zero,
and remove __io_iopoll_check().
Fixes: a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs pending")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When CONFIG_QFMT_V2 is configured as a module, the test in
ext4_feature_set_ok() fails and so mount of filesystems with quota or
project features fails. Fix the test to use IS_ENABLED macro which
works properly even for modules.
Link: https://lore.kernel.org/r/20200221100835.9332-1-jack@suse.cz
Fixes: d65d87a07476 ("ext4: improve explanation of a mount failure caused by a misconfigured kernel")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
I found a NULL pointer dereference in ocfs2_block_group_clear_bits().
The running environment:
kernel version: 4.19
A cluster with two nodes, 5 luns mounted on two nodes, and do some
file operations like dd/fallocate/truncate/rm on every lun with storage
network disconnection.
The fallocate operation on dm-23-45 caused an null pointer dereference.
The information of NULL pointer dereference as follows:
[577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45.
[577992.878290] Aborting journal on device dm-23-45.
...
[577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46.
[577992.890908] __journal_remove_journal_head: freeing b_committed_data
[577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30
[577992.890918] __journal_remove_journal_head: freeing b_committed_data
[577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30
[577992.890922] __journal_remove_journal_head: freeing b_committed_data
[577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30
[577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30
[577992.890928] __journal_remove_journal_head: freeing b_committed_data
[577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30
[577992.890933] __journal_remove_journal_head: freeing b_committed_data
[577992.890939] __journal_remove_journal_head: freeing b_committed_data
[577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[577992.890950] Mem abort info:
[577992.890951] ESR = 0x96000004
[577992.890952] Exception class = DABT (current EL), IL = 32 bits
[577992.890952] SET = 0, FnV = 0
[577992.890953] EA = 0, S1PTW = 0
[577992.890954] Data abort info:
[577992.890955] ISV = 0, ISS = 0x00000004
[577992.890956] CM = 0, WnR = 0
[577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9
[577992.890960] [0000000000000020] pgd=0000000000000000
[577992.890964] Internal error: Oops: 96000004 [#1] SMP
[577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd)
[577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1
[577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019
[577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO)
[577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2]
[577992.891084] sp : ffff0000c8e2b810
[577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000
[577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70
[577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2
[577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30
[577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000
[577992.891097] x19: ffff000001681638 x18: ffffffffffffffff
[577992.891098] x17: 0000000000000000 x16: ffff000080a03df0
[577992.891100] x15: ffff0000811d9708 x14: 203d207375746174
[577992.891101] x13: 73203a524f525245 x12: 20373439343a6565
[577992.891103] x11: 0000000000000038 x10: 0101010101010101
[577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f
[577992.891109] x7 : 0000000000000000 x6 : 0000000000000080
[577992.891110] x5 : 0000000000000000 x4 : 0000000000000002
[577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00
[577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000
[577992.891116] Call trace:
[577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2]
[577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2]
[577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2]
[577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2]
[577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2]
[577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2]
[577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2]
[577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2]
[577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2]
[577992.891316] vfs_fallocate+0x11c/0x250
[577992.891317] ksys_fallocate+0x54/0x88
[577992.891319] __arm64_sys_fallocate+0x28/0x38
[577992.891323] el0_svc_common+0x78/0x130
[577992.891325] el0_svc_handler+0x38/0x78
[577992.891327] el0_svc+0x8/0xc
My analysis process as follows:
ocfs2_fallocate
__ocfs2_change_file_space
ocfs2_allocate_extents
ocfs2_extend_allocation
ocfs2_add_inode_data
ocfs2_add_clusters_in_btree
ocfs2_insert_extent
ocfs2_do_insert_extent
ocfs2_rotate_tree_right
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2_journal_restart
jbd2__journal_restart
/* handle->h_transaction is NULL,
* is_handle_aborted(handle) is true
*/
handle->h_transaction = NULL;
start_this_handle
return -EROFS;
ocfs2_free_clusters
_ocfs2_free_clusters
_ocfs2_free_suballoc_bits
ocfs2_block_group_clear_bits
ocfs2_journal_access_gd
__ocfs2_journal_access
jbd2_journal_get_undo_access
/* I think jbd2_write_access_granted() will
* return true, because do_get_write_access()
* will return -EROFS.
*/
if (jbd2_write_access_granted(...)) return 0;
do_get_write_access
/* handle->h_transaction is NULL, it will
* return -EROFS here, so do_get_write_access()
* was not called.
*/
if (is_handle_aborted(handle)) return -EROFS;
/* bh2jh(group_bh) is NULL, caused NULL
pointer dereference */
undo_bg = (struct ocfs2_group_desc *)
bh2jh(group_bh)->b_committed_data;
If handle->h_transaction == NULL, then jbd2_write_access_granted()
does not really guarantee that journal_head will stay around,
not even speaking of its b_committed_data. The bh2jh(group_bh)
can be removed after ocfs2_journal_access_gd() and before call
"bh2jh(group_bh)->b_committed_data". So, we should move
is_handle_aborted() check from do_get_write_access() into
jbd2_journal_get_undo_access() and jbd2_journal_get_write_access()
before the call to jbd2_write_access_granted().
Link: https://lore.kernel.org/r/f72a623f-b3f1-381a-d91d-d22a1c83a336@huawei.com
Signed-off-by: Yan Wang <wangyan122@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running
on it, the following warning in ext4_add_complete_io() can be hit:
WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120
Here's a minimal reproducer (not 100% reliable) (root isn't required):
while true; do
sync
done &
while true; do
rm -f file
touch file
chattr -e file
echo X >> file
chattr +e file
done
The problem is that in ext4_writepages(), ext4_should_dioread_nolock()
(which only returns true on extent-based files) is checked once to set
the number of reserved journal credits, and also again later to select
the flags for ext4_map_blocks() and copy the reserved journal handle to
ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set,
the first check can see dioread_nolock disabled while the later one can
see it enabled, causing the reserved handle to unexpectedly be NULL.
Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races
related to doing so as well, fix this by synchronizing changing
EXT4_EXTENTS_FL with ext4_writepages() via the existing
s_writepages_rwsem (previously called s_journal_flag_rwsem).
This was originally reported by syzbot without a reproducer at
https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf,
but now that dioread_nolock is the default I also started seeing this
when running syzkaller locally.
Link: https://lore.kernel.org/r/20200219183047.47417-3-ebiggers@kernel.org
Reported-by: syzbot+2202a584a00fffd19fbf@syzkaller.appspotmail.com
Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
In preparation for making s_journal_flag_rwsem synchronize
ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA
flags (rather than just JOURNAL_DATA as it does currently), rename it to
s_writepages_rwsem.
Link: https://lore.kernel.org/r/20200219183047.47417-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
|
During an online resize an array of s_flex_groups structures gets replaced
so it can get enlarged. If there is a concurrent access to the array and
this memory has been reused then this can lead to an invalid memory access.
The s_flex_group array has been converted into an array of pointers rather
than an array of structures. This is to ensure that the information
contained in the structures cannot get out of sync during a resize due to
an accessor updating the value in the old structure after it has been
copied but before the array pointer is updated. Since the structures them-
selves are no longer copied but only the pointers to them this case is
mitigated.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443
Link: https://lore.kernel.org/r/20200221053458.730016-4-tytso@mit.edu
Signed-off-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two small fixes for Xen:
- a fix to avoid warnings with new gcc
- a fix for incorrectly disabled interrupts when calling
_cond_resched()"
* tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Enable interrupts when calling _cond_resched()
x86/xen: Distribute switch variables for initialization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"It's all straightforward apart from the changes to mmap()/mremap() in
relation to their handling of address arguments from userspace with
non-zero tag bits in the upper byte.
The change to brk() is necessary to fix a nasty user-visible
regression in malloc(), but we tightened up mmap() and mremap() at the
same time because they also allow the user to create virtual aliases
by accident. It's much less likely than brk() to matter in practice,
but enforcing the principle of "don't permit the creation of mappings
using tagged addresses" leads to a straightforward ABI without having
to worry about the "but what if a crazy program did foo?" aspect of
things.
Summary:
- Fix regression in malloc() caused by ignored address tags in brk()
- Add missing brackets around argument to untagged_addr() macro
- Fix clang build when using binutils assembler
- Fix silly typo in virtual memory map documentation"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
mm: Avoid creating virtual address aliases in brk()/mmap()/mremap()
docs: arm64: fix trivial spelling enought to enough in memory.rst
arm64: memory: Add missing brackets to untagged_addr() macro
arm64: lse: Fix LSE atomics with LLVM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.6. This is two weeks worth as I was out
sick last week:
- Three fixes for the recently added VMAP_STACK on 32-bit.
- Three fixes related to hugepages on 8xx (32-bit).
- A fix for a bug in our transactional memory handling that could
lead to a kernel crash if we saw a page fault during signal
delivery.
- A fix for a deadlock in our PCI EEH (Enhanced Error Handling) code.
- A couple of other minor fixes.
Thanks to: Christophe Leroy, Erhard F, Frederic Barrat, Gustavo Luiz
Duarte, Larry Finger, Leonardo Bras, Oliver O'Halloran, Sam Bobroff"
* tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S
powerpc/xmon: Fix whitespace handling in getstring()
powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK
powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss
powerpc/hugetlb: Fix 8M hugepages on 8xx
powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size
powerpc/eeh: Fix deadlock handling dead PHB
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
- mtk_wdt needs RESET_CONTROLLER to build
- da9062 driver fixes:
- fix power management ops
- do not ping the hw during stop()
- add dependency on I2C
* tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: da9062: Add dependency on I2C
watchdog: da9062: fix power management ops
watchdog: da9062: do not ping the hw during stop()
watchdog: fix mtk_wdt.c RESET_CONTROLLER build error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 5.6-rc3.
Also included in here are some updates for some documentation files
that I seem to be maintaining these days.
The driver fixes are:
- small fixes for the habanalabs driver
- fsi driver bugfix
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Documentation/process: Swap out the ambassador for Canonical
habanalabs: patched cb equals user cb in device memset
habanalabs: do not halt CoreSight during hard reset
habanalabs: halt the engines before hard-reset
MAINTAINERS: remove unnecessary ':' characters
fsi: aspeed: add unspecified HAS_IOMEM dependency
COPYING: state that all contributions really are covered by this file
Documentation/process: Change Microsoft contact for embargoed hardware issues
embargoed-hardware-issues: drop Amazon contact as the email address now bounces
Documentation/process: Add Arm contact for embargoed HW issues
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some small staging driver fixes for 5.6-rc3, along with the
removal of an unused/unneeded driver as well.
The android vsoc driver is not needed anymore by anyone, so it was
removed.
The other driver fixes are:
- ashmem bugfixes
- greybus audio driver bugfix
- wireless driver bugfixes and tiny cleanups to error paths
All of these have been in linux-next for a while now with no reported
issues"
* tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723bs: Remove unneeded goto statements
staging: rtl8188eu: Remove some unneeded goto statements
staging: rtl8723bs: Fix potential overuse of kernel memory
staging: rtl8188eu: Fix potential overuse of kernel memory
staging: rtl8723bs: Fix potential security hole
staging: rtl8188eu: Fix potential security hole
staging: greybus: use after free in gb_audio_manager_remove_all()
staging: android: Delete the 'vsoc' driver
staging: rtl8723bs: fix copy of overlapping memory
staging: android: ashmem: Disallow ashmem memory from being remapped
staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
|