Age | Commit message (Collapse) | Author |
|
As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:
crash> bt
PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8"
#0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
#1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
#2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
#3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
#4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
#5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
#6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
[exception RIP: svm_update_pi_irte+227]
RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086
RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001
RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8
RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200
R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
#8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
#9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b
RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020
RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0
R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0
R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
Vmx have been fix this in commit 3a8b0677fc61 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.
Co-developed-by: Yi Liu <liu.yi24@zte.com.cn>
Signed-off-by: Yi Liu <liu.yi24@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has
to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm.
kvm_arch_init_vm also has to undo all the initialization, so
group all the MMU initialization code at the beginning and
handle cleaning up of kvm_page_track_init.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"Assorted bits and pieces"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
aio: drop needless assignment in aio_read()
clean overflow checks in count_mounts() a bit
seq_file: fix NULL pointer arithmetic warning
uml/x86: use x86 load_unaligned_zeropad()
asm/user.h: killed unused macros
constify struct path argument of finish_automount()/do_add_mount()
fs: Remove FIXME comment in generic_write_checks()
|
|
Pull vfs fix from Darrick Wong:
"The erofs developers felt that FIEMAP should handle ranged requests
starting at s_maxbytes by returning EFBIG instead of passing the
filesystem implementation a nonsense 0-byte request.
Not sure why they keep tagging this 'iomap', but the VFS shouldn't be
asking for information about ranges of a file that the filesystem
already declared that it does not support.
- Fix a potential infinite loop in FIEMAP by fixing an off by one
error when comparing the requested range against s_maxbytes"
* tag 'vfs-5.18-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: fix an infinite loop in iomap_fiemap
|
|
Pull xfs fixes from Darrick Wong:
"This fixes multiple problems in the reserve pool sizing functions: an
incorrect free space calculation, a pointless infinite loop, and even
more braindamage that could result in the pool being overfilled. The
pile of patches from Dave fix myriad races and UAF bugs in the log
recovery code that much to our mutual surprise nobody's tripped over.
Dave also fixed a performance optimization that had turned into a
regression.
Dave Chinner is taking over as XFS maintainer starting Sunday and
lasting until 5.19-rc1 is tagged so that I can focus on starting a
massive design review for the (feature complete after five years)
online repair feature. From then on, he and I will be moving XFS to a
co-maintainership model by trading duties every other release.
NOTE: I hope very strongly that the other pieces of the (X)FS
ecosystem (fstests and xfsprogs) will make similar changes to spread
their maintenance load.
Summary:
- Fix an incorrect free space calculation in xfs_reserve_blocks that
could lead to a request for free blocks that will never succeed.
- Fix a hang in xfs_reserve_blocks caused by an infinite loop and the
incorrect free space calculation.
- Fix yet a third problem in xfs_reserve_blocks where multiple racing
threads can overfill the reserve pool.
- Fix an accounting error that lead to us reporting reserved space as
"available".
- Fix a race condition during abnormal fs shutdown that could cause
UAF problems when memory reclaim and log shutdown try to clean up
inodes.
- Fix a bug where log shutdown can race with unmount to tear down the
log, thereby causing UAF errors.
- Disentangle log and filesystem shutdown to reduce confusion.
- Fix some confusion in xfs_trans_commit such that a race between
transaction commit and filesystem shutdown can cause unlogged dirty
inode metadata to be committed, thereby corrupting the filesystem.
- Remove a performance optimization in the log as it was discovered
that certain storage hardware handle async log flushes so poorly as
to cause serious performance regressions. Recent restructuring of
other parts of the logging code mean that no performance benefit is
seen on hardware that handle it well"
* tag 'xfs-5.18-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: drop async cache flushes from CIL commits.
xfs: shutdown during log recovery needs to mark the log shutdown
xfs: xfs_trans_commit() path must check for log shutdown
xfs: xfs_do_force_shutdown needs to block racing shutdowns
xfs: log shutdown triggers should only shut down the log
xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks
xfs: shutdown in intent recovery has non-intent items in the AIL
xfs: aborting inodes on shutdown may need buffer lock
xfs: don't report reserved bnobt space as available
xfs: fix overfilling of reserve pool
xfs: always succeed at setting the reserve pool size
xfs: remove infinite loop when reserving free block pool
xfs: don't include bnobt blocks when reserving free block pool
xfs: document the XFS_ALLOC_AGFL_RESERVE constant
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fix from Palmer Dabbelt:
- Fix the RISC-V section of the generic CPU idle bindings to comply
with the recently tightened DT schema.
* tag 'riscv-for-linus-5.18-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
dt-bindings: Fix phandle-array issues in the idle-states bindings
|
|
Pull block driver fixes from Jens Axboe:
"Followup block driver updates and fixes for the 5.18-rc1 merge window.
In detail:
- NVMe pull request
- Fix multipath hang when disk goes live over reconnect (Anton
Eidelman)
- fix RCU hole that allowed for endless looping in multipath
round robin (Chris Leech)
- remove redundant assignment after left shift (Colin Ian King)
- add quirks for Samsung X5 SSDs (Monish Kumar R)
- fix the read-only state for zoned namespaces with unsupposed
features (Pankaj Raghav)
- use a private workqueue instead of the system workqueue in
nvmet (Sagi Grimberg)
- allow duplicate NSIDs for private namespaces (Sungup Moon)
- expose use_threaded_interrupts read-only in sysfs (Xin Hao)"
- nbd minor allocation fix (Zhang)
- drbd fixes and maintainer addition (Lars, Jakob, Christoph)
- n64cart build fix (Jackie)
- loop compat ioctl fix (Carlos)
- misc fixes (Colin, Dongli)"
* tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block:
drbd: remove check of list iterator against head past the loop body
drbd: remove usage of list iterator variable after loop
nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
MAINTAINERS: add drbd co-maintainer
drbd: fix potential silent data corruption
loop: fix ioctl calls using compat_loop_info
nvme-multipath: fix hang when disk goes live over reconnect
nvme: fix RCU hole that allowed for endless looping in multipath round robin
nvme: allow duplicate NSIDs for private namespaces
nvmet: remove redundant assignment after left shift
nvmet: use a private workqueue instead of the system workqueue
nvme-pci: add quirks for Samsung X5 SSDs
nvme-pci: expose use_threaded_interrupts read-only in sysfs
nvme: fix the read-only state for zoned namespaces with unsupposed features
n64cart: convert bi_disk to bi_bdev->bd_disk fix build
xen/blkfront: fix comment for need_copy
xen-blkback: remove redundant assignment to variable i
|
|
Pull block fixes from Jens Axboe:
"Either fixes or a few additions that got missed in the initial merge
window pull. In detail:
- List iterator fix to avoid leaking value post loop (Jakob)
- One-off fix in minor count (Christophe)
- Fix for a regression in how io priority setting works for an
exiting task (Jiri)
- Fix a regression in this merge window with blkg_free() being called
in an inappropriate context (Ming)
- Misc fixes (Ming, Tom)"
* tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block:
blk-wbt: remove wbt_track stub
block: use dedicated list iterator variable
block: Fix the maximum minor value is blk_alloc_ext_minor()
block: restore the old set_task_ioprio() behaviour wrt PF_EXITING
block: avoid calling blkg_free() in atomic context
lib/sbitmap: allocate sb->map via kvzalloc_node
|
|
Pull io_uring fixes from Jens Axboe:
"A little bit all over the map, some regression fixes for this merge
window, and some general fixes that are stable bound. In detail:
- Fix an SQPOLL memory ordering issue (Almog)
- Accept fixes (Dylan)
- Poll fixes (me)
- Fixes for provided buffers and recycling (me)
- Tweak to IORING_OP_MSG_RING command added in this merge window (me)
- Memory leak fix (Pavel)
- Misc fixes and tweaks (Pavel, me)"
* tag 'for-5.18/io_uring-2022-04-01' of git://git.kernel.dk/linux-block:
io_uring: defer msg-ring file validity check until command issue
io_uring: fail links if msg-ring doesn't succeeed
io_uring: fix memory leak of uid in files registration
io_uring: fix put_kbuf without proper locking
io_uring: fix invalid flags for io_put_kbuf()
io_uring: improve req fields comments
io_uring: enable EPOLLEXCLUSIVE for accept poll
io_uring: improve task work cache utilization
io_uring: fix async accept on O_NONBLOCK sockets
io_uring: remove IORING_CQE_F_MSG
io_uring: add flag for disabling provided buffer recycling
io_uring: ensure recv and recvmsg handle MSG_WAITALL correctly
io_uring: don't recycle provided buffer if punted to async worker
io_uring: fix assuming triggered poll waitqueue is the single poll
io_uring: bump poll refs to full 31-bits
io_uring: remove poll entry from list when canceling all
io_uring: fix memory ordering when SQPOLL thread goes to sleep
io_uring: ensure that fsnotify is always called
io_uring: recycle provided before arming poll
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM integrity shrink crash due to journal entry not being marked
unused.
- Fix DM bio polling to handle possibility that underlying device(s)
return BLK_STS_AGAIN during submission.
- Fix dm_io and dm_target_io flags race condition on Alpha.
- Add some pr_err debugging to help debug cases when DM ioctl structure
is corrupted.
* tag 'for-5.18/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: fix bio polling to handle possibile BLK_STS_AGAIN
dm: fix dm_io and dm_target_io flags race condition on Alpha
dm integrity: set journal entry unused when shrinking device
dm ioctl: log an error if the ioctl structure is corrupted
|
|
As per 39bd2b6a3783 ("dt-bindings: Improve phandle-array schemas"), the
phandle-array bindings have been disambiguated. This fixes the new
RISC-V idle-states bindings to comply with the schema.
Fixes: 1bd524f7e8d8 ("dt-bindings: Add common bindings for ARM and RISC-V idle states")
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
Pull ksmbd updates from Steve French:
- three cleanup fixes
- shorten module load warning
- two documentation fixes
* tag '5.18-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: replace usage of found with dedicated list iterator variable
ksmbd: Remove a redundant zeroing of memory
MAINTAINERS: ksmbd: switch Sergey to reviewer
ksmbd: shorten experimental warning on loading the module
ksmbd: use netif_is_bridge_port
Documentation: ksmbd: update Feature Status table
|
|
Pull more cifs updates from Steve French:
- three fixes for big endian issues in how Persistent and Volatile file
ids were stored
- Various misc. fixes: including some for oops, 2 for ioctls, 1 for
writeback
- cleanup of how tcon (tree connection) status is tracked
- Four changesets to move various duplicated protocol definitions
(defined both in cifs.ko and ksmbd) into smbfs_common/smb2pdu.h
- important performance improvement to use cached handles in some key
compounding code paths (reduces numbers of opens/closes sent in some
workloads)
- fix to allow alternate DFS target to be used to retry on a failed i/o
* tag '5.18-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix NULL ptr dereference in smb2_ioctl_query_info()
cifs: prevent bad output lengths in smb2_ioctl_query_info()
smb3: fix ksmbd bigendian bug in oplock break, and move its struct to smbfs_common
smb3: cleanup and clarify status of tree connections
smb3: move defines for query info and query fsinfo to smbfs_common
smb3: move defines for ioctl protocol header and SMB2 sizes to smbfs_common
[smb3] move more common protocol header definitions to smbfs_common
cifs: fix incorrect use of list iterator after the loop
ksmbd: store fids as opaque u64 integers
cifs: fix bad fids sent over wire
cifs: change smb2_query_info_compound to use a cached fid, if available
cifs: convert the path to utf16 in smb2_query_info_compound
cifs: writeback fix
cifs: do not skip link targets when an I/O fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- Add keep_last_dots mount option to allow access to paths with
trailing dots
- Avoid repetitive volume dirty bit set/clear to improve storage life
time
* tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: do not clear VolumeDirty in writeback
exfat: allow access to paths with trailing dots
|
|
Pull more filesystem folio updates from Matthew Wilcox:
"A mixture of odd changes that didn't quite make it into the original
pull and fixes for things that did. Also the readpages changes had to
wait for the NFS tree to be pulled first.
- Remove ->readpages infrastructure
- Remove AOP_FLAG_CONT_EXPAND
- Move read_descriptor_t to networking code
- Pass the iocb to generic_perform_write
- Minor updates to iomap, btrfs, ext4, f2fs, ntfs"
* tag 'folio-5.18d' of git://git.infradead.org/users/willy/pagecache:
btrfs: Remove a use of PAGE_SIZE in btrfs_invalidate_folio()
ntfs: Correct mark_ntfs_record_dirty() folio conversion
f2fs: Get the superblock from the mapping instead of the page
f2fs: Correct f2fs_dirty_data_folio() conversion
ext4: Correct ext4_journalled_dirty_folio() conversion
filemap: Remove AOP_FLAG_CONT_EXPAND
fs: Pass an iocb to generic_perform_write()
fs, net: Move read_descriptor_t to net.h
fs: Remove read_actor_t
iomap: Simplify is_partially_uptodate a little
readahead: Update comments
mm: remove the skip_page argument to read_pages
mm: remove the pages argument to read_pages
fs: Remove ->readpages address space operation
readahead: Remove read_cache_pages()
|
|
Pull XArray updates from Matthew Wilcox:
- Documentation update
- Fix test-suite build after move of bitmap.h
- Fix xas_create_range() when a large entry is already present
- Fix xas_split() of a shadow entry
* tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray:
XArray: Update the LRU list in xas_split()
XArray: Fix xas_create_range() when multi-order entry present
XArray: Include bitmap.h from xarray.h
XArray: Document the locking requirement for the xa_state
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
"This has a handful of new features:
- Support for CURRENT_STACK_POINTER, which enables some extra stack
debugging for HARDENED_USERCOPY.
- Support for the new SBI CPU idle extension, via cpuidle and suspend
drivers.
- Profiling has been enabled in the defconfigs.
but is mostly fixes and cleanups"
* tag 'riscv-for-linus-5.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
RISC-V: K210 defconfigs: Drop redundant MEMBARRIER=n
RISC-V: defconfig: Drop redundant SBI HVC and earlycon
Documentation: riscv: remove non-existent directory from table of contents
riscv: cpu.c: don't use kernel-doc markers for comments
RISC-V: Enable profiling by default
RISC-V: module: fix apply_r_riscv_rcv_branch_rela typo
RISC-V: Declare per cpu boot data as static
RISC-V: Fix a comment typo in riscv_of_parent_hartid()
riscv: Increase stack size under KASAN
riscv: Fix fill_callchain return value
riscv: dts: canaan: Fix SPI3 bus width
riscv: Rename "sp_in_global" to "current_stack_pointer"
riscv module: remove (NOLOAD)
RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine
dt-bindings: Add common bindings for ARM and RISC-V idle states
cpuidle: Add RISC-V SBI CPU idle driver
cpuidle: Factor-out power domain related code from PSCI domain driver
RISC-V: Add SBI HSM suspend related defines
RISC-V: Add arch functions for non-retentive suspend entry/exit
RISC-V: Rename relocate() and make it global
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Add kretprobes framepointer verification and return address recovery
in stacktrace.
- Support control domain masks on custom zcrypt devices and filter
admin requests.
- Cleanup timer API usage.
- Rework absolute lowcore access helpers.
- Other various small improvements and fixes.
* tag 's390-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (26 commits)
s390/alternatives: avoid using jgnop mnemonic
s390/pci: rename get_zdev_by_bus() to zdev_from_bus()
s390/pci: improve zpci_dev reference counting
s390/smp: use physical address for SIGP_SET_PREFIX command
s390: cleanup timer API use
s390/zcrypt: fix using the correct variable for sizeof()
s390/vfio-ap: fix kernel doc and signature of group notifier functions
s390/maccess: rework absolute lowcore accessors
s390/smp: cleanup control register update routines
s390/smp: cleanup target CPU callback starting
s390/test_unwind: verify __kretprobe_trampoline is replaced
s390/unwind: avoid duplicated unwinding entries for kretprobes
s390/unwind: recover kretprobe modified return address in stacktrace
s390/kprobes: enable kretprobes framepointer verification
s390/test_unwind: extend kretprobe test
s390/ap: adjust whitespace
s390/ap: use insn format for new instructions
s390/alternatives: use insn format for new instructions
s390/alternatives: use instructions instead of byte patterns
s390/traps: improve panic message for translation-specification exception
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd BergmannL
"The introduction of vmap-stack on 32-bit arm caused a regression on a
few omap3/omap4 machines that pass a stack variable into a firmware
interface.
The early pre-ACPI AMD Seattle machines have been broken for a while,
Ard Biesheuvel has a series to bring them back for now.
A few machines with multiple DMA channels used on a device have the
channels in the wrong order according to the binding, which causes a
harmless warning. Reversing the order is easier than fixing the tools
to suppress the warning"
* tag 'soc-fixes-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: ls1046a: Update i2c node dma properties
arm64: dts: ls1043a: Update i2c dma properties
ARM: dts: spear1340: Update serial node properties
ARM: dts: spear13xx: Update SPI dma properties
ARM: OMAP2+: Fix regression for smc calls for vmap stack
dt: amd-seattle: add a description of the CPUs and caches
dt: amd-seattle: disable IPMI controller and some GPIO blocks on B0
dt: amd-seattle: add description of the SATA/CCP SMMUs
dt: amd-seattle: add a description of the PCIe SMMU
dt: amd-seattle: fix PCIe legacy interrupt routing
dt: amd-seattle: upgrade AMD Seattle XGBE to new SMMU binding
dt: amd-seattle: remove Overdrive revision A0 support
dt: amd-seattle: remove Husky platform
|
|
Convert the tracepoint.py file to python3 as many of the files in
tools/perf are already written in python3.
Committer testing:
# export PYTHONPATH=/tmp/build/perf/python/
# python3 ~acme/git/perf/tools/perf/python/tracepoint.py | head
time 67394457376909 prev_comm=swapper/12 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=gnome-terminal- next_pid=3313 next_prio=120
time 67394457807669 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
time 67394457811859 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
time 67394457824929 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
time 67394457831899 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
time 67394457842299 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
time 67394457844179 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
time 67394457853879 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
time 67394457856339 prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=0x0 ==> next_comm=python3 next_pid=1485930 next_prio=120
time 67394457865659 prev_comm=python3 prev_pid=1485930 prev_prio=120 prev_state=0x1 ==> next_comm=swapper/13 next_pid=0 next_prio=120
Traceback (most recent call last):
File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 48, in <module>
main()
File "/var/home/acme/git/perf/tools/perf/python/tracepoint.py", line 37, in main
print("time %u prev_comm=%s prev_pid=%d prev_prio=%d prev_state=0x%x ==> next_comm=%s next_pid=%d next_prio=%d" % (
BrokenPipeError: [Errno 32] Broken pipe
#
Signed-off-by: Tanu M <tanu235m@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/linux-perf-users/CAPS78prawYzRZnyhWjgOnGw4EwoswNwztvfZFdCOPOydFzVwzQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Addresses this coccinelle warning:
./tools/perf/util/evlist.c:1333:5-8: Unneeded variable: "err". Return
"- ENOMEM" on line 1358
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/1648432532-23151-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
perf_cpu_map__merge() will reuse one of its arguments if they are equal or
the other argument is NULL.
The arguments could be reused if it is known one set of values is a
subset of the other.
For example, a map of 0-1 and a map of just 0 when merged yields the map
of 0-1.
Currently a new map is created rather than adding a reference count to
the original 0-1 map.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Returns true if the second argument is a subset of the first.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.
For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.
For an uncore event, all_cpus may be just CPU 0 or every online CPU.
This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.
To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
This essentially reverts commit c72e3f04b45fb2e5 ("tools/perf/build:
Speed up git-version test on re-make") and commit 4e666cdb06eede20
("perf tools: Fix dependency for version file creation")
In commit c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version test
on re-make"), a makefile dependency on .git/HEAD was added. The
background is that running PERF-VERSION-FILE is relatively slow, and
commands like "git describe" are particularly slow.
In commit 4e666cdb06eede20 ("perf tools: Fix dependency for version file
creation"), an additional dependency on .git/ORIG_HEAD was added, as
.git/HEAD may not change for "git reset --hard HEAD^" command. However,
depending on whether we're on a branch or not, a "git cherry-pick" may
not lead to the version being updated.
As discussed with the git community in [0], using git internal files for
dependencies is not reliable. Commit 4e666cdb06ee also breaks some build
scenarios [1].
As mentioned, c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version
test on re-make") was added to speed up the build. However in commit
7572733b84997d23 ("perf tools: Fix version kernel tag") we removed the
call to "git describe", so just revert Makefile.perf back to same as pre
c72e3f04b45fb2e5 ("tools/perf/build: Speed up git-version test on
re-make") and the build should not be so slow, as below:
Pre 7572733b8499:
$> time util/PERF-VERSION-GEN
PERF_VERSION = 5.17.rc8.g4e666cdb06ee
real 0m0.110s
user 0m0.091s
sys 0m0.019s
Post 7572733b8499:
$> time util/PERF-VERSION-GEN
PERF_VERSION = 5.17.rc8.g7572733b8499
real 0m0.039s
user 0m0.036s
sys 0m0.007s
[0] https://lore.kernel.org/git/87wngkpddp.fsf@igel.home/T/#m4a4dd6de52fdbe21179306cd57b3761eb07f45f8
[1] https://lore.kernel.org/linux-perf-users/20220329093120.4173283-1-matthieu.baerts@tessares.net/T/#u
Committer testing:
After a fresh rebuild using 'make -C tools/perf O=/tmp/build/perf install-bin':
$ perf -v
perf version 5.17.g162f9db407b6
$ git log --oneline -1
162f9db407b6a6e5 (HEAD -> perf/core) perf tools: Stop depending on .git files for building PERF-VERSION-FILE
$
Now using a detached tarball, i.e. outside the kernel source tree:
$ ls -la perf*tar
ls: cannot access 'perf*tar': No such file or directory
$ make perf-tar-src-pkg
TAR
PERF_VERSION = 5.17.g31d10b3ef133
$ ls -la perf*tar
-rw-r--r--. 1 acme acme 22241280 Mar 30 13:26 perf-5.17.0.tar
$ mv perf-5.17.0.tar /tmp
$ cd /tmp
$ tar xf perf-5.17.0.tar
$ cd perf-5.17.0/
$ make -C tools/perf |& tail
CC util/pmu.o
CC util/pmu-flex.o
CC util/expr-flex.o
CC util/expr.o
LD util/scripting-engines/perf-in.o
LD util/intel-pt-decoder/perf-in.o
LD util/perf-in.o
LD perf-in.o
LINK perf
make: Leaving directory '/tmp/perf-5.17.0/tools/perf'
$ tools/perf/perf -v
perf version 5.17.g31d10b3ef133
$ pwd
/tmp/perf-5.17.0
$ cat PERF-VERSION-FILE
#define PERF_VERSION "5.17.g31d10b3ef133"
$
Fixes: 4e666cdb06eede20 ("perf tools: Fix dependency for version file creation")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1648635774-14581-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes from:
991625f3dd2cbc4b ("x86/ibt: Add IBT feature, MSR and #CP handling")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YkSCx2kr4ambH+Qe@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the changes in:
caa574ffc4aaf4f2 ("drm/i915/uapi: document behaviour for DG2 64K support")
That don't add any new ioctl, so no changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://lore.kernel.org/lkml/YkSChHqaOApscFQ0@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
6d8491910fcd3324 ("KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2")
ef11c9463ae00630 ("KVM: s390: Add vm IOCTL for key checked guest absolute memory access")
e9e9feebcbc14b17 ("KVM: s390: Add optional storage key checking to MEMOP IOCTL")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/YkSCOWHQdir1lhdJ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes from:
34739fd95fab3a5e ("KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field")
583cda1b0e7d5d49 ("KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU")
That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/lkml/YkSB4Q7kWmnaqeZU@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick up the changes in:
991625f3dd2cbc4b ("x86/ibt: Add IBT feature, MSR and #CP handling")
Addressing these tools/perf build warnings:
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
That makes the beautification scripts to pick some new entries:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
--- before 2022-03-29 16:23:07.678740040 -0300
+++ after 2022-03-29 16:23:16.960978524 -0300
@@ -220,6 +220,13 @@
[0x00000669] = "MC6_DEMOTION_POLICY_CONFIG",
[0x00000680] = "LBR_NHM_FROM",
[0x00000690] = "CORE_PERF_LIMIT_REASONS",
+ [0x000006a0] = "IA32_U_CET",
+ [0x000006a2] = "IA32_S_CET",
+ [0x000006a4] = "IA32_PL0_SSP",
+ [0x000006a5] = "IA32_PL1_SSP",
+ [0x000006a6] = "IA32_PL2_SSP",
+ [0x000006a7] = "IA32_PL3_SSP",
+ [0x000006a8] = "IA32_INT_SSP_TAB",
[0x000006B0] = "GFX_PERF_LIMIT_REASONS",
[0x000006B1] = "RING_PERF_LIMIT_REASONS",
[0x000006c0] = "LBR_NHM_TO",
$
And this gets rebuilt:
CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
LD /tmp/build/perf/trace/beauty/perf-in.o
CC /tmp/build/perf/util/amd-sample-raw.o
LD /tmp/build/perf/util/perf-in.o
LD /tmp/build/perf/perf-in.o
LINK /tmp/build/perf/perf
Now one can trace systemwide asking to see backtraces to where those
MSRs are being read/written with:
# perf trace -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
^C#
If we use -v (verbose mode) we can see what it does behind the scenes:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr>=IA32_U_CET && msr<=IA32_INT_SSP_TAB"
Using CPUID AuthenticAMD-25-21-0
0x6a0
0x6a8
New filter for msr:read_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
0x6a0
0x6a8
New filter for msr:write_msr: (msr>=0x6a0 && msr<=0x6a8) && (common_pid != 597499 && common_pid != 3313)
mmap size 528384B
^C#
Example with a frequent msr:
# perf trace -v -e msr:*_msr/max-stack=32/ --filter="msr==IA32_SPEC_CTRL" --max-events 2
Using CPUID AuthenticAMD-25-21-0
0x48
New filter for msr:read_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
0x48
New filter for msr:write_msr: (msr==0x48) && (common_pid != 2612129 && common_pid != 3841)
mmap size 528384B
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
0.000 Timer/2525383 msr:write_msr(msr: IA32_SPEC_CTRL, val: 6)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule ([kernel.kallsyms])
futex_wait_queue_me ([kernel.kallsyms])
futex_wait ([kernel.kallsyms])
do_futex ([kernel.kallsyms])
__x64_sys_futex ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__futex_abstimed_wait_common64 (/usr/lib64/libpthread-2.33.so)
0.030 :0/0 msr:write_msr(msr: IA32_SPEC_CTRL, val: 2)
do_trace_write_msr ([kernel.kallsyms])
do_trace_write_msr ([kernel.kallsyms])
__switch_to_xtra ([kernel.kallsyms])
__switch_to ([kernel.kallsyms])
__schedule ([kernel.kallsyms])
schedule_idle ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/YkNd7Ky+vi7H2Zl2@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes from:
9457056ac426e5ed ("mm: madvise: MADV_DONTNEED_LOCKED")
That result in these changes in the tools:
$ diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
--- tools/include/uapi/asm-generic/mman-common.h 2022-03-29 16:17:50.461694991 -0300
+++ include/uapi/asm-generic/mman-common.h 2022-03-27 19:12:48.923250468 -0300
@@ -75,6 +75,8 @@
#define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */
+#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2022-03-29 16:18:04.091044244 -0300
+++ after 2022-03-29 16:18:11.692238906 -0300
@@ -20,6 +20,7 @@
[21] = "PAGEOUT",
[22] = "POPULATE_READ",
[23] = "POPULATE_WRITE",
+ [24] = "DONTNEED_LOCKED",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, 'perf trace' will
be able to translate from the number to a human readable string and to
use the strings in tracepoint filter expressions.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkNcUfeh795yqGMV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
a6a6fe27bab48f0d ("net/smc: Dynamic control handshake limitation by socket options")
This automagically adds support for the SOL_MNC socket level:
$ diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
--- tools/perf/trace/beauty/include/linux/socket.h 2022-03-14 17:55:22.277148656 -0300
+++ include/linux/socket.h 2022-03-27 19:12:48.908250063 -0300
@@ -366,6 +366,7 @@
#define SOL_XDP 283
#define SOL_MPTCP 284
#define SOL_MCTP 285
+#define SOL_SMC 286
/* IPX options */
#define IPX_TYPE 1
$ tools/perf/trace/beauty/socket.sh > before
$ cp include/linux/socket.h tools/perf/trace/beauty/include/linux/socket.h
$ tools/perf/trace/beauty/socket.sh > after
$ diff -u before after
--- before 2022-03-29 11:47:56.390258780 -0300
+++ after 2022-03-29 11:48:03.158436189 -0300
@@ -67,6 +67,7 @@
[283] = "XDP",
[284] = "MPTCP",
[285] = "MCTP",
+ [286] = "SMC",
};
DEFINE_STRARRAY(socket_level, "SOL_");
$
This will allow 'perf trace' to translate 286 into "SMC" as is done with
the other socket levels:
# perf trace -e setsockopt --max-events 4
344.916 ( 0.003 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 5, optval: 0x7f5797b9c4f8, optlen: 4) = 0
344.920 ( 0.002 ms): Socket Thread/3816 setsockopt(fd: 168, level: TCP, optname: 6, optval: 0x7f5797b9c4f4, optlen: 4) = 0
1246.974 ( 0.010 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 11, optval: 0x7ffc96cd7244, optlen: 4) = 0
1246.986 ( 0.002 ms): systemd-resolv/1128 setsockopt(fd: 22, level: IP, optname: 8, optval: 0x7ffc96cd7264, optlen: 4) = 0
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h'
diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: D. Wythe <alibuda@linux.alibaba.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMdpzzjPu5VZtW3@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
To pick the changes in:
fba60b171a032283 ("libbpf: Use IS_ERR_OR_NULL() in hashmap__free()")
That don't entail any changes in tools/perf.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mauricio Vásquez <mauricio@kinvolk.io>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMb2SAIai2VeuUD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Passing NULL to perf_cpu_map__max doesn't make sense as there is no
valid max. Avoid this problem by null checking in
perf_stat_init_aggr_mode.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328062414.1893550-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Merge still more updates from Andrew Morton:
"16 patches.
Subsystems affected by this patch series: ofs2, nilfs2, mailmap, and
mm (madvise, mlock, mfence, memory-failure, kasan, debug, kmemleak,
and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/damon: prevent activated scheme from sleeping by deactivated schemes
mm/kmemleak: reset tag when compare object pointer
doc/vm/page_owner.rst: remove content related to -c option
tools/vm/page_owner_sort.c: remove -c option
mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP
mm,hwpoison: unmap poisoned page before invalidation
mailmap: update Kirill's email
mm: kfence: fix objcgs vector allocation
mm/munlock: protect the per-CPU pagevec by a local_lock_t
mm/munlock: update Documentation/vm/unevictable-lru.rst
mm/munlock: add lru_add_drain() to fix memcg_stat_test
nilfs2: get rid of nilfs_mapping_init()
nilfs2: fix lockdep warnings during disk space reclamation
nilfs2: fix lockdep warnings in page operations for btree nodes
ocfs2: fix crash when mount with quota enabled
Revert "mm: madvise: skip unmapped vma holes passed to process_madvise"
|
|
In the DAMON, the minimum wait time of the schemes decides whether the
kernel wakes up 'kdamon_fn()'. But since the minimum wait time is
initialized to zero, there are corner cases against the original
objective.
For example, if we have several schemes for one target, and if the wait
time of the first scheme is zero, the minimum wait time will set zero,
which means 'kdamond_fn()' should wake up to apply this scheme.
However, in the following scheme, wait time can be set to non-zero.
Thus, the mininum wait time will be set to non-zero, which can cause
sleeping this interval for 'kdamon_fn()' due to one deactivated last
scheme.
This commit prevents making DAMON monitoring inactive state due to other
deactivated schemes.
Link: https://lkml.kernel.org/r/20220330105302.32114-1-tome01@ajou.ac.kr
Signed-off-by: Jonghyeon Kim <tome01@ajou.ac.kr>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When we use HW-tag based kasan and enable vmalloc support, we hit the
following bug. It is due to comparison between tagged object and
non-tagged pointer.
We need to reset the kasan tag when we need to compare tagged object and
non-tagged pointer.
kmemleak: [name:kmemleak&]Scan area larger than object 0xffffffe77076f440
CPU: 4 PID: 1 Comm: init Tainted: G S W 5.15.25-android13-0-g5cacf919c2bc #1
Hardware name: MT6983(ENG) (DT)
Call trace:
add_scan_area+0xc4/0x244
kmemleak_scan_area+0x40/0x9c
layout_and_allocate+0x1e8/0x288
load_module+0x2c8/0xf00
__se_sys_finit_module+0x190/0x1d0
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x60/0x170
el0_svc_common+0xc8/0x114
do_el0_svc+0x28/0xa0
el0_svc+0x60/0xf8
el0t_64_sync_handler+0x88/0xec
el0t_64_sync+0x1b4/0x1b8
kmemleak: [name:kmemleak&]Object 0xf5ffffe77076b000 (size 32768):
kmemleak: [name:kmemleak&] comm "init", pid 1, jiffies 4294894197
kmemleak: [name:kmemleak&] min_count = 0
kmemleak: [name:kmemleak&] count = 0
kmemleak: [name:kmemleak&] flags = 0x1
kmemleak: [name:kmemleak&] checksum = 0
kmemleak: [name:kmemleak&] backtrace:
module_alloc+0x9c/0x120
move_module+0x34/0x19c
layout_and_allocate+0x1c4/0x288
load_module+0x2c8/0xf00
__se_sys_finit_module+0x190/0x1d0
__arm64_sys_finit_module+0x20/0x30
invoke_syscall+0x60/0x170
el0_svc_common+0xc8/0x114
do_el0_svc+0x28/0xa0
el0_svc+0x60/0xf8
el0t_64_sync_handler+0x88/0xec
el0t_64_sync+0x1b4/0x1b8
Link: https://lkml.kernel.org/r/20220318034051.30687-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Nicholas Tang <nicholas.tang@mediatek.com>
Cc: Yee Lee <yee.lee@mediatek.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
-c option has been removed from page_owner_sort.c.
Remove the usage of -c option from Documentation.
This work is coauthored by
Shenghong Han
Yixuan Cao
Chongxi Zhao
Jiajian Ye
Yuhong Feng
Yongqiang Liu
Link: https://lkml.kernel.org/r/20220326085920.1470081-2-zhangyinan2019@email.szu.edu.cn
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The -c option is used to cull by stacktrace. Now, --cull option has
been Added in page_owner_sort.c. Culling by stacktrace is one of the
function of "--cull". No need to set an extra parameter. So remove -c
option.
Remove parsing of -c when parse parameter and remove "-c" from usage.
This work is coauthored by
Shenghong Han
Yixuan Cao
Chongxi Zhao
Jiajian Ye
Yuhong Feng
Yongqiang Liu
Link: https://lkml.kernel.org/r/20220326085920.1470081-1-zhangyinan2019@email.szu.edu.cn
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
KASAN changes that added new GFP flags mistakenly updated
__GFP_BITS_SHIFT as the total number of GFP bits instead of as a shift
used to define __GFP_BITS_MASK.
This broke LOCKDEP, as __GFP_BITS_MASK now gets the 25th bit enabled
instead of the 28th for __GFP_NOLOCKDEP.
Update __GFP_BITS_SHIFT to always count KASAN GFP bits.
In the future, we could handle all combinations of KASAN and LOCKDEP to
occupy as few bits as possible. For now, we have enough GFP bits to be
inefficient in this quick fix.
Link: https://lkml.kernel.org/r/462ff52742a1fcc95a69778685737f723ee4dfb3.1648400273.git.andreyknvl@google.com
Fixes: 9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
Fixes: 53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
Fixes: f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In some cases it appears the invalidation of a hwpoisoned page fails
because the page is still mapped in another process. This can cause a
program to be continuously restarted and die when it page faults on the
page that was not invalidated. Avoid that problem by unmapping the
hwpoisoned page when we find it.
Another issue is that sometimes we end up oopsing in finish_fault, if
the code tries to do something with the now-NULL vmf->page. I did not
hit this error when submitting the previous patch because there are
several opportunities for alloc_set_pte to bail out before accessing
vmf->page, and that apparently happened on those systems, and most of
the time on other systems, too.
However, across several million systems that error does occur a handful
of times a day. It can be avoided by returning VM_FAULT_NOPAGE which
will cause do_read_fault to return before calling finish_fault.
Link: https://lkml.kernel.org/r/20220325161428.5068d97e@imladris.surriel.com
Fixes: e53ac7374e64 ("mm: invalidate hwpoison page cache page in fault path")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
My new email address is kirill.tkhai@openvz.org.
Link: https://lkml.kernel.org/r/164846762354.278960.13129571556274098855.stgit@pro
Signed-off-by: Kirill Tkhai <kirill.tkhai@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the kfence object is allocated to be used for objects vector, then
this slot of the pool eventually being occupied permanently since the
vector is never freed. The solutions could be (1) freeing vector when
the kfence object is freed or (2) allocating all vectors statically.
Since the memory consumption of object vectors is low, it is better to
chose (2) to fix the issue and it is also can reduce overhead of vectors
allocating in the future.
Link: https://lkml.kernel.org/r/20220328132843.16624-1-songmuchun@bytedance.com
Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The access to mlock_pvec is protected by disabling preemption via
get_cpu_var() or implicit by having preemption disabled by the caller
(in mlock_page_drain() case). This breaks on PREEMPT_RT since
folio_lruvec_lock_irq() acquires a sleeping lock in this section.
Create struct mlock_pvec which consits of the local_lock_t and the
pagevec. Acquire the local_lock() before accessing the per-CPU pagevec.
Replace mlock_page_drain() with a _local() version which is invoked on
the local CPU and acquires the local_lock_t and a _remote() version
which uses the pagevec from a remote CPU which offline.
Link: https://lkml.kernel.org/r/YjizWi9IY0mpvIfb@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update Documentation/vm/unevictable-lru.rst to reflect the changes made
by the mm/munlock series: keeping an mlock_count instead of page_mlock()
(formerly try_to_munlock()) and munlock_vma_pages_all() etc. Also make
other little updates or cleanups wherever noticed.
But, I apologize, this is already out of date, in that "folio" appears
nowhere: 5.18 will be in a transitional state from "page" to "folio",
and documenting its current mix of the two does not help to understand
"the Unevictable LRU". Should be revisited when naming is more settled.
Link: https://lkml.kernel.org/r/3753962-d491-bf60-f59f-51bfe84fd6a0@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Hildenbrand <david@redhat.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mike reports that LTP memcg_stat_test usually leads to
memcg_stat_test 3 TINFO: Test unevictable with MAP_LOCKED
memcg_stat_test 3 TINFO: Running memcg_process --mmap-lock1 -s 135168
memcg_stat_test 3 TINFO: Warming up pid: 3460
memcg_stat_test 3 TINFO: Process is still here after warm up: 3460
memcg_stat_test 3 TFAIL: unevictable is 122880, 135168 expected
but may also lead to
memcg_stat_test 4 TINFO: Test unevictable with mlock
memcg_stat_test 4 TINFO: Running memcg_process --mmap-lock2 -s 135168
memcg_stat_test 4 TINFO: Warming up pid: 4271
memcg_stat_test 4 TINFO: Process is still here after warm up: 4271
memcg_stat_test 4 TFAIL: unevictable is 122880, 135168 expected
or both. A wee bit flaky.
follow_page_pte() used to have an lru_add_drain() per each page mlocked,
and the test came to rely on accurate stats. The pagevec to be drained
is different now, but still covered by lru_add_drain(); and, never mind
the test, I believe it's in everyone's interest that a bulk faulting
interface like populate_vma_page_range() or faultin_vma_page_range()
should drain its local pagevecs at the end, to save others sometimes
needing the much more expensive lru_add_drain_all().
This does not absolutely guarantee exact stats - the mlocking task can
be migrated between CPUs as it proceeds - but it's good enough and the
tests pass.
Link: https://lkml.kernel.org/r/47f6d39c-a075-50cb-1cfb-26dd957a48af@google.com
Fixes: b67bf49ce7aa ("mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Mike Galbraith <efault@gmx.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After applying the lockdep warning fixes, nilfs_mapping_init() is no
longer used, so delete it.
Link: https://lkml.kernel.org/r/1647867427-30498-4-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Sun <sunhao.th@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During disk space reclamation, nilfs2 still emits the following lockdep
warning due to page/folio operations on shadowed page caches that nilfs2
uses to get a snapshot of DAT file in memory:
WARNING: CPU: 0 PID: 2643 at include/linux/backing-dev.h:272 __folio_mark_dirty+0x645/0x670
...
RIP: 0010:__folio_mark_dirty+0x645/0x670
...
Call Trace:
filemap_dirty_folio+0x74/0xd0
__set_page_dirty_nobuffers+0x85/0xb0
nilfs_copy_dirty_pages+0x288/0x510 [nilfs2]
nilfs_mdt_save_to_shadow_map+0x50/0xe0 [nilfs2]
nilfs_clean_segments+0xee/0x5d0 [nilfs2]
nilfs_ioctl_clean_segments.isra.19+0xb08/0xf40 [nilfs2]
nilfs_ioctl+0xc52/0xfb0 [nilfs2]
__x64_sys_ioctl+0x11d/0x170
This fixes the remaining warning by using inode objects to hold those
page caches.
Link: https://lkml.kernel.org/r/1647867427-30498-3-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "nilfs2 lockdep warning fixes".
The first two are to resolve the lockdep warning issue, and the last one
is the accompanying cleanup and low priority.
Based on your comment, this series solves the issue by separating inode
object as needed. Since I was worried about the impact of the object
composition changes, I tested the series carefully not to cause
regressions especially for delicate functions such like disk space
reclamation and snapshots.
This patch (of 3):
If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at
inode_to_wb() during page/folio operations for btree nodes:
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline]
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline]
WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
Modules linked in:
...
RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline]
RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline]
RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
...
Call Trace:
__set_page_dirty include/linux/pagemap.h:834 [inline]
mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145
nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline]
nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085
nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625
nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009
nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048
nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline]
nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline]
nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036
nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline]
nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563
kthread+0x405/0x4f0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
This is because nilfs2 uses two page caches for each inode and
inode->i_mapping never points to one of them, the btree node cache.
This causes inode_to_wb(inode) to refer to a different page cache than
the caller page/folio operations such like __folio_start_writeback(),
__folio_end_writeback(), or __folio_mark_dirty() acquired the lock.
This patch resolves the issue by allocating and using an additional
inode to hold the page cache of btree nodes. The inode is attached
one-to-one to the traditional nilfs2 inode if it requires a block
mapping with b-tree. This setup change is in memory only and does not
affect the disk format.
Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com
Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org
Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com
Reported-by: syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com
Reported-by: Hao Sun <sunhao.th@gmail.com>
Reported-by: David Hildenbrand <david@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a reported crash when mounting ocfs2 with quota enabled.
RIP: 0010:ocfs2_qinfo_lock_res_init+0x44/0x50 [ocfs2]
Call Trace:
ocfs2_local_read_info+0xb9/0x6f0 [ocfs2]
dquot_load_quota_sb+0x216/0x470
dquot_load_quota_inode+0x85/0x100
ocfs2_enable_quotas+0xa0/0x1c0 [ocfs2]
ocfs2_fill_super.cold+0xc8/0x1bf [ocfs2]
mount_bdev+0x185/0x1b0
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
path_mount+0x465/0xac0
__x64_sys_mount+0x103/0x140
It is caused by when initializing dqi_gqlock, the corresponding dqi_type
and dqi_sb are not properly initialized.
This issue is introduced by commit 6c85c2c72819, which wants to avoid
accessing uninitialized variables in error cases. So make global quota
info properly initialized.
Link: https://lkml.kernel.org/r/20220323023644.40084-1-joseph.qi@linux.alibaba.com
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007141
Fixes: 6c85c2c72819 ("ocfs2: quota_local: fix possible uninitialized-variable access in ocfs2_local_read_info()")
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reported-by: Dayvison <sathlerds@gmail.com>
Tested-by: Valentin Vidic <vvidic@valentin-vidic.from.hr>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|