summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-10-21xfs: refactor near mode alloc bnobt scan into separate functionBrian Foster
In preparation to enhance the near mode allocation bnobt scan algorithm, lift it into a separate function. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor and reuse best extent scanning logicBrian Foster
The bnobt "find best" helper implements a simple btree walker function. This general pattern, or a subset thereof, is reused in various parts of a near mode allocation operation. For example, the bnobt left/right scans are each iterative btree walks along with the cntbt lastblock scan. Rework this function into a generic btree walker, add a couple parameters to control termination behavior from various contexts and reuse it where applicable. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor allocation tree fixup codeBrian Foster
Both algorithms duplicate the same btree allocation code. Eliminate the duplication and reuse the fallback algorithm codepath. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: reuse best extent tracking logic for bnobt scanBrian Foster
The near mode bnobt scan searches left and right in the bnobt looking for the closest free extent to the allocation hint that satisfies minlen. Once such an extent is found, the left/right search terminates, we search one more time in the opposite direction and finish the allocation with the best overall extent. The left/right and find best searches are currently controlled via a combination of cursor state and local variables. Clean up this code and prepare for further improvements to the near mode fallback algorithm by reusing the allocation cursor best extent tracking mechanism. Update the tracking logic to deactivate bnobt cursors when out of allocation range and replace open-coded extent checks to calls to the common helper. In doing so, rename some misnamed local variables in the top-level near mode allocation function. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor cntbt lastblock scan best extent logic into helperBrian Foster
The cntbt lastblock scan checks the size, alignment, locality, etc. of each free extent in the block and compares it with the current best candidate. This logic will be reused by the upcoming optimized cntbt algorithm, so refactor it into a separate helper. Note that acur->diff is now initialized to -1 (unsigned) instead of 0 to support the more granular comparison logic in the new helper. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track best extent from cntbt lastblock scan in alloc cursorBrian Foster
If the size lookup lands in the last block of the by-size btree, the near mode algorithm scans the entire block for the extent with best available locality. In preparation for similar best available extent tracking across both btrees, extend the allocation cursor with best extent data and lift the associated state from the cntbt last block scan code. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track allocation busy state in allocation cursorBrian Foster
Extend the allocation cursor to track extent busy state for an allocation attempt. No functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: introduce allocation cursor data structureBrian Foster
Introduce a new allocation cursor data structure to encapsulate the various states and structures used to perform an extent allocation. This structure will eventually be used to track overall allocation state across different search algorithms on both free space btrees. To start, include the three btree cursors (one for the cntbt and two for the bnobt left/right search) used by the near mode allocation algorithm and refactor the cursor setup and teardown code into helpers. This slightly changes cursor memory allocation patterns, but otherwise makes no functional changes to the allocation algorithm. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: fix sparse complaints] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: track active state of allocation btree cursorsBrian Foster
The upcoming allocation algorithm update searches multiple allocation btree cursors concurrently. As such, it requires an active state to track when a particular cursor should continue searching. While active state will be modified based on higher level logic, we can define base functionality based on the result of allocation btree lookups. Define an active flag in the private area of the btree cursor. Update it based on the result of lookups in the existing allocation btree helpers. Finally, provide a new helper to query the current state. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: ignore extent size hints for always COW inodesChristoph Hellwig
There is no point in applying extent size hints for always COW inodes, as we would just have to COW any extra allocation beyond the data actually written. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: include QUOTA, FATAL ASSERT build options in XFS_BUILD_OPTIONSyu kuai
In commit d03a2f1b9fa8 ("xfs: include WARN, REPAIR build options in XFS_BUILD_OPTIONS"), Eric pointed out that the XFS_BUILD_OPTIONS string, shown at module init time and in modinfo output, does not currently include all available build options. So, he added in CONFIG_XFS_WARN and CONFIG_XFS_REPAIR. However, this is not enough, add in CONFIG_XFS_QUOTA and CONFIG_XFS_ASSERT_FATAL. Signed-off-by: yu kuai <yukuai3@huawei.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: use a srcmap for a read-modify-write I/OGoldwyn Rodrigues
The srcmap is used to identify where the read is to be performed from. It is passed to ->iomap_begin, which can fill it in if we need to read data for partially written blocks from a different location than the write target. The srcmap is only supported for buffered writes so far. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [hch: merged two patches, removed the IOMAP_F_COW flag, use iomap as srcmap if not set, adjust length down to srcmap end as well] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
2019-10-21iomap: renumber IOMAP_HOLE to 0Christoph Hellwig
Instead of keeping a separate unnamed state for uninitialized iomaps, renumber IOMAP_HOLE to zero so that an uninitialized iomap is treated as a hole. Suggested-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: use write_begin to read pages to unshareChristoph Hellwig
Use the existing iomap write_begin code to read the pages unshared by iomap_file_unshare. That avoids the extra ->readpage call and extent tree lookup currently done by read_mapping_page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: move the zeroing case out of iomap_read_page_syncChristoph Hellwig
That keeps the function a little easier to understand, and easier to modify for pending enhancements. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: ignore non-shared or non-data blocks in xfs_file_dirtyChristoph Hellwig
xfs_file_dirty is used to unshare reflink blocks. Rename the function to xfs_file_unshare to better document that purpose, and skip iomaps that are not shared and don't need zeroing. This will allow to simplify the caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: always use AOP_FLAG_NOFS in iomap_write_beginChristoph Hellwig
All callers pass AOP_FLAG_NOFS, so lift that flag to iomap_write_begin to allow reusing the flags arguments for an internal flags namespace soon. Also remove the local index variable that is only used once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: remove the unused iomap argument to __iomap_write_endChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: better document the IOMAP_F_* flagsChristoph Hellwig
The documentation for IOMAP_F_* is a bit disorganized, and doesn't mention the fact that most flags are set by the file system and consumed by the iomap core, while IOMAP_F_SIZE_CHANGED is set by the core and consumed by the file system. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Collins <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: enhance writeback error messageDarrick J. Wong
If we encounter an IO error during writeback, log the inode, offset, and sector number of the failure, instead of forcing the user to do some sort of reverse mapping to figure out which file is affected. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-10-21iomap: pass a struct page to iomap_finish_page_writebackChristoph Hellwig
No need to pass the full bio_vec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: cleanup iomap_ioend_compareChristoph Hellwig
Move the initialization of ia and ib to the declaration line and remove a superflous else. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: move struct iomap_page out of iomap.hChristoph Hellwig
Now that all the writepage code is in the iomap code there is no need to keep this structure public. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: warn on inline maps in iomap_writepage_mapChristoph Hellwig
And inline mapping should never mark the page dirty and thus never end up in writepages. Add a check for that condition and warn if it happens. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: lift the xfs writeback code to iomapChristoph Hellwig
Take the xfs writeback code and move it to fs/iomap. A new structure with three methods is added as the abstraction from the generic writeback code to the file system. These methods are used to map blocks, submit an ioend, and cancel a page that encountered an error before it was added to an ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> [darrick: rename ->submit_ioend to ->prepare_ioend to clarify what it does] Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-10-21iomap: lift common tracing code from xfs to iomapChristoph Hellwig
Lift the xfs code for tracing address space operations to the iomap layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21iomap: zero newly allocated mapped blocksChristoph Hellwig
File systems like gfs2 don't support delayed allocations or unwritten extents and thus allocate normal mapped blocks to fill holes. To cover the case of such file systems allocating new blocks to fill holes also zero out mapped blocks with the new flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: remove the fork fields in the writepage_ctx and ioendChristoph Hellwig
In preparation for moving the writeback code to iomap.c, replace the XFS-specific COW fork concept with the iomap IOMAP_F_SHARED flag. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: turn io_append_trans into an io_private void pointerChristoph Hellwig
In preparation for moving the ioend structure to common code we need to get rid of the xfs-specific xfs_trans type. Just make it a file system private void pointer instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: refactor the ioend merging codeChristoph Hellwig
Introduce two nicely abstracted helper, which can be moved to the iomap code later. Also use list_first_entry_or_null to simplify the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: use a struct iomap in xfs_writepage_ctxChristoph Hellwig
In preparation for moving the XFS writeback code to fs/iomap.c, switch it to use struct iomap instead of the XFS-specific struct xfs_bmbt_irec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: set IOMAP_F_NEW more carefullyChristoph Hellwig
Don't set IOMAP_F_NEW if we COW over an existing allocated range, as these aren't strictly new allocations. This is required to be able to use IOMAP_F_NEW to zero newly allocated blocks, which is required for the iomap code to fully support file systems that don't do delayed allocations or use unwritten extents. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-21xfs: initialize iomap->flags in xfs_bmbt_to_iomapChristoph Hellwig
Currently we don't overwrite the flags field in the iomap in xfs_bmbt_to_iomap. This works fine with 0-initialized iomaps on stack, but is harmful once we want to be able to reuse an iomap in the writeback code. Replace the shared parameter with a set of initial flags an thus ensures the flags field is always reinitialized. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-17iomap: iomap that extends beyond EOF should be marked dirtyDave Chinner
When doing a direct IO that spans the current EOF, and there are written blocks beyond EOF that extend beyond the current write, the only metadata update that needs to be done is a file size extension. However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that there is IO completion metadata updates required, and hence we may fail to correctly sync file size extensions made in IO completion when O_DSYNC writes are being used and the hardware supports FUA. Hence when setting IOMAP_F_DIRTY, we need to also take into account whether the iomap spans the current EOF. If it does, then we need to mark it dirty so that IO completion will call generic_write_sync() to flush the inode size update to stable storage correctly. Fixes: 3460cac1ca76 ("iomap: Use FUA for pure data O_DSYNC DIO writes") Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> [darrick: removed the ext4 part; they'll handle it separately] Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15xfs: Use iomap_dio_rw to wait for unaligned direct IOJan Kara
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding the wait. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-15iomap: Allow forcing of waiting for running DIO in iomap_dio_rw()Jan Kara
Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-10-13Linux 5.4-rc3v5.4-rc3Linus Torvalds
2019-10-13Merge tag 'trace-v5.4-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "A few tracing fixes: - Remove lockdown from tracefs itself and moved it to the trace directory. Have the open functions there do the lockdown checks. - Fix a few races with opening an instance file and the instance being deleted (Discovered during the lockdown updates). Kept separate from the clean up code such that they can be backported to stable easier. - Clean up and consolidated the checks done when opening a trace file, as there were multiple checks that need to be done, and it did not make sense having them done in each open instance. - Fix a regression in the record mcount code. - Small hw_lat detector tracer fixes. - A trace_pipe read fix due to not initializing trace_seq" * tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Initialize iter->seq after zeroing in tracing_read_pipe() tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency tracing/hwlat: Report total time spent in all NMIs during the sample recordmcount: Fix nop_mcount() function tracing: Do not create tracefs files if tracefs lockdown is in effect tracing: Add locked_down checks to the open calls of files created for tracefs tracing: Add tracing_check_open_get_tr() tracing: Have trace events system open call tracing_open_generic_tr() tracing: Get trace_array reference for available_tracers files ftrace: Get a reference counter for the trace_array on filter files tracefs: Revert ccbd54ff54e8 ("tracefs: Restrict tracefs when the kernel is locked down")
2019-10-13Merge tag 'hwmon-for-v5.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Update/fix inspur-ipsps1 and k10temp Documentation - Fix nct7904 driver - Fix HWMON_P_MIN_ALARM mask in hwmon core * tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: docs: Extend inspur-ipsps1 title underline hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct. docs: hwmon: Include 'inspur-ipsps1.rst' into docs hwmon: Fix HWMON_P_MIN_ALARM mask hwmon: (k10temp) Update documentation and add temp2_input info hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
2019-10-13Merge tag 'fixes-for-5.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Richard Weinberger: "Two fixes for MTD: - spi-nor: Fix for a regression in write_sr() - rawnand: Regression fix for the au1550nd driver" * tag 'fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: au1550nd: Fix au_read_buf16() prototype mtd: spi-nor: Fix direction of the write_sr() transfer
2019-10-13Merge tag 'for-linus-20191012' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Single small fix for a regression in the sequence logic for linked commands" * tag 'for-linus-20191012' of git://git.kernel.dk/linux-block: io_uring: fix sequence logic for timeout requests
2019-10-12tracing: Initialize iter->seq after zeroing in tracing_read_pipe()Petr Mladek
A customer reported the following softlockup: [899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464] [899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4 [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] Kernel panic - not syncing: softlockup: hung tasks [899688.160002] RIP: 0010:up_write+0x1a/0x30 [899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12 [899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000 [899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00 [899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000 [899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8 [899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000 [899688.160002] tracing_read_pipe+0x336/0x3c0 [899688.160002] __vfs_read+0x26/0x140 [899688.160002] vfs_read+0x87/0x130 [899688.160002] SyS_read+0x42/0x90 [899688.160002] do_syscall_64+0x74/0x160 It caught the process in the middle of trace_access_unlock(). There is no loop. So, it must be looping in the caller tracing_read_pipe() via the "waitagain" label. Crashdump analyze uncovered that iter->seq was completely zeroed at this point, including iter->seq.seq.size. It means that print_trace_line() was never able to print anything and there was no forward progress. The culprit seems to be in the code: /* reset all but tr, trace, and overruns */ memset(&iter->seq, 0, sizeof(struct trace_iterator) - offsetof(struct trace_iterator, seq)); It was added by the commit 53d0aa773053ab182877 ("ftrace: add logic to record overruns"). It was v2.6.27-rc1. It was the time when iter->seq looked like: struct trace_seq { unsigned char buffer[PAGE_SIZE]; unsigned int len; }; There was no "size" variable and zeroing was perfectly fine. The solution is to reinitialize the structure after or without zeroing. Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing/hwlat: Don't ignore outer-loop duration when calculating max_latencySrivatsa S. Bhat (VMware)
max_latency is intended to record the maximum ever observed hardware latency, which may occur in either part of the loop (inner/outer). So we need to also consider the outer-loop sample when updating max_latency. Link: http://lkml.kernel.org/r/157073345463.17189.18124025522664682811.stgit@srivatsa-ubuntu Fixes: e7c15cd8a113 ("tracing: Added hardware latency tracer") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing/hwlat: Report total time spent in all NMIs during the sampleSrivatsa S. Bhat (VMware)
nmi_total_ts is supposed to record the total time spent in *all* NMIs that occur on the given CPU during the (active portion of the) sampling window. However, the code seems to be overwriting this variable for each NMI, thereby only recording the time spent in the most recent NMI. Fix it by accumulating the duration instead. Link: http://lkml.kernel.org/r/157073343544.17189.13911783866738671133.stgit@srivatsa-ubuntu Fixes: 7b2c86250122 ("tracing: Add NMI tracing in hwlat detector") Cc: stable@vger.kernel.org Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12recordmcount: Fix nop_mcount() functionSteven Rostedt (VMware)
The removal of the longjmp code in recordmcount.c mistakenly made the return of make_nop() being negative an exit of nop_mcount(). It should not exit the routine, but instead just not process that part of the code. By exiting with an error code, it would cause the update of recordmcount to fail some files which would fail the build if ftrace function tracing was enabled. Link: http://lkml.kernel.org/r/20191009110538.5909fec6@gandalf.local.home Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Fixes: 3f1df12019f3 ("recordmcount: Rewrite error/success handling") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Do not create tracefs files if tracefs lockdown is in effectSteven Rostedt (VMware)
If on boot up, lockdown is activated for tracefs, don't even bother creating the files. This can also prevent instances from being created if lockdown is in effect. Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add locked_down checks to the open calls of files created for tracefsSteven Rostedt (VMware)
Added various checks on open tracefs calls to see if tracefs is in lockdown mode, and if so, to return -EPERM. Note, the event format files (which are basically standard on all machines) as well as the enabled_functions file (which shows what is currently being traced) are not lockde down. Perhaps they should be, but it seems counter intuitive to lockdown information to help you know if the system has been modified. Link: http://lkml.kernel.org/r/CAHk-=wj7fGPKUspr579Cii-w_y60PtRaiDgKuxVtBAMK0VNNkA@mail.gmail.com Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Add tracing_check_open_get_tr()Steven Rostedt (VMware)
Currently, most files in the tracefs directory test if tracing_disabled is set. If so, it should return -ENODEV. The tracing_disabled is called when tracing is found to be broken. Originally it was done in case the ring buffer was found to be corrupted, and we wanted to prevent reading it from crashing the kernel. But it's also called if a tracing selftest fails on boot. It's a one way switch. That is, once it is triggered, tracing is disabled until reboot. As most tracefs files can also be used by instances in the tracefs directory, they need to be carefully done. Each instance has a trace_array associated to it, and when the instance is removed, the trace_array is freed. But if an instance is opened with a reference to the trace_array, then it requires looking up the trace_array to get its ref counter (as there could be a race with it being deleted and the open itself). Once it is found, a reference is added to prevent the instance from being removed (and the trace_array associated with it freed). Combine the two checks (tracing_disabled and trace_array_get()) into a single helper function. This will also make it easier to add lockdown to tracefs later. Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Have trace events system open call tracing_open_generic_tr()Steven Rostedt (VMware)
Instead of having the trace events system open call open code the taking of the trace_array descriptor (with trace_array_get()) and then calling trace_open_generic(), have it use the tracing_open_generic_tr() that does the combination of the two. This requires making tracing_open_generic_tr() global. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-10-12tracing: Get trace_array reference for available_tracers filesSteven Rostedt (VMware)
As instances may have different tracers available, we need to look at the trace_array descriptor that shows the list of the available tracers for the instance. But there's a race between opening the file and an admin deleting the instance. The trace_array_get() needs to be called before accessing the trace_array. Cc: stable@vger.kernel.org Fixes: 607e2ea167e56 ("tracing: Set up infrastructure to allow tracers for instances") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>