Age | Commit message (Collapse) | Author |
|
ULPs are not supposed to call to ops.* directly.
Suggested-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use aead_request_free() instead of kfree() to properly free memory
allocated by aead_request_alloc(). This ensures sensitive crypto data
is zeroed before being freed.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
ksmbd_rdma_capable_netdev()"
This reverts commit ecce70cf17d91c3dd87a0c4ea00b2d1387729701.
Revert the GUID trick code causing the layering violation.
I will try to allow the users to turn RDMA-capable on/off via sysfs later
Cc: Kangjing Huang <huangkangjing@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Add missing bounds check for create lease context.
Cc: stable@vger.kernel.org
Reported-by: Norbert Szetei <norbert@doyensec.com>
Tested-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Norbert Szetei <norbert@doyensec.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Ext4 bug fixes and cleanups, including:
- hardening against maliciously fuzzed file systems
- backwards compatibility for the brief period when we attempted to
ignore zero-width characters
- avoid potentially BUG'ing if there is a file system corruption
found during the file system unmount
- fix free space reporting by statfs when project quotas are enabled
and the free space is less than the remaining project quota
Also improve performance when replaying a journal with a very large
number of revoke records (applicable for Lustre volumes)"
* tag 'ext4-for_linus-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (71 commits)
ext4: fix OOB read when checking dotdot dir
ext4: on a remount, only log the ro or r/w state when it has changed
ext4: correct the error handle in ext4_fallocate()
ext4: Make sb update interval tunable
ext4: avoid journaling sb update on error if journal is destroying
ext4: define ext4_journal_destroy wrapper
ext4: hash: simplify kzalloc(n * 1, ...) to kzalloc(n, ...)
jbd2: add a missing data flush during file and fs synchronization
ext4: don't over-report free space or inodes in statvfs
ext4: clear DISCARD flag if device does not support discard
jbd2: remove jbd2_journal_unfile_buffer()
ext4: reorder capability check last
ext4: update the comment about mb_optimize_scan
jbd2: fix off-by-one while erasing journal
ext4: remove references to bh->b_page
ext4: goto right label 'out_mmap_sem' in ext4_setattr()
ext4: fix out-of-bound read in ext4_xattr_inode_dec_ref_all()
ext4: introduce ITAIL helper
jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature
ext4: remove redundant function ext4_has_metadata_csum
...
|
|
Pull bcachefs updates from Kent Overstreet:
"On disk format is now soft frozen: no more required/automatic are
anticipated before taking off the experimental label.
Major changes/features since 6.14:
- Scrub
- Blocksize greater than page size support
- A number of "rebalance spinning and doing no work" issues have been
fixed; we now check if the write allocation will succeed in
bch2_data_update_init(), before kicking off the read.
There's still more work to do in this area. Later we may want to
add another bitset btree, like rebalance_work, to track "extents
that rebalance was requested to move but couldn't", e.g. due to
destination target having insufficient online devices.
- We can now support scaling well into the petabyte range: latest
bcachefs-tools will pick an appropriate bucket size at format time
to ensure fsck can run in available memory (e.g. a server with
256GB of ram and 100PB of storage would want 16MB buckets).
On disk format changes:
- 1.21: cached backpointers (scalability improvement)
Cached replicas now get backpointers, which means we no longer rely
on incrementing bucket generation numbers to invalidate cached
data: this lets us get rid of the bucket generation number garbage
collection, which had to periodically rescan all extents to
recompute bucket oldest_gen.
Bucket generation numbers are now only used as a consistency check,
but they're quite useful for that.
- 1.22: stripe backpointers
Stripes now have backpointers: erasure coded stripes have their own
checksums, separate from the checksums for the extents they contain
(and stripe checksums also cover the parity blocks). This is
required for implementing scrub for stripes.
- 1.23: stripe lru (scalability improvement)
Persistent lru for stripes, ordered by "number of empty blocks".
This is used by the stripe creation path, which depending on free
space may create a new stripe out of a partially empty existing
stripe instead of starting a brand new stripe.
This replaces an in-memory heap, and means we no longer have to
read in the stripes btree at startup.
- 1.24: casefolding
Case insensitive directory support, courtesy of Valve.
This is an incompatible feature, to enable mount with
-o version_upgrade=incompatible
- 1.25: extent_flags
Another incompatible feature requiring explicit opt-in to enable.
This adds a flags entry to extents, and a flag bit that marks
extents as poisoned.
A poisoned extent is an extent that was unreadable due to checksum
errors. We can't move such extents without giving them a new
checksum, and we may have to move them (for e.g. copygc or device
evacuate). We also don't want to delete them: in the future we'll
have an API that lets userspace ignore checksum errors and attempt
to deal with simple bitrot itself. Marking them as poisoned lets us
continue to return the correct error to userspace on normal read
calls.
Other changes/features:
- BCH_IOCTL_QUERY_COUNTERS: this is used by the new 'bcachefs fs top'
command, which shows a live view of all internal filesystem
counters.
- Improved journal pipelining: we can now have 16 journal writes in
flight concurrently, up from 4. We're logging significantly more to
the journal than we used to with all the recent disk accounting
changes and additions, so some users should see a performance
increase on some workloads.
- BCH_MEMBER_STATE_failed: previously, we would do no IO at all to
devices marked as failed. Now we will attempt to read from them,
but only if we have no better options.
- New option, write_error_timeout: devices will be kicked out of the
filesystem if all writes have been failing for x number of seconds.
We now also kick devices out when notified by blk_holder_ops that
they've gone offline.
- Device option handling improvements: the discard option should now
be working as expected (additionally, in -tools, all device options
that can be set at format time can now be set at device add time,
i.e. data_allowed, state).
- We now try harder to read data after a checksum error: we'll do
additional retries if necessary to a device after after it gave us
data with a checksum error.
- More self healing work: the full inode <-> dirent consistency
checks that are currently run by fsck are now also run every time
we do a lookup, meaning we'll be able to correct errors at runtime.
Runtime self healing will be flipped on after the new changes have
seen more testing, currently they're just checking for consistency.
- KMSAN fixes: our KMSAN builds should be nearly clean now, which
will put a massive dent in the syzbot dashboard"
* tag 'bcachefs-2025-03-24' of git://evilpiepirate.org/bcachefs: (180 commits)
bcachefs: Kill unnecessary bch2_dev_usage_read()
bcachefs: btree node write errors now print btree node
bcachefs: Fix race in print_chain()
bcachefs: btree_trans_restart_foreign_task()
bcachefs: bch2_disk_accounting_mod2()
bcachefs: zero init journal bios
bcachefs: Eliminate padding in move_bucket_key
bcachefs: Fix a KMSAN splat in btree_update_nodes_written()
bcachefs: kmsan asserts
bcachefs: Fix kmsan warnings in bch2_extent_crc_pack()
bcachefs: Disable asm memcpys when kmsan enabled
bcachefs: Handle backpointers with unknown data types
bcachefs: Count BCH_DATA_parity backpointers correctly
bcachefs: Run bch2_check_dirent_target() at lookup time
bcachefs: Refactor bch2_check_dirent_target()
bcachefs: Move bch2_check_dirent_target() to namei.c
bcachefs: fs-common.c -> namei.c
bcachefs: EIO cleanup
bcachefs: bch2_write_prep_encoded_data() now returns errcode
bcachefs: Simplify bch2_write_op_error()
...
|
|
Pull jfs updates from David Kleikamp:
"Various bug fixes and cleanups for JFS"
* tag 'jfs-6.14' of github.com:kleikamp/linux-shaggy:
jfs: add index corruption check to DT_GETPAGE()
fs/jfs: consolidate sanity checking in dbMount
jfs: add sanity check for agwidth in dbMount
jfs: Prevent copying of nlink with value 0 from disk inode
fs/jfs: Prevent integer overflow in AG size calculation
fs/jfs: cast inactags to s64 to prevent potential overflow
jfs: Fix uninit-value access of imap allocated in the diMount() function
jfs: fix slab-out-of-bounds read in ea_get()
jfs: add check read-only before truncation in jfs_truncate_nolock()
jfs: add check read-only before txBeginAnon() call
jfs: reject on-disk inodes of an unsupported type
jfs: Remove reference to bh->b_page
jfs: Delete a couple tabs in jfs_reconfigure()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs update from Mike Marshall:
- remove two orangefs bufmap routines that no longer have callers
* tag 'for-linus-6.15-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Bufmap deadcoding
|
|
Pull xfs updates from Carlos Maiolino:
- XFS zoned allocator: Enables XFS to support zoned devices using its
real-time allocator
- Use folios/vmalloc for buffer cache backing memory
- Some code cleanups and bug fixes
* tag 'xfs-6.15-merge' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (70 commits)
xfs: remove the flags argument to xfs_buf_get_uncached
xfs: remove the flags argument to xfs_buf_read_uncached
xfs: remove xfs_buf_free_maps
xfs: remove xfs_buf_get_maps
xfs: call xfs_buf_alloc_backing_mem from _xfs_buf_alloc
xfs: remove unnecessary NULL check before kvfree()
xfs: don't wake zone space waiters without m_zone_info
xfs: don't increment m_generation for all errors in xfs_growfs_data
xfs: fix a missing unlock in xfs_growfs_data
xfs: Remove duplicate xfs_rtbitmap.h header
xfs: trigger zone GC when out of available rt blocks
xfs: trace what memory backs a buffer
xfs: cleanup mapping tmpfs folios into the buffer cache
xfs: use vmalloc instead of vm_map_area for buffer backing memory
xfs: buffer items don't straddle pages anymore
xfs: kill XBF_UNMAPPED
xfs: convert buffer cache to use high order folios
xfs: remove the kmalloc to page allocator fallback
xfs: refactor backing memory allocations for buffers
xfs: remove xfs_buf_is_vmapped
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
- two fixes to the recent rcu lookup optimizations
- a change allowing TCP to be configured with the first of multiple IP
address
* tag 'dlm-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: make tcp still work in multi-link env
dlm: fix error if active rsb is not hashed
dlm: fix error if inactive rsb is not hashed
dlm: prevent NPD when writing a positive value to event_done
dlm: increase max number of links for corosync3/knet
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, there are three major updates: (1) folio conversion,
(2) refactoring for mount API conversion, (3) some performance
improvement such as direct IO, checkpoint speed, and IO priority
hints.
For stability, there are patches which add more sanity checks and
fixes some major issues like i_size in atomic write operations and
write pointer recovery in zoned devices.
Enhancements:
- huge folio converion work by Matthew Wilcox
- clean up for mount API conversion by Eric Sandeen
- improve direct IO speed in the overwrite case
- add some sanity check on node consistency
- set highest IO priority for checkpoint thread
- keep POSIX_FADV_NOREUSE ranges and add sysfs entry to reclaim pages
- add ioctl to get IO priority hint
- add carve_out sysfs node for fsstat
Bug fixes:
- disable nat_bits during umount to avoid potential nat entry corruption
- fix missing i_size update on atomic writes
- fix missing discard for active segments
- fix running out of free segments
- fix out-of-bounds access in f2fs_truncate_inode_blocks()
- call f2fs_recover_quota_end() correctly
- fix potential deadloop in prepare_compress_overwrite()
- fix the missing write pointer correction for zoned device
- fix to avoid panic once fallocation fails for pinfile
- don't retry IO for corrupted data scenario
There are many other clean up patches and minor bug fixes as usual"
* tag 'f2fs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: fix missing discard for active segments
f2fs: optimize f2fs DIO overwrites
f2fs: fix to avoid atomicity corruption of atomic file
f2fs: pass sbi rather than sb to parse_options()
f2fs: pass sbi rather than sb to quota qf_name helpers
f2fs: defer readonly check vs norecovery
f2fs: Pass sbi rather than sb to f2fs_set_test_dummy_encryption
f2fs: make LAZYTIME a mount option flag
f2fs: make INLINECRYPT a mount option flag
f2fs: factor out an f2fs_default_check function
f2fs: consolidate unsupported option handling errors
f2fs: use f2fs_sb_has_device_alias during option parsing
f2fs: add carve_out sysfs node
f2fs: fix to avoid running out of free segments
f2fs: Remove f2fs_write_node_page()
f2fs: Remove f2fs_write_meta_page()
f2fs: Remove f2fs_write_data_page()
f2fs: Remove check for ->writepage
Revert "f2fs: rebuild nat_bits during umount"
f2fs: fix to avoid accessing uninitialized curseg
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"User visible changes:
- fall back to buffered write if direct io is done on a file that
requires checksums
- this avoids a problem with checksum mismatch errors, observed
e.g. on virtual images when writes to pages under writeback
cause the checksum mismatch reports
- this may lead to some performance degradation but currently the
recommended setup for VM images is to use the NOCOW file
attribute that also disables checksums
- fast/realtime zstd levels -15 to -1
- supported by mount options (compress=zstd:-5) and defrag ioctl
- improved speed, reduced compression ratio, check the commit for
sample measurements
- defrag ioctl extended to accept negative compression levels
- subpage mode
- remove warning when subpage mode is used, the feature is now
reasonably complete and tested
- in debug mode allow to create 2K b-tree nodes to allow testing
subpage on x86_64 with 4K pages too
Performance improvements:
- in send, better file path caching improves runtime (on sample load
by -30%)
- on s390x with hardware zlib support prepare the input buffer in a
better way to get the best results from the acceleration
- minor speed improvement in encoded read, avoid memory allocation in
synchronous mode
Core:
- enable stable writes on inodes, replacing manually waiting for
writeback and allowing to skip that on inodes without checksums
- add last checks and warnings for out-of-band dirty writes to pages,
requiring a fixup ("fixup worker"), this should not be necessary
since 5.8 where get_user_page() and pin_user_pages*() prevent this
- long history behind that, we'll be happy to remove the whole
infrastructure in the near future
- more folio API conversions and preparations for large folio support
- subpage cleanups and refactoring, split handling of data and
metadata to allow future support for large folios
- readpage works as block-by-block, no change for normal mode, this
is preparation for future subpage updates
- block group refcount fixes and hardening
- delayed iput fixes
- in zoned mode, fix zone activation on filesystem with missing
devices
Cleanups:
- inode parameter cleanups
- path auto-freeing updates
- code flow simplifications in send
- redundant parameter cleanups"
* tag 'for-6.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (164 commits)
btrfs: zoned: fix zone finishing with missing devices
btrfs: zoned: fix zone activation with missing devices
btrfs: remove end_no_trans label from btrfs_log_inode_parent()
btrfs: simplify condition for logging new dentries at btrfs_log_inode_parent()
btrfs: remove redundant else statement from btrfs_log_inode_parent()
btrfs: use memcmp_extent_buffer() at replay_one_extent()
btrfs: update outdated comment for overwrite_item()
btrfs: use variables to store extent buffer and slot at overwrite_item()
btrfs: avoid unnecessary memory allocation and copy at overwrite_item()
btrfs: don't clobber ret in btrfs_validate_super()
btrfs: prepare btrfs_page_mkwrite() for large folios
btrfs: prepare extent_io.c for future large folio support
btrfs: prepare btrfs_launcher_folio() for large folios support
btrfs: replace PAGE_SIZE with folio_size for subpage.[ch]
btrfs: add a size parameter to btrfs_alloc_subpage()
btrfs: subpage: make btrfs_is_subpage() check against a folio
btrfs: add extra warning if delayed iput is added when it's not allowed
btrfs: avoid redundant path slot assignment in btrfs_search_forward()
btrfs: remove unnecessary btrfs_key local variable in btrfs_search_forward()
btrfs: simplify the return value handling in search_ioctl()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, EROFS 48-bit block addressing is available to support
massive datasets for model training and other large data archive use
cases.
In addition, byte-oriented encoded extents have been supported to
reduce metadata sizes when using large configurations as well as to
improve Zstd compression speed.
There are some bugfixes and cleanups as usual.
Summary:
- Support 48-bit block addressing for large images
- Introduce encoded extents to reduce metadata on larger pclusters
- Enable unaligned compressed data to improve Zstd compression speed
- Allow 16-byte volume names again
- Minor cleanups"
* tag 'erofs-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: enable 48-bit layout support
erofs: support unaligned encoded data
erofs: implement encoded extent metadata
erofs: add encoded extent on-disk definition
erofs: initialize decompression early
erofs: support dot-omitted directories
erofs: implement 48-bit block addressing for unencoded inodes
erofs: add 48-bit block addressing on-disk support
erofs: simplify erofs_{read,fill}_inode()
erofs: get rid of erofs_map_blocks_flatmode()
erofs: move {in,out}pages into struct z_erofs_decompress_req
erofs: clean up header parsing for ztailpacking and fragments
erofs: simplify tail inline pcluster handling
erofs: allow 16-byte volume name again
erofs: get rid of erofs_kmap_type
erofs: use Z_EROFS_LCLUSTER_TYPE_MAX to simplify switches
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix two bugs related to locking request cancelation (locking request
being retried instead of canceled; canceling the wrong locking
request)
- Prevent a race between inode creation and deferred delete analogous
to commit ffd1cf0443a2 from 6.13. This now allows to further simplify
gfs2_evict_inode() without introducing mysterious problems
- When in inode delete should be verified / retried "later" but that
isn't possible, skip the delete instead of carrying it out
immediately. This broke in 6.13
- More folio conversions from Matthew Wilcox (plus a fix from Dan
Carpenter)
- Various minor fixes and cleanups
* tag 'gfs2-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (22 commits)
gfs2: some comment clarifications
gfs2: Fix a NULL vs IS_ERR() bug in gfs2_find_jhead()
gfs2: Convert gfs2_meta_read_endio() to use a folio
gfs2: Convert gfs2_end_log_write_bh() to work on a folio
gfs2: Convert gfs2_find_jhead() to use a folio
gfs2: Convert gfs2_jhead_pg_srch() to gfs2_jhead_folio_search()
gfs2: Use b_folio in gfs2_check_magic()
gfs2: Use b_folio in gfs2_submit_bhs()
gfs2: Use b_folio in gfs2_trans_add_meta()
gfs2: Use b_folio in gfs2_log_write_bh()
gfs2: skip if we cannot defer delete
gfs2: remove redundant warnings
gfs2: minor evict fix
gfs2: Prevent inode creation race (2)
gfs2: Fix additional unlikely request cancelation race
gfs2: Fix request cancelation bug
gfs2: Check for empty queue in run_queue
gfs2: Remove more dead code in add_to_queue
gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETE
gfs2: glock holder GL_NOPID fix
...
|
|
xfstests generic/730 test failed because after deleting the device
that still had dirty data, the file could still be read without
returning an error. The reason is the missing shutdown check in
->read_iter.
I also noticed that shutdown checks were missing from ->write_iter,
->splice_read, and ->mmap. This commit adds shutdown checks to all
of them.
Fixes: f761fcdd289d ("exfat: Implement sops->shutdown and ioctl")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
In exfat_find_last_cluster(), the cluster chain is traversed until
the EOF cluster. If the cluster chain includes a loop due to file
system corruption, the EOF cluster cannot be traversed, resulting
in an infinite loop.
If the number of clusters indicated by the file size is inconsistent
with the cluster chain length, exfat_find_last_cluster() will return
an error, so if this inconsistency is found, the traversal can be
aborted without traversing to the EOF cluster.
Reported-by: syzbot+f7d147e6db52b1e09dba@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f7d147e6db52b1e09dba
Tested-by: syzbot+f7d147e6db52b1e09dba@syzkaller.appspotmail.com
Fixes: 31023864e67a ("exfat: add fat entry operations")
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
When get_block is called with a buffer_head allocated on the stack, such
as do_mpage_readpage, stack corruption due to buffer_head UAF may occur in
the following race condition situation.
<CPU 0> <CPU 1>
mpage_read_folio
<<bh on stack>>
do_mpage_readpage
exfat_get_block
bh_read
__bh_read
get_bh(bh)
submit_bh
wait_on_buffer
...
end_buffer_read_sync
__end_buffer_read_notouch
unlock_buffer
<<keep going>>
...
...
...
...
<<bh is not valid out of mpage_read_folio>>
.
.
another_function
<<variable A on stack>>
put_bh(bh)
atomic_dec(bh->b_count)
* stack corruption here *
This patch returns -EAGAIN if a folio does not have buffers when bh_read
needs to be called. By doing this, the caller can fallback to functions
like block_read_full_folio(), create a buffer_head in the folio, and then
call get_block again.
Let's do not call bh_read() with on-stack buffer_head.
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength")
Cc: stable@vger.kernel.org
Tested-by: Yeongjin Gil <youngjin.gil@samsung.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
The callback function statfs() is called only after the file
system is mounted. During the process of mounting the exFAT
file system, the number of used clusters has been counted, so
the condition "sbi->used_clusters == EXFAT_CLUSTERS_UNTRACKED"
is always false and should be deleted.
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
If the discard mount option is enabled, the file's clusters are
discarded when the clusters are freed. Discarding clusters one by
one will significantly reduce performance. Poor performance may
cause soft lockup when lots of clusters are freed.
This commit improves performance by discarding contiguous clusters
in batches.
Measure the performance by:
# truncate -s 80G /mnt/file
# time rm /mnt/file
Without this commit:
real 4m46.183s
user 0m0.000s
sys 0m12.863s
With this commit:
real 0m1.661s
user 0m0.000s
sys 0m0.017s
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move vm_table members out of kernel/sysctl.c
All vm_table array members have moved to their respective subsystems
leading to the removal of vm_table from kernel/sysctl.c. This
increases modularity by placing the ctl_tables closer to where they
are actually used and at the same time reducing the chances of merge
conflicts in kernel/sysctl.c.
- ctl_table range fixes
Replace the proc_handler function that checks variable ranges in
coredump_sysctls and vdso_table with the one that actually uses the
extra{1,2} pointers as min/max values. This tightens the range of the
values that users can pass into the kernel effectively preventing
{under,over}flows.
- Misc fixes
Correct grammar errors and typos in test messages. Update sysctl
files in MAINTAINERS. Constified and removed array size in
declaration for alignment_tbl
* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
selftests/sysctl: fix wording of help messages
selftests: fix spelling/grammar errors in sysctl/sysctl.sh
MAINTAINERS: Update sysctl file list in MAINTAINERS
sysctl: Fix underflow value setting risk in vm_table
coredump: Fixes core_pipe_limit sysctl proc_handler
sysctl: remove unneeded include
sysctl: remove the vm_table
sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
fs: dcache: move the sysctl to fs/dcache.c
sunrpc: simplify rpcauth_cache_shrink_count()
fs: drop_caches: move sysctl to fs/drop_caches.c
fs: fs-writeback: move sysctl to fs/fs-writeback.c
mm: nommu: move sysctl to mm/nommu.c
security: min_addr: move sysctl to security/min_addr.c
mm: mmap: move sysctl to mm/mmap.c
mm: util: move sysctls to mm/util.c
mm: vmscan: move vmscan sysctls to mm/vmscan.c
mm: swap: move sysctl to mm/swap.c
mm: filemap: move sysctl to mm/filemap.c
...
|
|
Pull block updates from Jens Axboe:
- Fixes for integrity handling
- NVMe pull request via Keith:
- Secure concatenation for TCP transport (Hannes)
- Multipath sysfs visibility (Nilay)
- Various cleanups (Qasim, Baruch, Wang, Chen, Mike, Damien, Li)
- Correct use of 64-bit BARs for pci-epf target (Niklas)
- Socket fix for selinux when used in containers (Peijie)
- MD pull request via Yu:
- fix recovery can preempt resync (Li Nan)
- fix md-bitmap IO limit (Su Yue)
- fix raid10 discard with REQ_NOWAIT (Xiao Ni)
- fix raid1 memory leak (Zheng Qixing)
- fix mddev uaf (Yu Kuai)
- fix raid1,raid10 IO flags (Yu Kuai)
- some refactor and cleanup (Yu Kuai)
- Series cleaning up and fixing bugs in the bad block handling code
- Improve support for write failure simulation in null_blk
- Various lock ordering fixes
- Fixes for locking for debugfs attributes
- Various ublk related fixes and improvements
- Cleanups for blk-rq-qos wait handling
- blk-throttle fixes
- Fixes for loop dio and sync handling
- Fixes and cleanups for the auto-PI code
- Block side support for hardware encryption keys in blk-crypto
- Various cleanups and fixes
* tag 'for-6.15/block-20250322' of git://git.kernel.dk/linux: (105 commits)
nvmet: replace max(a, min(b, c)) by clamp(val, lo, hi)
nvme-tcp: fix selinux denied when calling sock_sendmsg
nvmet: pci-epf: Always configure BAR0 as 64-bit
nvmet: Remove duplicate uuid_copy
nvme: zns: Simplify nvme_zone_parse_entry()
nvmet: pci-epf: Remove redundant 'flush_workqueue()' calls
nvmet-fc: Remove unused functions
nvme-pci: remove stale comment
nvme-fc: Utilise min3() to simplify queue count calculation
nvme-multipath: Add visibility for queue-depth io-policy
nvme-multipath: Add visibility for numa io-policy
nvme-multipath: Add visibility for round-robin io-policy
nvmet: add tls_concat and tls_key debugfs entries
nvmet-tcp: support secure channel concatenation
nvmet: Add 'sq' argument to alloc_ctrl_args
nvme-fabrics: reset admin connection for secure concatenation
nvme-tcp: request secure channel concatenation
nvme-keyring: add nvme_tls_psk_refresh()
nvme: add nvme_auth_derive_tls_psk()
nvme: add nvme_auth_generate_digest()
...
|
|
Fixes "task out to lunch" warnings during recovery on large machines
with lots of dirty data in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree node scan has to wait on kthread workers that scan each device,
potentially for awhile.
We would like this to be interruptible, but we may need a different
mechanism than signals for that - we've had bugs in the past where
mounts were failing due to checking for signals, and no explanation on
where they came from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Data move -> move_get_io_opts -> bch2_get_update_rebalance_opts
requires a not_extents iterator; this fixes the path where we're walking
the extents btree and chase a reflink pointer into the reflink btree.
bch2_lookup_indirect_extent() requires working with an extents iterator
(due to peek_slot() semantics), so we implement
bch2_lookup_indirect_extent_for_move().
This is simplified because there's no need to report
indirect_extent_missing_errors here, that can be deferred until fsck or
when a user reads that data.
Reported-by: Maël Kerbiriou <mael.kerbiriou@free.fr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
There's various checks for "are we going to compress this" - but we're
not going to compress if we know it's incompressible.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We weren't checking that accounting keys have the expected number of
accounters. Originally we probably wanted to be flexible on this, but it
doesn't look like that will be required - accounting is extended by
adding new counter types, not more counters to an existing type.
This means we can drop a BUG_ON() that popped once in automated testing,
and the new validation will make that bug easier to track down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
SMB1 protocol supports non-UNICODE (8-bit OEM character set) and
UNICODE (UTF-16) modes.
Linux SMB1 client implements both of them but currently does not allow to
choose non-UNICODE mode when SMB1 server announce UNICODE mode support.
This change adds a new mount option -o nounicode to disable UNICODE mode
and force usage of non-UNICODE (8-bit OEM character set) mode.
This allows to test non-UNICODE implementation of Linux SMB1 client against
any SMB1 server, including modern and recent Windows SMB1 server.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Windows SMB servers (including SMB2+) which are working over RFC1001
require that Netbios server name specified in RFC1001 Session Request
packet is same as the UNC host name. Netbios server name can be already
specified manually via -o servern= option.
With this change the RFC1001 server name is set automatically by extracting
the hostname from the mount source.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") attempted to fix a netns use-after-free issue by manually
adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add().
However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
after rmmod") pointed out that the approach of manually setting
sk->sk_net_refcnt in the first commit was technically incorrect, as
sk->sk_net_refcnt should only be set for user sockets. It led to issues
like TCP timers not being cleared properly on close. The second commit
moved to a model of just holding an extra netns reference for
server->ssocket using get_net(), and dropping it when the server is torn
down.
But there remain some gaps in the get_net()/put_net() balancing added by
these commits. The incomplete reference handling in these fixes results
in two issues:
1. Netns refcount leaks[1]
The problem process is as follows:
```
mount.cifs cifsd
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net. */
ip_connect
generic_ip_connect /* Try port 445 */
get_net()
->connect() /* Failed */
put_net()
generic_ip_connect /* Try port 139 */
get_net() /* Missing matching put_net() for this get_net().*/
cifs_get_smb_ses
cifs_negotiate_protocol
smb2_negotiate
SMB2_negotiate
cifs_send_recv
wait_for_response
cifs_demultiplex_thread
cifs_read_from_socket
cifs_readv_from_socket
cifs_reconnect
cifs_abort_connection
sock_release();
server->ssocket = NULL;
/* Missing put_net() here. */
generic_ip_connect
get_net()
->connect() /* Failed */
put_net()
sock_release();
server->ssocket = NULL;
free_rsp_buf
...
clean_demultiplex_info
/* It's only called once here. */
put_net()
```
When cifs_reconnect() is triggered, the server->ssocket is released
without a corresponding put_net() for the reference acquired in
generic_ip_connect() before. it ends up calling generic_ip_connect()
again to retry get_net(). After that, server->ssocket is set to NULL
in the error path of generic_ip_connect(), and the net count cannot be
released in the final clean_demultiplex_info() function.
2. Potential use-after-free
The current refcounting scheme can lead to a potential use-after-free issue
in the following scenario:
```
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net */
ip_connect
generic_ip_connect
get_net()
bind_socket
kernel_bind /* failed */
put_net()
/* after out_err_crypto_release label */
put_net()
/* after out_err label */
put_net()
```
In the exception handling process where binding the socket fails, the
get_net() and put_net() calls are unbalanced, which may cause the
server->net reference count to drop to zero and be prematurely released.
To address both issues, this patch ties the netns reference counting to
the server->ssocket and server lifecycles. The extra reference is now
acquired when the server or socket is created, and released when the
socket is destroyed or the server is torn down.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=219792
Fixes: ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.")
Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod")
Signed-off-by: Wang Zhaolong <wangzhaolong1@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
cifs.ko is missing validation check when accessing smb_aces.
This patch add validation check for the fields in smb_aces.
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
secondary channels.
In a multichannel setup, it was observed that a few fields were not being
copied over to the secondary channels, which impacted performance in cases
where these options were relevant but not properly synchronized. To address
this, this patch introduces copying the following parameters from the
primary channel to the secondary channels:
- min_offload
- compression.requested
- dfs_conn
- ignore_signature
- leaf_fullpath
- noblockcnt
- retrans
- sign
By copying these parameters, we ensure consistency across channels and
prevent performance degradation due to missing or outdated settings.
Cc: stable@vger.kernel.org
Signed-off-by: Aman <aman1@microsoft.com>
Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Function ip_rfc1001_connect() send NetBIOS session request but currently
does not read response. It even does not wait for the response. Instead it
just calls usleep_range(1000, 2000) and explain in comment that some
servers require short break before sending SMB negotiate packet. Response
is later handled in generic is_smb_response() function called from
cifs_demultiplex_thread().
That comment probably refers to the old DOS SMB server which cannot process
incoming SMB negotiate packet if it has not sent NetBIOS session response
packet. Note that current sleep timeout is too small when trying to
establish connection to DOS SMB server running in qemu virtual machine
connected over qemu user networking with guestfwd netcat options. So that
usleep_range() call is not useful at all.
NetBIOS session response packet contains useful error information, like
the server name specified NetBIOS session request packet is incorrect.
Old Windows SMB servers and even the latest SMB server on the latest
Windows Server 2022 version requires that the name is the correct server
name, otherwise they return error RFC1002_NOT_PRESENT. This applies for all
SMB dialects (old SMB1, and also modern SMB2 and SMB3).
Therefore read the reply of NetBIOS session request and implement parsing
of the reply. Log received error to dmesg to help debugging reason why
connection was refused. Also convert NetBIOS error to useful errno.
Note that ip_rfc1001_connect() function is used only when doing connection
over port 139. So the common SMB scenario over port 445 is not affected by
this change at all.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Function ip_rfc1001_connect() which establish NetBIOS session for SMB
connections, currently uses smb_send() function for sending NetBIOS Session
Request packet. This function expects that the passed buffer is SMB packet
and for SMB2+ connections it mangles packet header, which breaks prepared
NetBIOS Session Request packet. Result is that this function send garbage
packet for SMB2+ connection, which SMB2+ server cannot parse. That function
is not mangling packets for SMB1 connections, so it somehow works for SMB1.
Fix this problem and instead of smb_send(), use smb_send_kvec() function
which does not mangle prepared packet, this function send them as is. Just
API of this function takes struct msghdr (kvec) instead of packet buffer.
[MS-SMB2] specification allows SMB2 protocol to use NetBIOS as a transport
protocol. NetBIOS can be used over TCP via port 139. So this is a valid
configuration, just not so common. And even recent Windows versions (e.g.
Windows Server 2022) still supports this configuration: SMB over TCP port
139, including for modern SMB2 and SMB3 dialects.
This change fixes SMB2 and SMB3 connections over TCP port 139 which
requires establishing of NetBIOS session. Tested that this change fixes
establishing of SMB2 and SMB3 connections with Windows Server 2022.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Currently ->get_acl() callback always create request for OWNER, GROUP and
DACL, even when only DACLs was requested by user. Change API callback to
request only information for which the caller asked. Therefore when only
DACLs requested, then SMB client will prepare and send DACL-only request.
This change fixes retrieving of "system.cifs_acl" and "system.smb3_acl"
xattrs to contain only DACL structure as documented.
Note that setting/changing of "system.cifs_acl" and "system.smb3_acl"
xattrs already takes only DACL structure and ignores all other fields.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Do not attempt to query or create reparse point when server fs does not
support it. This will prevent creating unusable empty object on the server.
Signed-off-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
If a containerised process is killed and causes an ENETUNREACH or
ENETDOWN error to be propagated to the state manager, then mark the
nfs_client as being dead so that we don't loop in functions that are
expecting recovery to succeed.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If someone calls nfs_mark_client_ready(clp, status) with a negative
value for status, then that should signal that the nfs_client is no
longer valid.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Replace the tests for the RPC client being shut down with tests for
whether the nfs_client is in an error state.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The nfs_client manages state for all the superblocks in the
"cl_superblocks" list, so it must not be shut down until all of them are
gone.
Fixes: 7d3e26a054c8 ("NFS: Cancel all existing RPC tasks when shutdown")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Pull fscrypt updates from Eric Biggers:
"A fix for an issue where CONFIG_FS_ENCRYPTION could be enabled without
some of its dependencies, and a small documentation update"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
fscrypt: mention init_on_free instead of page poisoning
fscrypt: drop obsolete recommendation to enable optimized ChaCha20
Revert "fscrypt: relax Kconfig dependencies for crypto API algorithms"
|
|
Pull fsverity updates from Eric Biggers:
"A fix for an issue where CONFIG_FS_VERITY could be enabled without
some of its dependencies, and a small documentation update"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
Revert "fsverity: relax build time dependency on CRYPTO_SHA256"
Documentation: add a usecase for FS_IOC_READ_VERITY_METADATA
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer core updates from Thomas Gleixner:
- Fix a memory ordering issue in posix-timers
Posix-timer lookup is lockless and reevaluates the timer validity
under the timer lock, but the update which validates the timer is not
protected by the timer lock. That allows the store to be reordered
against the initialization stores, so that the lookup side can
observe a partially initialized timer. That's mostly a theoretical
problem, but incorrect nevertheless.
- Fix a long standing inconsistency of the coarse time getters
The coarse time getters read the base time of the current update
cycle without reading the actual hardware clock. NTP frequency
adjustment can set the base time backwards. The fine grained
interfaces compensate this by reading the clock and applying the new
conversion factor, but the coarse grained time getters use the base
time directly. That allows the user to observe time going backwards.
Cure it by always forwarding base time, when NTP changes the
frequency with an immediate step.
- Rework of posix-timer hashing
The posix-timer hash is not scalable and due to the CRIU timer
restore mechanism prone to massive contention on the global hash
bucket lock.
Replace the global hash lock with a fine grained per bucket locking
scheme to address that.
- Rework the proc/$PID/timers interface.
/proc/$PID/timers is provided for CRIU to be able to restore a timer.
The printout happens with sighand lock held and interrupts disabled.
That's not required as this can be done with RCU protection as well.
- Provide a sane mechanism for CRIU to restore a timer ID
CRIU restores timers by creating and deleting them until the kernel
internal per process ID counter reached the requested ID. That's
horribly slow for sparse timer IDs.
Provide a prctl() which allows CRIU to restore a timer with a given
ID. When enabled the ID pointer is used as input pointer to read the
requested ID from user space. When disabled, the normal allocation
scheme (next ID) is active as before. This is backwards compatible
for both kernel and user space.
- Make hrtimer_update_function() less expensive.
The sanity checks are valuable, but expensive for high frequency
usage in io/uring. Make the debug checks conditional and enable them
only when lockdep is enabled.
- Small updates, cleanups and improvements
* tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
selftests/timers: Improve skew_consistency by testing with other clockids
timekeeping: Fix possible inconsistencies in _COARSE clockids
posix-timers: Drop redundant memset() invocation
selftests/timers/posix-timers: Add a test for exact allocation mode
posix-timers: Provide a mechanism to allocate a given timer ID
posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held
posix-timers: Make per process list RCU safe
posix-timers: Avoid false cacheline sharing
posix-timers: Switch to jhash32()
posix-timers: Improve hash table performance
posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t
posix-timers: Make lock_timer() use guard()
posix-timers: Rework timer removal
posix-timers: Simplify lock/unlock_timer()
posix-timers: Use guards in a few places
posix-timers: Remove SLAB_PANIC from kmem cache
posix-timers: Remove a few paranoid warnings
posix-timers: Cleanup includes
posix-timers: Add cond_resched() to posix_timer_add() search loop
posix-timers: Initialise timer before adding it to the hash table
...
|
|
They were being truncated, printk has a 1k limit per call
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Also, improve the message in prep_encoded_data() - it now prints
good/bad checksums, and checksum type.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
__bch2_read, before calling __bch2_read_extent(), sets bvec_iter.bi_size
to "the size we can read from the current extent" with a swap, and
restores it to "the size for the total read" after the read_extent call
with another swap.
But we neglected to do the restore before the "if (ret) goto err;" -
which is a problem if we're retrying those errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we're moving an extent that was partially overwritten,
bch2_write_rechecksum() will trim it to the currenty live range.
If we then also want to compress it, it'll be decrypted - but the nonce
has been advanced for the overwritten start of the extent that we
dropped, and we were using the nonce we calculated before rechecksum().
Reported-by: Gabriel de Perthuis <g2p.code@gmail.com>
Fixes: 127d90d2823e ("bcachefs: bch2_write_prep_encoded_data() now returns errcode")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Every loadable module should have a description, to avoid a warning such as:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/exportfs/exportfs.o
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250324173242.1501003-1-arnd@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|