Age | Commit message (Collapse) | Author |
|
Repair code will do updates on older snapshot versions, so needs the
correct annotation.
Reported-by: syzbot+42581416dba62b364750@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
If we're doing a reflink copy of existing reflinked data, we may only
set REFLINK_P_MAY_UPDATE_OPTIONS if it was set on the reflink pointer
we're copying from.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Large folios aren't supported without TRANSPARENT_HUGEPAGE
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We can't unlock a should_be_locked path unless we're in a transaction
restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Different versions differ on the size of the blacklist range; it is
theoretically possible that we could end up with blacklisted journal
sequence numbers newer than the newest seq we find in the journal, and
pick a new start seq that's blacklisted.
Explicitly check for this in bch2_fs_journal_start().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We don't want to change the bucket gen, on gen mismatch: it's possible
to have multiple btree nodes with different gens in the same bucket that
we want to keep, if we have to recover from btree node scan.
It's also not necessary to set g->gen_valid; add a comment to that
effect.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
This was lost in the giant recovery pass rework - but it's used heavily
by bcachefs subcommand utilities.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
When we go to allocate and find taht a bucket in the freespace btree is
actually allocated, we're supposed to return nonzero to tell the
allocator to skip it.
This fixes an emergency read only due to a bucket/ptr gen mismatch - we
also don't return the correct bucket gen when this happens.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fixes: 010c89468134 ("bcachefs: Check for casefolded dirents in non casefolded dirs")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Pull smb server fixes from Steve French:
- Fix for rename regression due to the recent VFS lookup changes
- Fix write failure
- locking fix for oplock handling
* tag 'v6.15-rc8-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: use list_first_entry_or_null for opinfo_get_list()
ksmbd: fix rename failure
ksmbd: fix stream write failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a small set of fixes for the blocking buffer lookup
conversion done earlier this cycle.
It adds a missing conversion in the getblk slowpath and a few minor
optimizations and cleanups"
* tag 'vfs-6.15-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs/buffer: optimize discard_buffer()
fs/buffer: remove superfluous statements
fs/buffer: avoid redundant lookup in getblk slowpath
fs/buffer: use sleeping lookup in __getblk_slowpath()
|
|
According to commit 8f6116b5b77b ("statmount: add a new supported_mask
field"), STATMOUNT_SUPPORTED macro shall be updated whenever a new flag
is added.
Fixes: 7a54947e727b ("Merge patch series "fs: allow changing idmappings"")
Signed-off-by: "Dmitry V. Levin" <ldv@strace.io>
Link: https://lore.kernel.org/20250511224953.GA17849@strace.io
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Not since 8f2918898eb5 "new helpers: vfs_create_mount(), fc_mount()"
back in 2018. Get rid of the dead checks...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lore.kernel.org/20250421033509.GV2023217@ZenIV
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add some comments in there highlighting a few non-obvious assumptions.
Link: https://lore.kernel.org/20250416-zerknirschen-aluminium-14a55639076f@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
If path->should_be_locked is true, that means user code (of the btree
API) has seen, in this transaction, something guarded by the node this
path has locked, and we have to keep it locked until the end of the
transaction.
Assert that we're not violating this; should_be_locked should also be
cleared only in _very_ special situations.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Simplify the "do we need to keep this locked?" checks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're adding new should_be_locked assertions: it's going to be illegal
to unlock a should_be_locked path when trans->locked is true.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
We're adding new should_be_locked assertions, also add a comment
explaining why clearing should_be_locked is safe here.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Small additional optimization over the previous patch, bringing us
closer to the original behaviour, except when we need to clone to avoid
a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Avoid transaction restarts due to failure to upgrade - we can traverse a
new iterator without a transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
btree_path_get_locks, on failure, shouldn't unlock if we're not issuing
a transaction restart: we might drop locks we're not supposed to (if
path->should_be_locked is set).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Small helper to improve locking assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
bch2_path_put_nokeep() was intended for paths we wouldn't need to
preserve for a transaction restart - it always frees them right away
when the ref hits 0.
But since paths are shared, freeing unconditionally is a bug, the path
might have been used elsewhere and have should_be_locked set, i.e. we
need to keep it locked until the end of the transaction.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
On cifs, "DIO reads" (specified by O_DIRECT) need to be differentiated from
"unbuffered reads" (specified by cache=none in the mount parameters). The
difference is flagged in the protocol and the server may behave
differently: Windows Server will, for example, mandate that DIO reads are
block aligned.
Fix this by adding a NETFS_UNBUFFERED_READ to differentiate this from
NETFS_DIO_READ, parallelling the write differentiation that already exists.
cifs will then do the right thing.
Fixes: 016dc8516aec ("netfs: Implement unbuffered/DIO read support")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/3444961.1747987072@warthog.procyon.org.uk
Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
cc: Steve French <sfrench@samba.org>
cc: netfs@lists.linux.dev
cc: v9fs@lists.linux.dev
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Pull bcachefs fixes from Kent Overstreet:
"Small stuff, main ones users will be interested in:
- Couple more casefolding fixes; we can now detect and repair
casefolded dirents in non-casefolded dir and vice versa
- Fix for massive write inflation with mmapped io, which hit certain
databases"
* tag 'bcachefs-2025-05-22' of git://evilpiepirate.org/bcachefs:
bcachefs: Check for casefolded dirents in non casefolded dirs
bcachefs: Fix bch2_dirent_create_snapshot() for casefolding
bcachefs: Fix casefold opt via xattr interface
bcachefs: mkwrite() now only dirties one page
bcachefs: fix extent_has_stripe_ptr()
bcachefs: Fix bch2_btree_path_traverse_cached() when paths realloced
|
|
Get rid of useless `goto`s. No logic changes.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250522084953.412096-1-hsiangkao@linux.alibaba.com
|
|
Pull smb client fixes from Steve French:
- Two fixes for use after free in readdir code paths
* tag '6.15-rc8-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Reset all search buffer pointers when releasing buffer
smb: client: Fix use-after-free in cifs_fill_dirent
|
|
We need to delay checksumming the journal write; we don't know the
blocksize until after we allocate the write.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Separate tracepoint message generation and other slowpath code into
non-inline functions, and use bch2_trans_log_str() instead of using a
printbuf for our journal message.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
The data update path doesn't need a printbuf for its log message - this
will help reduce stack usage.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Reduce stack usage - bkey_buf has a 96 byte buffer on the stack, but the
btree_trans bump allocator works just fine here.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Fuzzing hit another invalid pointer dereference due to the lack of
checking whether jffs2_prealloc_raw_node_refs() completed successfully.
Subsequent logic implies that the node refs have been allocated.
Handle that. The code is ready for propagating the error upwards.
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 1 PID: 5835 Comm: syz-executor145 Not tainted 5.10.234-syzkaller #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:jffs2_link_node_ref+0xac/0x690 fs/jffs2/nodelist.c:600
Call Trace:
jffs2_mark_erased_block fs/jffs2/erase.c:460 [inline]
jffs2_erase_pending_blocks+0x688/0x1860 fs/jffs2/erase.c:118
jffs2_garbage_collect_pass+0x638/0x1a00 fs/jffs2/gc.c:253
jffs2_reserve_space+0x3f4/0xad0 fs/jffs2/nodemgmt.c:167
jffs2_write_inode_range+0x246/0xb50 fs/jffs2/write.c:362
jffs2_write_end+0x712/0x1110 fs/jffs2/file.c:302
generic_perform_write+0x2c2/0x500 mm/filemap.c:3347
__generic_file_write_iter+0x252/0x610 mm/filemap.c:3465
generic_file_write_iter+0xdb/0x230 mm/filemap.c:3497
call_write_iter include/linux/fs.h:2039 [inline]
do_iter_readv_writev+0x46d/0x750 fs/read_write.c:740
do_iter_write+0x18c/0x710 fs/read_write.c:866
vfs_writev+0x1db/0x6a0 fs/read_write.c:939
do_pwritev fs/read_write.c:1036 [inline]
__do_sys_pwritev fs/read_write.c:1083 [inline]
__se_sys_pwritev fs/read_write.c:1078 [inline]
__x64_sys_pwritev+0x235/0x310 fs/read_write.c:1078
do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x67/0xd1
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 2f785402f39b ("[JFFS2] Reduce visibility of raw_node_ref to upper layers of JFFS2 code.")
Fixes: f560928baa60 ("[JFFS2] Allocate node_ref for wasted space when skipping to page boundary")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Syzkaller detected a kernel bug in jffs2_link_node_ref, caused by fault
injection in jffs2_prealloc_raw_node_refs. jffs2_sum_write_sumnode doesn't
check return value of jffs2_prealloc_raw_node_refs and simply lets any
error propagate into jffs2_sum_write_data, which eventually calls
jffs2_link_node_ref in order to link the summary to an expectedly allocated
node.
kernel BUG at fs/jffs2/nodelist.c:592!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 31277 Comm: syz-executor.7 Not tainted 6.1.128-syzkaller-00139-ge10f83ca10a1 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:jffs2_link_node_ref+0x570/0x690 fs/jffs2/nodelist.c:592
Call Trace:
<TASK>
jffs2_sum_write_data fs/jffs2/summary.c:841 [inline]
jffs2_sum_write_sumnode+0xd1a/0x1da0 fs/jffs2/summary.c:874
jffs2_do_reserve_space+0xa18/0xd60 fs/jffs2/nodemgmt.c:388
jffs2_reserve_space+0x55f/0xaa0 fs/jffs2/nodemgmt.c:197
jffs2_write_inode_range+0x246/0xb50 fs/jffs2/write.c:362
jffs2_write_end+0x726/0x15d0 fs/jffs2/file.c:301
generic_perform_write+0x314/0x5d0 mm/filemap.c:3856
__generic_file_write_iter+0x2ae/0x4d0 mm/filemap.c:3973
generic_file_write_iter+0xe3/0x350 mm/filemap.c:4005
call_write_iter include/linux/fs.h:2265 [inline]
do_iter_readv_writev+0x20f/0x3c0 fs/read_write.c:735
do_iter_write+0x186/0x710 fs/read_write.c:861
vfs_iter_write+0x70/0xa0 fs/read_write.c:902
iter_file_splice_write+0x73b/0xc90 fs/splice.c:685
do_splice_from fs/splice.c:763 [inline]
direct_splice_actor+0x10c/0x170 fs/splice.c:950
splice_direct_to_actor+0x337/0xa10 fs/splice.c:896
do_splice_direct+0x1a9/0x280 fs/splice.c:1002
do_sendfile+0xb13/0x12c0 fs/read_write.c:1255
__do_sys_sendfile64 fs/read_write.c:1323 [inline]
__se_sys_sendfile64 fs/read_write.c:1309 [inline]
__x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Fix this issue by checking return value of jffs2_prealloc_raw_node_refs
before calling jffs2_sum_write_data.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Cc: stable@vger.kernel.org
Fixes: 2f785402f39b ("[JFFS2] Reduce visibility of raw_node_ref to upper layers of JFFS2 code.")
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
s/much/many/
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Cross-merge networking fixes after downstream PR (net-6.15-rc8).
Conflicts:
80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X")
4bcc063939a5 ("ice, irdma: fix an off by one in error handling code")
c24a65b6a27c ("iidc/ice/irdma: Update IDC to support multiple consumers")
https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au
No extra adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We are no longer calling gfs2_find_jhead() on the same log twice, so
there is no more reason for keeping the log contents cached across those
calls. In addition, log head lookup and log header writing didn't go
through the same address space and so the caching wasn't even fully
working, anyway.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently at mount time, the recovery code looks up the current log head
and, if necessary, replays the log and writes a recovery header to
indicate that the log is clean. It does that for each log that may need
recovery. We also know that our own log will always be checked as part
of that process. Then, the mount code looks up the log head of our own
log again.
The double log head lookup can be costly, but more importantly, it is
unnecessary because we can trivially compute the position of the log
head after recovery; all we need to do for that is bump the position and
lh_sequence by one when writing a recovery header.
With that in mind, move the call to gfs2_log_pointers_init() into
gfs2_recover_func() and get rid of the double lookup in
gfs2_make_fs_rw().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In function clean_journal(), update @head to point at the log header
that indicates successful recovery: this is where logging needs to
resume.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move the initialization of sdp->sd_log_sequence and
sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use
gfs2_replay_incr_blk().
Before this change, the log head lookup code in freeze_go_xmote_bh()
didn't update sdp->sd_log_flush_head. This is now fixed, but the code
in freeze_go_xmote_bh() appears to be pretty useless in the first place:
on a frozen filesystem, the log head will not change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move gfs2_log_pointers_init to recovery.c: there is no need for inlining
this function.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit 40829760096df ("gfs2: Convert gfs2_find_jhead() to use a folio")
replaced grab_cache_page() by filemap_grab_folio(), but the comments
were still referring to grab_cache_page().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()") changed the log
flush code in gfs2_ail1_start_one() to call aops->writepages() instead
of aops->writepage(). For jdata inodes, this means that we will now try
to reserve log space and start a transaction before we can determine
that the pages in question have already been journaled. When this
happens in the context of gfs2_logd(), it can now appear that not enough
log space is available for freeing up log space, and we will lock up.
Fix that by issuing journal writes directly instead of going through
aops->writepages() in the log flush code.
Fixes: 8d391972ae2d ("gfs2: Remove __gfs2_writepage()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move gfs2_trans_add_databufs() to trans.c. Pass in a glock instead of
a gfs2_inode.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
__get_log_header() was using crc32_le_shift() to update a CRC with four
zero bytes. However, this is about 5x slower than just CRC'ing four
zero bytes in the normal way. Just do that instead.
(We could instead make crc32_le_shift() faster on short lengths. But
all its callers do just fine without it, so I'd like to just remove it.)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Since commit eb65540aa9fc ("iomap: warn on zero range of a post-eof
folio"), iomap_zero_range() warns when asked to zero a folio beyond eof.
The warning triggers on the following code path:
gfs2_fallocate(FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE)
__gfs2_punch_hole()
gfs2_block_zero_range()
iomap_zero_range()
In __gfs2_punch_hole(), gfs2 zeroes out partial folios at the beginning
and at the end of the specified range, whether those folios are beyond
eof or not. This may add folios to the page cache which are entirely
beyond eof, which isn't of any use. Avoid that by truncating the range
to zero out at eof.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
__gfs2_jdata_write_folio can't return AOP_WRITEPAGE_ACTIVATE, so don't
check for it in gfs2_write_jdata_batch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When attempting to use an archive file, such as APEX on android,
as a file-backed mount source, it fails because EROFS image within
the archive file does not start at offset 0. As a result, a loop
or a dm device is still needed to attach the image file at an
appropriate offset first. Similarly, if an EROFS image within a
block device does not start at offset 0, it cannot be mounted
directly either.
To address this issue, this patch adds a new mount option `fsoffset=x'
to accept a start offset for the primary device. The offset should be
aligned to the block size. EROFS will add this offset before performing
read requests.
Signed-off-by: Sheng Yong <shengyong1@xiaomi.com>
Signed-off-by: Wang Shuai <wangshuai12@xiaomi.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20250517090544.2687651-1-shengyong1@xiaomi.com
[ Gao Xiang: minor update on documentation and the error message. ]
Reviewed-by: Hongbo Li <lihongbo22@huawei.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
The list_first_entry() macro never returns NULL. If the list is
empty then it returns an invalid pointer. Use list_first_entry_or_null()
to check if the list is empty.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202505080231.7OXwq4Te-lkp@intel.com/
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
|