Age | Commit message (Collapse) | Author |
|
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- don't inherit NFS filesystem capabilities when crossing from one
filesystem to another
Bugfixes:
- NFS wakeup of __nfs_lookup_revalidate() needs memory barriers
- NFS improve bounds checking in nfs_fh_to_dentry()
- NFS Fix allocation errors when writing to a NFS file backed
loopback device
- NFSv4: More listxattr fixes
- SUNRPC: fix client handling of TLS alerts
- pNFS block/scsi layout fix for an uninitialised pointer
dereference
- pNFS block/scsi layout fixes for the extent encoding, stripe
mapping, and disk offset overflows
- pNFS layoutcommit work around for RPC size limitations
- pNFS/flexfiles avoid looping when handling fatal errors after
layoutget
- localio: fix various race conditions
Features and cleanups:
- Add NFSv4 support for retrieving the btime
- NFS: Allow folio migration for the case of mode == MIGRATE_SYNC
- NFS: Support using a kernel keyring to store TLS certificates
- NFSv4: Speed up delegation lookup using a hash table
- Assorted cleanups to remove unused variables and struct fields
- Assorted new tracepoints to improve debugging"
* tag 'nfs-for-6.17-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (44 commits)
NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file
NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()
NFS/localio: nfs_close_local_fh() fix check for file closed
NFSv4: Remove duplicate lookups, capability probes and fsinfo calls
NFS: Fix the setting of capabilities when automounting a new filesystem
sunrpc: fix client side handling of tls alerts
nfs/localio: use read_seqbegin() rather than read_seqbegin_or_lock()
NFS: Fixup allocation flags for nfsiod's __GFP_NORETRY
NFSv4.2: another fix for listxattr
NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
SUNRPC: Silence warnings about parameters not being described
NFS: Clean up pnfs_put_layout_hdr()/pnfs_destroy_layout_final()
NFS: Fix wakeup of __nfs_lookup_revalidate() in unblock_revalidate()
NFS: use a hash table for delegation lookup
NFS: track active delegations per-server
NFS: move the delegation_watermark module parameter
NFS: cleanup nfs_inode_reclaim_delegation
NFS: cleanup error handling in nfs4_server_common_setup
pNFS/flexfiles: don't attempt pnfs on fatal DS errors
NFS: drop __exit from nfs_exit_keyring
...
|
|
Page pool can have pages "directly" (locklessly) recycled to it,
if the NAPI that owns the page pool is scheduled to run on the same CPU.
To make this safe we check that the NAPI is disabled while we destroy
the page pool. In most cases NAPI and page pool lifetimes are tied
together so this happens naturally.
The queue API expects the following order of calls:
-> mem_alloc
alloc new pp
-> stop
napi_disable
-> start
napi_enable
-> mem_free
free old pp
Here we allocate the page pool in ->mem_alloc and free in ->mem_free.
But the NAPIs are only stopped between ->stop and ->start. We created
page_pool_disable_direct_recycling() to safely shut down the recycling
in ->stop. This way the page_pool_destroy() call in ->mem_free doesn't
have to worry about recycling any more.
Unfortunately, the page_pool_disable_direct_recycling() is not enough
to deal with failures which necessitate freeing the _new_ page pool.
If we hit a failure in ->mem_alloc or ->stop the new page pool has
to be freed while the NAPI is active (assuming driver attaches the
page pool to an existing NAPI instance and doesn't reallocate NAPIs).
Freeing the new page pool is technically safe because it hasn't been
used for any packets, yet, so there can be no recycling. But the check
in napi_assert_will_not_race() has no way of knowing that. We could
check if page pool is empty but that'd make the check much less likely
to trigger during development.
Add page_pool_enable_direct_recycling(), pairing with
page_pool_disable_direct_recycling(). It will allow us to create the new
page pools in "disabled" state and only enable recycling when we know
the reconfig operation will not fail.
Coincidentally it will also let us re-enable the recycling for the old
pool, if the reconfig failed:
-> mem_alloc (new)
-> stop (old)
# disables direct recycling for old
-> start (new)
# fail!!
-> start (old)
# go back to old pp but direct recycling is lost :(
-> mem_free (new)
The new helper is idempotent to make the life easier for drivers,
which can operate in HDS mode and support zero-copy Rx.
The driver can call the helper twice whether there are two pools
or it has multiple references to a single pool.
Fixes: 40eca00ae605 ("bnxt_en: unlink page pool when stopping Rx queue")
Tested-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250805003654.2944974-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Define a new, optional, callback that allows the driver to
specify how the return data buffer is allocated. If that callback
is set, mailbox/pcc.c is now responsible for reading from and
writing to the PCC shared buffer.
This also allows for proper checks of the Commnand complete flag
between the PCC sender and receiver.
For Type 4 channels, initialize the command complete flag prior
to accepting messages.
Since the mailbox does not know what memory allocation scheme
to use for response messages, the client now has an optional
callback that allows it to allocate the buffer for a response
message.
When an outbound message is written to the buffer, the mailbox
checks for the flag indicating the client wants an tx complete
notification via IRQ. Upon receipt of the interrupt It will
pair it with the outgoing message. The expected use is to
free the kernel memory buffer for the previous outgoing message.
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>
Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
Previous releases - regressions:
- netlink: avoid infinite retry looping in netlink_unicast()
Previous releases - always broken:
- packet: fix a race in packet_set_ring() and packet_notifier()
- ipv6: reject malicious packets in ipv6_gso_segment()
- sched: mqprio: fix stack out-of-bounds write in tc entry parsing
- net: drop UFO packets (injected via virtio) in udp_rcv_segment()
- eth: mlx5: correctly set gso_segs when LRO is used, avoid false
positive checksum validation errors
- netpoll: prevent hanging NAPI when netcons gets enabled
- phy: mscc: fix parsing of unicast frames for PTP timestamping
- a number of device tree / OF reference leak fixes"
* tag 'net-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (44 commits)
pptp: fix pptp_xmit() error path
net: ti: icssg-prueth: Fix skb handling for XDP_PASS
net: Update threaded state in napi config in netif_set_threaded
selftests: netdevsim: Xfail nexthop test on slow machines
eth: fbnic: Lock the tx_dropped update
eth: fbnic: Fix tx_dropped reporting
eth: fbnic: remove the debugging trick of super high page bias
net: ftgmac100: fix potential NULL pointer access in ftgmac100_phy_disconnect
dt-bindings: net: Replace bouncing Alexandru Tachici emails
dpll: zl3073x: ZL3073X_I2C and ZL3073X_SPI should depend on NET
net/sched: mqprio: fix stack out-of-bounds write in tc entry parsing
Revert "net: mdio_bus: Use devm for getting reset GPIO"
selftests: net: packetdrill: xfail all problems on slow machines
net/packet: fix a race in packet_set_ring() and packet_notifier()
benet: fix BUG when creating VFs
net: airoha: npu: Add missing MODULE_FIRMWARE macros
net: devmem: fix DMA direction on unmapping
ipa: fix compile-testing with qcom-mdt=m
eth: fbnic: unlink NAPIs from queues on error to open
net: Add locking to protect skb->dev access in ip_output
...
|
|
Because it's only used in sbitmap.c
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250807032413.1469456-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Currently elevators will record internal 'async_depth' to throttle
asynchronous requests, and they both calculate shallow_dpeth based on
sb->shift, with the respect that sb->shift is the available tags in one
word.
However, sb->shift is not the availbale tags in the last word, see
__map_depth:
if (index == sb->map_nr - 1)
return sb->depth - (index << sb->shift);
For consequence, if the last word is used, more tags can be get than
expected, for example, assume nr_requests=256 and there are four words,
in the worst case if user set nr_requests=32, then the first word is
the last word, and still use bits per word, which is 64, to calculate
async_depth is wrong.
One the ohter hand, due to cgroup qos, bfq can allow only one request
to be allocated, and set shallow_dpeth=1 will still allow the number
of words request to be allocated.
Fix this problems by using shallow_depth to the whole sbitmap instead
of per word, also change kyber, mq-deadline and bfq to follow this,
a new helper __map_depth_with_shallow() is introduced to calculate
available bits in each word.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20250807032413.1469456-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The conversion of all GPIO drivers to using the .set_rv() and
.set_multiple_rv() callbacks from struct gpio_chip (which - unlike their
predecessors - return an integer and allow the controller drivers to
indicate failures to users) is now complete and the legacy ones have
been removed. Rename the new callbacks back to their original names in
one sweeping change.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
With no more users of the legacy GPIO line value setters - .set() and
.set_multiple() - we can now remove them from the kernel.
Link: https://lore.kernel.org/r/20250725074651.14002-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- updates to several drivers consuming GPIO APIs to use setters
returning error codes
- an infrastructure allowing to define "overlays" for touchscreens
carving out regions implementing buttons and other elements from a
bigger sensors and a corresponding update to st1232 driver
- an update to AT/PS2 keyboard driver to map F13-F24 by default
- Samsung keypad driver got a facelift
- evdev input handler will now bind to all devices using EV_SYN event
instead of abusing id->driver_info
- two new sub-drivers implementing 1A (capacitive buttons) and 21
(forcepad button) functions in Synaptics RMI driver
- support for polling mode in Goodix touchscreen driver
- support for support for FocalTech FT8716 in edt-ft5x06 driver
- support for MT6359 in mtk-pmic-keys driver
- removal of pcf50633-input driver since platform it was used on is
gone
- new definitions for game controller "grip" buttons (BTN_GRIP*) and
corresponding changes to xpad and hid-steam controller drivers
- a new definition for "performance" key
* tag 'input-for-v6.17-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (38 commits)
HID: hid-steam: Use new BTN_GRIP* buttons
Input: add keycode for performance mode key
Input: max77693 - convert to atomic pwm operation
Input: st1232 - add touch-overlay handling
dt-bindings: input: touchscreen: st1232: add touch-overlay example
Input: touch-overlay - add touchscreen overlay handling
dt-bindings: touchscreen: add touch-overlay property
Input: atkbd - correctly map F13 - F24
Input: xpad - use new BTN_GRIP* buttons
Input: Add and document BTN_GRIP*
Input: xpad - change buttons the D-Pad gets mapped as to BTN_DPAD_*
Documentation: Fix capitalization of XBox -> Xbox
Input: synaptics-rmi4 - add support for F1A
dt-bindings: input: syna,rmi4: Document F1A function
Input: synaptics-rmi4 - add support for Forcepads (F21)
Input: mtk-pmic-keys - add support for MT6359 PMIC keys
Input: remove special handling of id->driver_info when matching
Input: evdev - switch matching to EV_SYN
Input: samsung-keypad - use BIT() and GENMASK() where appropriate
Input: samsung-keypad - use per-chip parameters
...
|
|
Pull VFIO updates from Alex Williamson:
- Fix imbalance where the no-iommu/cdev device path skips too much on
open, failing to increment a reference, but still decrements the
reference on close. Add bounds checking to prevent such underflows
(Jacob Pan)
- Fill missing detach_ioas op for pds_vfio_pci, fixing probe failure
when used with IOMMUFD (Brett Creeley)
- Split SR-IOV VFs to separate dev_set, avoiding unnecessary
serialization between VFs that appear on the same bus (Alex
Williamson)
- Fix a theoretical integer overflow is the mlx5-vfio-pci variant
driver (Artem Sadovnikov)
- Implement missing VF token checking support via vfio cdev/IOMMUFD
interface (Jason Gunthorpe)
- Update QAT vfio-pci variant driver to claim latest VF devices
(Małgorzata Mielnik)
- Add a cond_resched() call to avoid holding the CPU too long during
DMA mapping operations (Keith Busch)
* tag 'vfio-v6.17-rc1-v2' of https://github.com/awilliam/linux-vfio:
vfio/type1: conditional rescheduling while pinning
vfio/qat: add support for intel QAT 6xxx virtual functions
vfio/qat: Remove myself from VFIO QAT PCI driver maintainers
vfio/pci: Do vf_token checks for VFIO_DEVICE_BIND_IOMMUFD
vfio/mlx5: fix possible overflow in tracking max message size
vfio/pci: Separate SR-IOV VF dev_set
vfio/pds: Fix missing detach_ioas op
vfio: Prevent open_count decrement to negative
vfio: Fix unbalanced vfio_df_close call in no-iommu mode
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.17
This is a relatively small set of fixes and device quirks that came in
during the merge window, the AMD changes adding support for ACP 7.2
systems are all just adding IDs for the devices rather than any
substantial code - the actual code is the same as for prior versions of
the platform.
|
|
Pull more SCSI updates from James Bottomley:
"This is mostly fixes and cleanups and code reworks that trickled in
across the merge window and the weeks leading up. The only substantive
update is the Mediatek ufs driver which accounts for the bulk of the
additions"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (37 commits)
scsi: libsas: Use a bool for sas_deform_port() second argument
scsi: libsas: Move declarations of internal functions to sas_internal.h
scsi: libsas: Make sas_get_ata_info() static
scsi: libsas: Simplify sas_ata_wait_eh()
scsi: libsas: Refactor dev_is_sata()
scsi: sd: Make sd shutdown issue START STOP UNIT appropriately
scsi: arm64: dts: mediatek: mt8195: Add UFSHCI node
scsi: dt-bindings: mediatek,ufs: add MT8195 compatible and update clock nodes
scsi: dt-bindings: mediatek,ufs: Add ufs-disable-mcq flag for UFS host
scsi: ufs: ufs-mediatek: Add UFS host support for MT8195 SoC
scsi: ufs: ufs-pci: Remove control of UIC Completion interrupt for Intel MTL
scsi: ufs: core: Do not write interrupt enable register unnecessarily
scsi: ufs: core: Set and clear UIC Completion interrupt as needed
scsi: ufs: core: Remove duplicated code in ufshcd_send_bsg_uic_cmd()
scsi: ufs: core: Move ufshcd_enable_intr() and ufshcd_disable_intr()
scsi: ufs: ufs-pci: Remove UFS PCI driver's ->late_init() call back
scsi: ufs: ufs-pci: Fix default runtime and system PM levels
scsi: ufs: ufs-pci: Fix hibernate state transition for Intel MTL-like host controllers
scsi: ufs: host: mediatek: Support FDE (AES) clock scaling
scsi: ufs: host: mediatek: Support clock scaling with Vcore binding
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
Pull ata fixes from Damien Le Moal:
- Cleanup whitespace in messages in libata-core and the pata_pdc2027x,
pata_macio drivers (Colin)
- Fix ata_to_sense_error() to avoid seeing nonsensical sense data for
rare cases where we fail to get sense data from the drive. The
complementary fix to this is to ensure that we always return the
generic "ABORTED COMMAND" sense data for a failed command for which
we have no status or error fields
- The recent changes to link power management (LPM) which now prevent
the user from attempting to set an LPM policy through the
link_power_management_policy caused some regressions in test
environments because of the error that is now returned when writing
to that attribute when LPM is not supported. To allow users to not
trip on this, introduce the new link_power_management_supported
attribute to allow simple testing of a port/device LPM support (me)
* tag 'ata-6.17-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: pata_pdc2027x: Remove space before newline and abbreviations
ata: pata_macio: Remove space before newline
ata: libata-core: Remove space before newline
ata: libata-sata: Add link_power_management_supported sysfs attribute
ata: libata-scsi: Return aborted command when missing sense and result TF
ata: libata-scsi: Fix ata_to_sense_error() status handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
"This is the last pull request from me.
I'm grateful to have been able to continue as a maintainer for eight
years. From the next cycle, Nathan and Nicolas will maintain Kbuild.
- Fix a shortcut key issue in menuconfig
- Fix missing rebuild of kheaders
- Sort the symbol dump generated by gendwarfsyms
- Support zboot extraction in scripts/extract-vmlinux
- Migrate gconfig to GTK 3
- Add TAR variable to allow overriding the default tar command
- Hand over Kbuild maintainership"
* tag 'kbuild-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (92 commits)
MAINTAINERS: hand over Kbuild maintenance
kheaders: make it possible to override TAR
kbuild: userprogs: use correct linker when mixing clang and GNU ld
kconfig: lxdialog: replace strcpy() with strncpy() in inputbox.c
kconfig: lxdialog: replace strcpy with snprintf in print_autowrap
kconfig: gconf: refactor text_insert_help()
kconfig: gconf: remove unneeded variable in text_insert_msg
kconfig: gconf: use hyphens in signals
kconfig: gconf: replace GtkImageMenuItem with GtkMenuItem
kconfig: gconf: Fix Back button behavior
kconfig: gconf: fix single view to display dependent symbols correctly
scripts: add zboot support to extract-vmlinux
gendwarfksyms: order -T symtypes output by name
gendwarfksyms: use preferred form of sizeof for allocation
kconfig: qconf: confine {begin,end}Group to constructor and destructor
kconfig: qconf: fix ConfigList::updateListAllforAll()
kconfig: add a function to dump all menu entries in a tree-like format
kconfig: gconf: show GTK version in About dialog
kconfig: gconf: replace GtkHPaned and GtkVPaned with GtkPaned
kconfig: gconf: replace GdkColor with GdkRGBA
...
|
|
This was missed during the initial implementation. The VFIO PCI encodes
the vf_token inside the device name when opening the device from the group
FD, something like:
"0000:04:10.0 vf_token=bd8d9d2b-5a5f-4f5a-a211-f591514ba1f3"
This is used to control access to a VF unless there is co-ordination with
the owner of the PF.
Since we no longer have a device name in the cdev path, pass the token
directly through VFIO_DEVICE_BIND_IOMMUFD using an optional field
indicated by VFIO_DEVICE_BIND_FLAG_TOKEN.
Fixes: 5fcc26969a16 ("vfio: Add VFIO_DEVICE_BIND_IOMMUFD")
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/0-v3-bdd8716e85fe+3978a-vfio_token_jgg@nvidia.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|
Alienware calls this key "Performance Boost". Dell calls it "G-Mode".
The goal is to have a specific keycode to detect when this key is
pressed, so userspace can act upon it and do what have to do, usually
starting the power profile for performance.
Signed-off-by: Marcos Alano <marcoshalano@gmail.com>
Link: https://lore.kernel.org/r/20250509193708.2190586-1-marcoshalano@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "mseal cleanups" (Lorenzo Stoakes)
Some mseal cleaning with no intended functional change.
- "Optimizations for khugepaged" (David Hildenbrand)
Improve khugepaged throughput by batching PTE operations for large
folios. This gain is mainly for arm64.
- "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes" (Mike Rapoport)
A bugfix, additional debug code and cleanups to the execmem code.
- "mm/shmem, swap: bugfix and improvement of mTHP swap in" (Kairui Song)
Bugfixes, cleanups and performance improvememnts to the mTHP swapin
code"
* tag 'mm-stable-2025-08-03-12-35' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (38 commits)
mm: mempool: fix crash in mempool_free() for zero-minimum pools
mm: correct type for vmalloc vm_flags fields
mm/shmem, swap: fix major fault counting
mm/shmem, swap: rework swap entry and index calculation for large swapin
mm/shmem, swap: simplify swapin path and result handling
mm/shmem, swap: never use swap cache and readahead for SWP_SYNCHRONOUS_IO
mm/shmem, swap: tidy up swap entry splitting
mm/shmem, swap: tidy up THP swapin checks
mm/shmem, swap: avoid redundant Xarray lookup during swapin
x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations
x86/kprobes: enable EXECMEM_ROX_CACHE for kprobes allocations
execmem: drop writable parameter from execmem_fill_trapping_insns()
execmem: add fallback for failures in vmalloc(VM_ALLOW_HUGE_VMAP)
execmem: move execmem_force_rw() and execmem_restore_rox() before use
execmem: rework execmem_cache_free()
execmem: introduce execmem_alloc_rw()
execmem: drop unused execmem_update_copy()
mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped
mm/rmap: add anon_vma lifetime debug check
mm: remove mm/io-mapping.c
...
|
|
Provide documentation for the drm_bridge callbacks related to the
DRM_BRIDGE_OP_HDMI_CEC_ADAPTER flag.
Fixes: a74288c8ded7 ("drm/display: bridge-connector: handle CEC adapters")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/r/20250611140933.1429a1b8@canb.auug.org.au
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://lore.kernel.org/r/20250801-drm-hdmi-cec-docs-v1-1-be63e6008d0e@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"Three main updates: folio conversion by Matthew, switch to a new mount
API by Hongbo and Eric, and several sysfs entries to tune GCs for ZUFS
with finer granularity by Daeho.
There are also patches to address bugs and issues in the existing
features such as GCs, file pinning, write-while-dio-read, contingous
block allocation, and memory access violations.
Enhancements:
- switch to new mount API and folio conversion
- add sysfs nodes to controle F2FS GCs for ZUFS
- improve performance on the nat entry cache
- drop inode from the donation list when the last file is closed
- avoid splitting bio when reading multiple pages
Bug fixes:
- fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
- make sure zoned device GC to use FG_GC in shortage of free section
- fix to calculate dirty data during has_not_enough_free_secs()
- fix to update upper_p in __get_secs_required() correctly
- wait for inflight dio completion, excluding pinned files read using dio
- don't break allocation when crossing contiguous sections
- vm_unmap_ram() may be called from an invalid context
- fix to avoid out-of-boundary access in dnode page
- fix to avoid panic in f2fs_evict_inode
- fix to avoid UAF in f2fs_sync_inode_meta()
- fix to use f2fs_is_valid_blkaddr_raw() in do_write_page()
- fix UAF of f2fs_inode_info in f2fs_free_dic
- fix to avoid invalid wait context issue
- fix bio memleak when committing super block
- handle nat.blkaddr corruption in f2fs_get_node_info()
In addition, there are also clean-ups and minor bug fixes"
* tag 'f2fs-for-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (109 commits)
f2fs: drop inode from the donation list when the last file is closed
f2fs: add gc_boost_gc_greedy sysfs node
f2fs: add gc_boost_gc_multiple sysfs node
f2fs: fix to trigger foreground gc during f2fs_map_blocks() in lfs mode
f2fs: fix to calculate dirty data during has_not_enough_free_secs()
f2fs: fix to update upper_p in __get_secs_required() correctly
f2fs: directly add newly allocated pre-dirty nat entry to dirty set list
f2fs: avoid redundant clean nat entry move in lru list
f2fs: zone: wait for inflight dio completion, excluding pinned files read using dio
f2fs: ignore valid ratio when free section count is low
f2fs: don't break allocation when crossing contiguous sections
f2fs: remove unnecessary tracepoint enabled check
f2fs: merge the two conditions to avoid code duplication
f2fs: vm_unmap_ram() may be called from an invalid context
f2fs: fix to avoid out-of-boundary access in dnode page
f2fs: switch to the new mount api
f2fs: introduce fs_context_operation structure
f2fs: separate the options parsing and options checking
f2fs: Add f2fs_fs_context to record the mount options
f2fs: Allow sbi to be NULL in f2fs_printk
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Add new "hash_pointers=[auto|always|never]" boot parameter to force
the hashing even with "slab_debug" enabled
- Allow to stop CPU, after losing nbcon console ownership during
panic(), even without proper NMI
- Allow to use the printk kthread immediately even for the 1st
registered nbcon
- Compiler warning removal
* tag 'printk-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: nbcon: Allow reacquire during panic
printk: Allow to use the printk kthread immediately even for 1st nbcon
slab: Decouple slab_debug and no_hash_pointers
vsprintf: Use __diag macros to disable '-Wsuggest-attribute=format'
compiler-gcc.h: Introduce __diag_GCC_all
|
|
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd into next
Merge an immutable branch between MFD, GPIO, Input and PWM to resolve
conflicts for the merge window pull request.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Support for a new RTC in an existing driver and all the drivers
exposing clocks using the common clock framework have been converted
to determine_rate(). Summary:
Subsystem:
- Convert drivers exposing a clock from round_rate() to determine_rate()
Drivers:
- ds1307: oscillator stop flag handling for ds1341
- pcf85063: add support for RV8063"
* tag 'rtc-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (34 commits)
rtc: ds1685: Update Joshua Kinard's email address.
rtc: rv3032: convert from round_rate() to determine_rate()
rtc: rv3028: convert from round_rate() to determine_rate()
rtc: pcf8563: convert from round_rate() to determine_rate()
rtc: pcf85063: convert from round_rate() to determine_rate()
rtc: nct3018y: convert from round_rate() to determine_rate()
rtc: max31335: convert from round_rate() to determine_rate()
rtc: m41t80: convert from round_rate() to determine_rate()
rtc: hym8563: convert from round_rate() to determine_rate()
rtc: ds1307: convert from round_rate() to determine_rate()
rtc: rv3028: fix incorrect maximum clock rate handling
rtc: pcf8563: fix incorrect maximum clock rate handling
rtc: pcf85063: fix incorrect maximum clock rate handling
rtc: nct3018y: fix incorrect maximum clock rate handling
rtc: hym8563: fix incorrect maximum clock rate handling
rtc: ds1307: fix incorrect maximum clock rate handling
rtc: pcf85063: scope pcf85063_config structures
rtc: Optimize calculations in rtc_time64_to_tm()
dt-bindings: rtc: amlogic,a4-rtc: Add compatible string for C3
rtc: ds1307: handle oscillator stop flag (OSF) for ds1341
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
"Significant patch series in this pull request:
- "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
us closer to being able to remove page->mapping
- "relayfs: misc changes" (Jason Xing) does some maintenance and
minor feature addition work in relayfs
- "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
us from static preallocation of the kdump crashkernel's working
memory over to dynamic allocation. So the difficulty of a-priori
estimation of the second kernel's needs is removed and the first
kernel obtains extra memory
- "generalize panic_print's dump function to be used by other
kernel parts" (Feng Tang) implements some consolidation and
rationalization of the various ways in which a failing kernel
splats information at the operator
* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
tools/getdelays: add backward compatibility for taskstats version
kho: add test for kexec handover
delaytop: enhance error logging and add PSI feature description
samples: Kconfig: fix spelling mistake "instancess" -> "instances"
fat: fix too many log in fat_chain_add()
scripts/spelling.txt: add notifer||notifier to spelling.txt
xen/xenbus: fix typo "notifer"
net: mvneta: fix typo "notifer"
drm/xe: fix typo "notifer"
cxl: mce: fix typo "notifer"
KVM: x86: fix typo "notifer"
MAINTAINERS: add maintainers for delaytop
ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
ucount: fix atomic_long_inc_below() argument type
kexec: enable CMA based contiguous allocation
stackdepot: make max number of pools boot-time configurable
lib/xxhash: remove unused functions
init/Kconfig: restore CONFIG_BROKEN help text
lib/raid6: update recov_rvv.c zero page usage
docs: update docs after introducing delaytop
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
- Remove unneeded goto out statements
Over time, the logic was restructured but left a "goto out" where the
out label simply did a "return ret;". Instead of jumping to this out
label, simply return immediately and remove the out label.
- Add guard(ring_buffer_nest)
Some calls to the tracing ring buffer can happen when the ring buffer
is already being written to at the same context (for example, a
trace_printk() in between a ring_buffer_lock_reserve() and a
ring_buffer_unlock_commit()).
In order to not trigger the recursion detection, these functions use
ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard()
for these functions so that their use cases can be simplified and not
need to use goto for the release.
- Clean up the tracing code with guard() and __free() logic
There were several locations that were prime candidates for using
guard() and __free() helpers. Switch them over to use them.
- Fix output of function argument traces for unsigned int values
The function tracer with "func-args" option set will record up to 6
argument registers and then use BTF to format them for human
consumption when the trace file is read. There are several arguments
that are "unsigned long" and even "unsigned int" that are either and
address or a mask. It is easier to understand if they were printed
using hexadecimal instead of decimal. The old method just printed all
non-pointer values as signed integers, which made it even worse for
unsigned integers.
For instance, instead of:
__local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs
show:
__local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs"
* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have unsigned int function args displayed as hexadecimal
ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
tracing: Use __free(kfree) in trace.c to remove gotos
tracing: Add guard() around locks and mutexes in trace.c
tracing: Add guard(ring_buffer_nest)
tracing: Remove unneeded goto out logic
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull module updates from Daniel Gomez:
"This is a small set of changes for modules, primarily to extend module
users to use the module data structures in combination with the
already no-op stub module functions, even when support for modules is
disabled in the kernel configuration. This change follows the kernel's
coding style for conditional compilation and allows kunit code to drop
all CONFIG_MODULES ifdefs, which is also part of the changes. This
should allow others part of the kernel to do the same cleanup.
The remaining changes include a fix for module name length handling
which could potentially lead to the removal of an incorrect module,
and various cleanups"
* tag 'modules-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux:
module: Rename MAX_PARAM_PREFIX_LEN to __MODULE_NAME_LEN
tracing: Replace MAX_PARAM_PREFIX_LEN with MODULE_NAME_LEN
module: Restore the moduleparam prefix length check
module: Remove unnecessary +1 from last_unloaded_module::name size
module: Prevent silent truncation of module name in delete_module(2)
kunit: test: Drop CONFIG_MODULE ifdeffery
module: make structure definitions always visible
module: move 'struct module_use' to internal.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c updates from Alexandre Belloni:
"New driver:
- Renesas I3C controller
Subsystem:
- use adapter timeout value for I2C transfers
- don't fail if GETHDRCAP is unsupported
- replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
Drivers:
- svc: Fix npcm845 FIFO_EMPTY quirk"
* tag 'i3c/for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: (25 commits)
i3c: add missing include to internal header
i3c: dw: Remove redundant pm_runtime_mark_last_busy() calls
i3c: master: svc: Remove redundant pm_runtime_mark_last_busy() calls
i3c: master: svc: Fix npcm845 FIFO_EMPTY quirk
i3c: master: Add basic driver for the Renesas I3C controller
dt-bindings: i3c: Add Renesas I3C controller
i3c: Add more parameters for controllers to the header
i3c: Standardize defines for specification parameters
i3c: fix module_i3c_i2c_driver() with I3C=n
i3c: master: cdns: Simplify handling clocks in probe()
i3c: Fix i3c_device_do_priv_xfers() kernel-doc indentation
i3c: master: dw: Use i3c_writel_fifo() and i3c_readl_fifo()
i3c: master: cdns: Use i3c_writel_fifo() and i3c_readl_fifo()
i3c: master: Add inline i3c_readl_fifo() and i3c_writel_fifo()
i3c: prefix hexadecimal entries in sysfs
i3c: master: cdns: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
i3c: dw: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
i3c: master: replace ENOTSUPP with SUSV4-compliant EOPNOTSUPP
i3c: don't fail if GETHDRCAP is unsupported
i3c: add patchwork entry to MAINTAINERS
...
|
|
The lifetime of address handler has been managed by linked list and RCU.
This approach was introduced in commit 35202f7d8420 ("firewire: remove
global lock around address handlers, convert to RCU"). The invocations of
address handler are performed within RCU read-side critical sections.
In commit 57e6d9f85fff ("firewire: ohci: use workqueue to handle events
of AR request/response contexts"), the invocations are in a workqueue
context. The approach still imposes limitation that sleeping is not
allowed within RCU read-side critical sections. However, since sleeping
is not permitted within RCU read-side critical sections, this approach
still has a limitation.
This commit adds reference counting to decouple handler invocation from
handler discovery. The linked list and RCU is used to discover the
handlers, while the reference counting is used to invoke them safely.
Link: https://lore.kernel.org/r/20250803122015.236493-2-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux into block-6.17
Pull MD changes from Yu:
"- mddev null-ptr-dereference fix, by Erkun
- md-cluster fail to remove the faulty disk regression fix, by Heming
- minor cleanup, by Li Nan and Jinchao
- mdadm lifetime regression fix reported by syzkaller, by Yu Kuai"
* tag 'md-6.17-20250803' of gitolite.kernel.org:pub/scm/linux/kernel/git/mdraid/linux:
md: make rdev_addable usable for rcu mode
md/raid1: remove struct pool_info and related code
md/raid1: change r1conf->r1bio_pool to a pointer type
md: rename recovery_cp to resync_offset
md/md-cluster: handle REMOVE message earlier
md: fix create on open mddev lifetime regression
|
|
I am switching my address to a personal domain, so need to update the
driver's files and the entry in MAINTAINERS.
Signed-off-by: Joshua Kinard <kumba@gentoo.org>
Link: https://lore.kernel.org/r/20250721170051.32407-1-kumba@gentoo.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Nothing stands out, apart from maybe the interesting Eswin EIC7700, a
RISC-V SoC I've never seen before.
Core changes:
- Open code PINCTRL_FUNCTION_DESC() instead of defining a complex
macro only used in one place
- Add pinmux_generic_add_pinfunction() helper and use this in a few
drivers
New drivers:
- Amlogic S7, S7D and S6 pin control support
- Eswin EIC7700 pin control support
- Qualcomm PMIV0104, PM7550 and Milos pin control support
Because of unhelpful numbering schemes, the Qualcomm driver now
needs to start to rely on SoC codenames
- STM32 HDP pin control support
- Mediatek MT8189 pin control support
Improvements:
- Switch remaining pin control drivers over to the new GPIO set
callback that provides a return value
- Support RSVD (reserved) pins in the STM32 driver
- Move many fixed assignments over to pinctrl_desc definitions
- Handle multiple TLMM regions in the Qualcomm driver"
* tag 'pinctrl-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (105 commits)
pinctrl: mediatek: Add pinctrl driver for mt8189
dt-bindings: pinctrl: mediatek: Add support for mt8189
pinctrl: aspeed-g6: Add PCIe RC PERST pin group
pinctrl: ingenic: use pinmux_generic_add_pinfunction()
pinctrl: keembay: use pinmux_generic_add_pinfunction()
pinctrl: mediatek: moore: use pinmux_generic_add_pinfunction()
pinctrl: airoha: use pinmux_generic_add_pinfunction()
pinctrl: equilibrium: use pinmux_generic_add_pinfunction()
pinctrl: provide pinmux_generic_add_pinfunction()
pinctrl: pinmux: open-code PINCTRL_FUNCTION_DESC()
pinctrl: ma35: use new GPIO line value setter callbacks
MAINTAINERS: add Clément Le Goffic as STM32 HDP maintainer
pinctrl: stm32: Introduce HDP driver
dt-bindings: pinctrl: stm32: Introduce HDP
pinctrl: qcom: Add Milos pinctrl driver
dt-bindings: pinctrl: document the Milos Top Level Mode Multiplexer
pinctrl: qcom: spmi: Add PM7550
dt-bindings: pinctrl: qcom,pmic-gpio: Add PM7550 support
pinctrl: qcom: spmi: Add PMIV0104
dt-bindings: pinctrl: qcom,pmic-gpio: Add PMIV0104 support
...
|
|
After update of execmem_cache_free() that made memory writable before
updating it, there is no need to update read only memory, so the writable
parameter to execmem_fill_trapping_insns() is not needed. Drop it.
Link: https://lkml.kernel.org/r/20250713071730.4117334-7-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Some callers of execmem_alloc() require the memory to be temporarily
writable even when it is allocated from ROX cache. These callers use
execemem_make_temp_rw() right after the call to execmem_alloc().
Wrap this sequence in execmem_alloc_rw() API.
Link: https://lkml.kernel.org/r/20250713071730.4117334-3-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Daniel Gomez <da.gomez@samsung.com>
Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "x86: enable EXECMEM_ROX_CACHE for ftrace and kprobes", v3.
These patches enable use of EXECMEM_ROX_CACHE for ftrace and kprobes
allocations on x86.
They also include some ground work in execmem.
Since the execmem model for caching large ROX pages changed from the
initial assumption that the memory that is allocated from ROX cache is
always ROX to the current state where memory can be temporarily made RW
and then restored to ROX, we can stop using text poking to update it.
This also saves the hassle of trying lock text_mutex in
execmem_cache_free() when kprobes already hold that mutex.
This patch (of 8):
The execmem_update_copy() that used text poking was required when memory
allocated from ROX cache was always read-only. Since now its permissions
can be switched to read-write there is no need in a function that updates
memory with text poking.
Remove it.
Link: https://lkml.kernel.org/r/20250713071730.4117334-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20250713071730.4117334-2-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+m2tA@mail.gmail.com/
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If an anon folio is mapped into userspace, its anon_vma must be alive,
otherwise rmap walks can hit UAF.
There have been syzkaller reports a few months ago[1][2] of UAF in rmap
walks that seems to indicate that there can be pages with elevated
mapcount whose anon_vma has already been freed, but I think we never
figured out what the cause is; and syzkaller only hit these UAFs when
memory pressure randomly caused reclaim to rmap-walk the affected pages,
so it of course didn't manage to create a reproducer.
Add a VM_WARN_ON_FOLIO() when we add/remove mappings of anonymous folios
to hopefully catch such issues more reliably.
[1] https://lore.kernel.org/r/67abaeaf.050a0220.110943.0041.GAE@google.com
[2] https://lore.kernel.org/r/67a76f33.050a0220.3d72c.0028.GAE@google.com
Link: https://lkml.kernel.org/r/20250725-anonvma-uaf-debug-v2-1-bc3c7e5ba5b1@google.com
Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Harry Yoo <harry.yoo@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This is dead code, which was used from commit b739f125e4eb ("i915: use
io_mapping_map_user") but reverted a month later by commit 0e4fe0c9f2f9
("Revert "i915: use io_mapping_map_user"") back in 2021.
Since then nobody has used it, so remove it.
[akpm@linux-foundation.org: update Documentation/core-api/mm-api.rst, per Vlastimil]
Link: https://lkml.kernel.org/r/20250725142901.81502-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Optimizations for khugepaged", v4.
If the underlying folio mapped by the ptes is large, we can process those
ptes in a batch using folio_pte_batch().
For arm64 specifically, this results in a 16x reduction in the number of
ptep_get() calls, since on a contig block, ptep_get() on arm64 will
iterate through all 16 entries to collect a/d bits. Next, ptep_clear()
will cause a TLBI for every contig block in the range via
contpte_try_unfold(). Instead, use clear_ptes() to only do the TLBI at
the first and last contig block of the range.
For split folios, there will be no pte batching; the batch size returned
by folio_pte_batch() will be 1. For pagetable split folios, the ptes will
still point to the same large folio; for arm64, this results in the
optimization described above, and for other arches, a minor improvement is
expected due to a reduction in the number of function calls and batching
atomic operations.
This patch (of 3):
Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if
we are about to tear down a full MM.
Use get_and_clear_ptes() in existing code, clear_ptes() users will
be added next.
Link: https://lkml.kernel.org/r/20250724052301.23844-2-dev.jain@arm.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mseal cleanups", v4.
Perform a number of cleanups to the mseal logic. Firstly, VM_SEALED is
treated differently from every other VMA flag, it really doesn't make
sense to do this, so we start by making this consistent with everything
else.
Next we place the madvise logic where it belongs - in mm/madvise.c. It
really makes no sense to abstract this elsewhere. In doing so, we go to
great lengths to explain very clearly the previously very confusing logic
as to what sealed mappings are impacted here.
In doing so, we retain existing logic regarding treatment of madvise()
discard operations for a sealed, read-only MAP_PRIVATE file-backed
mapping. This is something we likely need to revisit.
We then abstract out and explain the 'are there are any gaps in this range
in the mm?' check being performed as a prerequisite to mseal being
performed.
Finally, we simplify the actual mseal logic which is really quite
straightforward.
No functional change is intended.
This patch (of 4):
There is no reason to treat VM_SEALED in a special way, in each other case
in which a VMA flag is unavailable due to configuration, we simply assign
that flag to VM_NONE, so make VM_SEALED consistent with all other VMA
flags in this respect.
Additionally, use the next available bit for VM_SEALED, 42, rather than
arbitrarily putting it at 63 and update the declaration to match all other
VMA flags.
No functional change intended.
Link: https://lkml.kernel.org/r/cover.1753431105.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/aeb398a77029b6e7377cd944328bc9bbc3c90537.1753431105.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jeff Xu <jeffxu@chromium.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit cd57b77197a4 ("ext4: Convert ext4_bio_write_page() to use a folio)
removed set_page_writeback_keepwrite() which was the last/only caller of
folio_start_writeback_keepwrite().
Link: https://lkml.kernel.org/r/20250722182230.2114587-1-joannelkoong@gmail.com
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is a spelling mistake of 'notifer' in the comment which
should be 'notifier'.
Link: https://lkml.kernel.org/r/C6633C66376C709A+20250722073431.21983-7-wangyuli@uniontech.com
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When booting a new kernel with kexec_file, the kernel picks a target
location that the kernel should live at, then allocates random pages,
checks whether any of those patches magically happens to coincide with a
target address range and if so, uses them for that range.
For every page allocated this way, it then creates a page list that the
relocation code - code that executes while all CPUs are off and we are
just about to jump into the new kernel - copies to their final memory
location. We can not put them there before, because chances are pretty
good that at least some page in the target range is already in use by the
currently running Linux environment. Copying is happening from a single
CPU at RAM rate, which takes around 4-50 ms per 100 MiB.
All of this is inefficient and error prone.
To successfully kexec, we need to quiesce all devices of the outgoing
kernel so they don't scribble over the new kernel's memory. We have seen
cases where that does not happen properly (*cough* GIC *cough*) and hence
the new kernel was corrupted. This started a month long journey to root
cause failing kexecs to eventually see memory corruption, because the new
kernel was corrupted severely enough that it could not emit output to tell
us about the fact that it was corrupted. By allocating memory for the
next kernel from a memory range that is guaranteed scribbling free, we can
boot the next kernel up to a point where it is at least able to detect
corruption and maybe even stop it before it becomes severe. This
increases the chance for successful kexecs.
Since kexec got introduced, Linux has gained the CMA framework which can
perform physically contiguous memory mappings, while keeping that memory
available for movable memory when it is not needed for contiguous
allocations. The default CMA allocator is for DMA allocations.
This patch adds logic to the kexec file loader to attempt to place the
target payload at a location allocated from CMA. If successful, it uses
that memory range directly instead of creating copy instructions during
the hot phase. To ensure that there is a safety net in case anything goes
wrong with the CMA allocation, it also adds a flag for user space to force
disable CMA allocations.
Using CMA allocations has two advantages:
1) Faster by 4-50 ms per 100 MiB. There is no more need to copy in the
hot phase.
2) More robust. Even if by accident some page is still in use for DMA,
the new kernel image will be safe from that access because it resides
in a memory region that is considered allocated in the old kernel and
has a chance to reinitialize that component.
Link: https://lkml.kernel.org/r/20250610085327.51817-1-graf@amazon.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Zhongkun He <hezhongkun.hzk@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
xxh32_digest() and xxh32_update() were added in 2017 in the original
xxhash commit, but have remained unused.
Remove them.
Link: https://lkml.kernel.org/r/20250716133245.243363-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Gilbert <linux@treblig.org>
Cc: Nick Terrell <terrelln@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire updates from Takashi Sakamoto:
"This update replaces the remaining tasklet usage in the FireWire
subsystem with workqueue for asynchronous packet transmission. With
this change, tasklets are now fully eliminated from the subsystem.
Asynchronous packet transmission is used for serial bus topology
management as well as for the operation of the SBP-2 protocol driver
(firewire-sbp2). To ensure reliability during low-memory conditions,
the associated workqueue is created with the WQ_MEM_RECLAIM flag,
allowing it to participate in memory reclaim paths. Other attributes
are aligned with those used for isochronous packet handling, which was
migrated to workqueues in v6.12.
The workqueues are sleepable and support preemptible work items,
making them more suitable for real-time workloads that benefit from
timely task preemption at the system level.
There remains an issue where 'schedule()' may be called within an RCU
read-side critical section, due to a direct replacement of
'tasklet_disable_in_atomic()' with 'disable_work_sync()'. A proposed
fix for this has been posted[1], and is currently under review and
testing. It is expected to be sent upstream later"
Link: https://lore.kernel.org/lkml/20250728015125.17825-1-o-takashi@sakamocchi.jp/ [1]
* tag 'firewire-updates-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: reduce the size of common context structure by extracting members into AT structure
firewire: core: minor code refactoring to localize table of gap count
firewire: ohci: use workqueue to handle events of AT request/response contexts
firewire: ohci: use workqueue to handle events of AR request/response contexts
firewire: core: allocate workqueue for AR/AT request/response contexts
firewire: core: use from_work() macro to expand parent structure of work_struct
firewire: ohci: use from_work() macro to expand parent structure of work_struct
firewire: ohci: correct code comments about bus_reset tasklet
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix kCFI failures in JITed BPF code on arm64 (Sami Tolvanen, Puranjay
Mohan, Mark Rutland, Maxwell Bland)
- Disallow tail calls between BPF programs that use different cgroup
local storage maps to prevent out-of-bounds access (Daniel Borkmann)
- Fix unaligned access in flow_dissector and netfilter BPF programs
(Paul Chaignon)
- Avoid possible use of uninitialized mod_len in libbpf (Achill
Gilgenast)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Test for unaligned flow_dissector ctx access
bpf: Improve ctx access verifier error message
bpf: Check netfilter ctx accesses are aligned
bpf: Check flow_dissector ctx accesses are aligned
arm64/cfi,bpf: Support kCFI + BPF on arm64
cfi: Move BPF CFI types and helpers to generic code
cfi: add C CFI type macro
libbpf: Avoid possible use of uninitialized mod_len
bpf: Fix oob access in cgroup local storage
bpf: Move cgroup iterator helpers to bpf.h
bpf: Move bpf map owner out of common struct
bpf: Add cookie object to bpf maps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL updates from Dave Jiang:
"The most significant changes in this pull request is the series that
introduces ACQUIRE() and ACQUIRE_ERR() macros to replace conditional
locking and ease the pain points of scoped_cond_guard().
The series also includes follow on changes that refactor the CXL
sub-system to utilize the new macros.
Detail summary:
- Add documentation template for CXL conventions to document CXL
platform quirks
- Replace mutex_lock_io() with mutex_lock() for mailbox
- Add location limit for fake CFMWS range for cxl_test, ARM platform
enabling
- CXL documentation typo and clarity fixes
- Use correct format specifier for function cxl_set_ecs_threshold()
- Make cxl_bus_type constant
- Introduce new helper cxl_resource_contains_addr() to check address
availability
- Fix wrong DPA checking for PPR operation
- Remove core/acpi.c and CXL core dependency on ACPI
- Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks
- Add CXL updates utilizing ACQUIRE() macro to remove gotos and
improve readability
- Add return for the dummy version of cxl_decoder_detach() without
CONFIG_CXL_REGION
- CXL events updates for spec r3.2
- Fix return of __cxl_decoder_detach() error path
- CXL debugfs documentation fix"
* tag 'cxl-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (28 commits)
Documentation/ABI/testing/debugfs-cxl: Add 'cxl' to clear_poison path
cxl/region: Fix an ERR_PTR() vs NULL bug
cxl/events: Trace Memory Sparing Event Record
cxl/events: Add extra validity checks for CVME count in DRAM Event Record
cxl/events: Add extra validity checks for corrected memory error count in General Media Event Record
cxl/events: Update Common Event Record to CXL spec rev 3.2
cxl: Fix -Werror=return-type in cxl_decoder_detach()
cleanup: Fix documentation build error for ACQUIRE updates
cxl: Convert to ACQUIRE() for conditional rwsem locking
cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach()
cxl/region: Move ready-to-probe state check to a helper
cxl/region: Split commit_store() into __commit() and queue_reset() helpers
cxl/decoder: Drop pointless locking
cxl/decoder: Move decoder register programming to a helper
cxl/mbox: Convert poison list mutex to ACQUIRE()
cleanup: Introduce ACQUIRE() and ACQUIRE_ERR() for conditional locks
cxl: Remove core/acpi.c and cxl core dependency on ACPI
cxl/core: Using cxl_resource_contains_addr() to check address availability
cxl/edac: Fix wrong dpa checking for PPR operation
cxl/core: Introduce a new helper cxl_resource_contains_addr()
...
|
|
In ip_output() skb->dev is updated from the skb_dst(skb)->dev
this can become invalid when the interface is unregistered and freed,
Introduced new skb_dst_dev_rcu() function to be used instead of
skb_dst_dev() within rcu_locks in ip_output.This will ensure that
all the skb's associated with the dev being deregistered will
be transnmitted out first, before freeing the dev.
Given that ip_output() is called within an rcu_read_lock()
critical section or from a bottom-half context, it is safe to introduce
an RCU read-side critical section within it.
Multiple panic call stacks were observed when UL traffic was run
in concurrency with device deregistration from different functions,
pasting one sample for reference.
[496733.627565][T13385] Call trace:
[496733.627570][T13385] bpf_prog_ce7c9180c3b128ea_cgroupskb_egres+0x24c/0x7f0
[496733.627581][T13385] __cgroup_bpf_run_filter_skb+0x128/0x498
[496733.627595][T13385] ip_finish_output+0xa4/0xf4
[496733.627605][T13385] ip_output+0x100/0x1a0
[496733.627613][T13385] ip_send_skb+0x68/0x100
[496733.627618][T13385] udp_send_skb+0x1c4/0x384
[496733.627625][T13385] udp_sendmsg+0x7b0/0x898
[496733.627631][T13385] inet_sendmsg+0x5c/0x7c
[496733.627639][T13385] __sys_sendto+0x174/0x1e4
[496733.627647][T13385] __arm64_sys_sendto+0x28/0x3c
[496733.627653][T13385] invoke_syscall+0x58/0x11c
[496733.627662][T13385] el0_svc_common+0x88/0xf4
[496733.627669][T13385] do_el0_svc+0x2c/0xb0
[496733.627676][T13385] el0_svc+0x2c/0xa4
[496733.627683][T13385] el0t_64_sync_handler+0x68/0xb4
[496733.627689][T13385] el0t_64_sync+0x1a4/0x1a8
Changes in v3:
- Replaced WARN_ON() with WARN_ON_ONCE(), as suggested by Willem de Bruijn.
- Dropped legacy lines mistakenly pulled in from an outdated branch.
Changes in v2:
- Addressed review comments from Eric Dumazet
- Used READ_ONCE() to prevent potential load/store tearing
- Added skb_dst_dev_rcu() and used along with rcu_read_lock() in ip_output
Signed-off-by: Sharath Chandra Vurukala <quic_sharathv@quicinc.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250730105118.GA26100@hu-sharathv-hyd.qualcomm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When sending a packet with virtio_net_hdr to tun device, if the gso_type
in virtio_net_hdr is SKB_GSO_UDP and the gso_size is less than udphdr
size, below crash may happen.
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:4572!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 62 Comm: mytest Not tainted 6.16.0-rc7 #203 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:skb_pull_rcsum+0x8e/0xa0
Code: 00 00 5b c3 cc cc cc cc 8b 93 88 00 00 00 f7 da e8 37 44 38 00 f7 d8 89 83 88 00 00 00 48 8b 83 c8 00 00 00 5b c3 cc cc cc cc <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 000
RSP: 0018:ffffc900001fba38 EFLAGS: 00000297
RAX: 0000000000000004 RBX: ffff8880040c1000 RCX: ffffc900001fb948
RDX: ffff888003e6d700 RSI: 0000000000000008 RDI: ffff88800411a062
RBP: ffff8880040c1000 R08: 0000000000000000 R09: 0000000000000001
R10: ffff888003606c00 R11: 0000000000000001 R12: 0000000000000000
R13: ffff888004060900 R14: ffff888004050000 R15: ffff888004060900
FS: 000000002406d3c0(0000) GS:ffff888084a19000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 0000000004007000 CR4: 00000000000006f0
Call Trace:
<TASK>
udp_queue_rcv_one_skb+0x176/0x4b0 net/ipv4/udp.c:2445
udp_queue_rcv_skb+0x155/0x1f0 net/ipv4/udp.c:2475
udp_unicast_rcv_skb+0x71/0x90 net/ipv4/udp.c:2626
__udp4_lib_rcv+0x433/0xb00 net/ipv4/udp.c:2690
ip_protocol_deliver_rcu+0xa6/0x160 net/ipv4/ip_input.c:205
ip_local_deliver_finish+0x72/0x90 net/ipv4/ip_input.c:233
ip_sublist_rcv_finish+0x5f/0x70 net/ipv4/ip_input.c:579
ip_sublist_rcv+0x122/0x1b0 net/ipv4/ip_input.c:636
ip_list_rcv+0xf7/0x130 net/ipv4/ip_input.c:670
__netif_receive_skb_list_core+0x21d/0x240 net/core/dev.c:6067
netif_receive_skb_list_internal+0x186/0x2b0 net/core/dev.c:6210
napi_complete_done+0x78/0x180 net/core/dev.c:6580
tun_get_user+0xa63/0x1120 drivers/net/tun.c:1909
tun_chr_write_iter+0x65/0xb0 drivers/net/tun.c:1984
vfs_write+0x300/0x420 fs/read_write.c:593
ksys_write+0x60/0xd0 fs/read_write.c:686
do_syscall_64+0x50/0x1c0 arch/x86/entry/syscall_64.c:63
</TASK>
To trigger gso segment in udp_queue_rcv_skb(), we should also set option
UDP_ENCAP_ESPINUDP to enable udp_sk(sk)->encap_rcv. When the encap_rcv
hook return 1 in udp_queue_rcv_one_skb(), udp_csum_pull_header() will try
to pull udphdr, but the skb size has been segmented to gso size, which
leads to this crash.
Previous commit cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
introduces segmentation in UDP receive path only for GRO, which was never
intended to be used for UFO, so drop UFO packets in udp_rcv_segment().
Link: https://lore.kernel.org/netdev/20250724083005.3918375-1-wangliang74@huawei.com/
Link: https://lore.kernel.org/netdev/20250729123907.3318425-1-wangliang74@huawei.com/
Fixes: cf329aa42b66 ("udp: cope with UDP GRO packet misdirection")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Signed-off-by: Wang Liang <wangliang74@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250730101458.3470788-1-wangliang74@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot was able to craft a packet with very long IPv6 extension headers
leading to an overflow of skb->transport_header.
This 16bit field has a limited range.
Add skb_reset_transport_header_careful() helper and use it
from ipv6_gso_segment()
WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 skb_reset_transport_header include/linux/skbuff.h:3032 [inline]
WARNING: CPU: 0 PID: 5871 at ./include/linux/skbuff.h:3032 ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151
Modules linked in:
CPU: 0 UID: 0 PID: 5871 Comm: syz-executor211 Not tainted 6.16.0-rc6-syzkaller-g7abc678e3084 #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
RIP: 0010:skb_reset_transport_header include/linux/skbuff.h:3032 [inline]
RIP: 0010:ipv6_gso_segment+0x15e2/0x21e0 net/ipv6/ip6_offload.c:151
Call Trace:
<TASK>
skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53
nsh_gso_segment+0x54a/0xe10 net/nsh/nsh.c:110
skb_mac_gso_segment+0x31c/0x640 net/core/gso.c:53
__skb_gso_segment+0x342/0x510 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x857/0x11b0 net/core/dev.c:3950
validate_xmit_skb_list+0x84/0x120 net/core/dev.c:4000
sch_direct_xmit+0xd3/0x4b0 net/sched/sch_generic.c:329
__dev_xmit_skb net/core/dev.c:4102 [inline]
__dev_queue_xmit+0x17b6/0x3a70 net/core/dev.c:4679
Fixes: d1da932ed4ec ("ipv6: Separate ipv6 offload support")
Reported-by: syzbot+af43e647fd835acc02df@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/688a1a05.050a0220.5d226.0008.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250730131738.3385939-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|