Age | Commit message (Collapse) | Author |
|
If we finds a vq without a name in our input array in
virtio_ccw_find_vqs(), we treat it as "non-existing" and set the vq pointer
to NULL; we will not call virtio_ccw_setup_vq() to allocate/setup a vq.
Consequently, we create only a queue if it actually exists (name != NULL)
and assign an incremental queue index to each such existing queue.
However, in virtio_ccw_register_adapter_ind()->get_airq_indicator() we
will not ignore these "non-existing queues", but instead assign an airq
indicator to them.
Besides never releasing them in virtio_ccw_drop_indicators() (because
there is no virtqueue), the bigger issue seems to be that there will be a
disagreement between the device and the Linux guest about the airq
indicator to be used for notifying a queue, because the indicator bit
for adapter I/O interrupt is derived from the queue index.
The virtio spec states under "Setting Up Two-Stage Queue Indicators":
... indicator contains the guest address of an area wherein the
indicators for the devices are contained, starting at bit_nr, one
bit per virtqueue of the device.
And further in "Notification via Adapter I/O Interrupts":
For notifying the driver of virtqueue buffers, the device sets the
bit in the guest-provided indicator area at the corresponding
offset.
For example, QEMU uses in virtio_ccw_notify() the queue index (passed as
"vector") to select the relevant indicator bit. If a queue does not exist,
it does not have a corresponding indicator bit assigned, because it
effectively doesn't have a queue index.
Using a virtio-balloon-ccw device under QEMU with free-page-hinting
disabled ("free-page-hint=off") but free-page-reporting enabled
("free-page-reporting=on") will result in free page reporting
not working as expected: in the virtio_balloon driver, we'll be stuck
forever in virtballoon_free_page_report()->wait_event(), because the
waitqueue will not be woken up as the notification from the device is
lost: it would use the wrong indicator bit.
Free page reporting stops working and we get splats (when configured to
detect hung wqs) like:
INFO: task kworker/1:3:463 blocked for more than 61 seconds.
Not tainted 6.14.0 #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/1:3 [...]
Workqueue: events page_reporting_process
Call Trace:
[<000002f404e6dfb2>] __schedule+0x402/0x1640
[<000002f404e6f22e>] schedule+0x3e/0xe0
[<000002f3846a88fa>] virtballoon_free_page_report+0xaa/0x110 [virtio_balloon]
[<000002f40435c8a4>] page_reporting_process+0x2e4/0x740
[<000002f403fd3ee2>] process_one_work+0x1c2/0x400
[<000002f403fd4b96>] worker_thread+0x296/0x420
[<000002f403fe10b4>] kthread+0x124/0x290
[<000002f403f4e0dc>] __ret_from_fork+0x3c/0x60
[<000002f404e77272>] ret_from_fork+0xa/0x38
There was recently a discussion [1] whether the "holes" should be
treated differently again, effectively assigning also non-existing
queues a queue index: that should also fix the issue, but requires other
workarounds to not break existing setups.
Let's fix it without affecting existing setups for now by properly ignoring
the non-existing queues, so the indicator bits will match the queue
indexes.
[1] https://lore.kernel.org/all/cover.1720611677.git.mst@redhat.com/
Fixes: a229989d975e ("virtio: don't allocate vqs when names[i] = NULL")
Reported-by: Chandra Merla <cmerla@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <david@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20250402203621.940090-1-david@redhat.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix machine check handler _CIF_MCCK_GUEST bit setting by adding the
missing base register for relocated lowcore address
- Fix build failure on older linkers by conditionally adding the
-no-pie linker option only when it is supported
- Fix inaccurate kernel messages in vfio-ap by providing descriptive
error notifications for AP queue sharing violations
- Fix PCI isolation logic by ensuring non-VF devices correctly return
false in zpci_bus_is_isolated_vf()
- Fix PCI DMA range map setup by using dma_direct_set_offset() to add a
proper sentinel element, preventing potential overruns and
translation errors
- Cleanup header dependency problems with asm-offsets.c
- Add fault info for unexpected low-address protection faults in user
mode
- Add support for HOTPLUG_SMT, replacing the arch-specific "nosmt"
handling with common code handling
- Use bitop functions to implement CPU flag helper functions to ensure
that bits cannot get lost if modified in different contexts on a CPU
- Remove unused machine_flags for the lowcore
* tag 's390-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Fix no AP queue sharing allowed message written to kernel log
s390/pci: Fix dev.dma_range_map missing sentinel element
s390/mm: Dump fault info in case of low address protection fault
s390/smp: Add support for HOTPLUG_SMT
s390: Fix linker error when -no-pie option is unavailable
s390/processor: Use bitop functions for cpu flag helper functions
s390/asm-offsets: Remove ASM_OFFSETS_C
s390/asm-offsets: Include ftrace_regs.h instead of ftrace.h
s390/kvm: Split kvm_host header file
s390/pci: Fix zpci_bus_is_isolated_vf() for non-VFs
s390/lowcore: Remove unused machine_flags
s390/entry: Fix setting _CIF_MCCK_GUEST with lowcore relocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "Enable strict percpu address space checks" from Uros
Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was found to be incorrect.
- The series "Some cleanup for memcg" from Chen Ridong implements some
relatively monir cleanups for the memcontrol code.
- The series "mm: fixes for device-exclusive entries (hmm)" from David
Hildenbrand fixes a boatload of issues which David found then using
device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now
succeed.
- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
remove the z3fold and zbud implementations. They have been deprecated
for half a year and nobody has complained.
- The series "mm: further simplify VMA merge operation" from Lorenzo
Stoakes implements numerous simplifications in this area. No runtime
effects are anticipated.
- The series "mm/madvise: remove redundant mmap_lock operations from
process_madvise()" from SeongJae Park rationalizes the locking in the
madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from
Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak
user-visible output.
- The series "mm/damon/paddr: fix large folios access and schemes
handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The series "mm/damon/core: fix wrong and/or useless damos_walk()
behaviors" from SeongJae Park fixes a few issues with the accuracy of
kdamond's walking of DAMON regions.
- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
Stoakes changes the interaction between framebuffer deferred-io and
core MM. No functional changes are anticipated - this is preparatory
work for the future removal of page structure fields.
- The series "mm/damon: add support for hugepage_size DAMOS filter"
from Usama Arif adds a DAMOS filter which permits the filtering by
huge page sizes.
- The series "mm: permit guard regions for file-backed/shmem mappings"
from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping
for pte-mapped large folios.
- The series "reimplement per-vma lock as a refcount" from Suren
Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
improves" from SeongJae Park does some maintenance work on the DAMON
docs.
- The series "hugetlb/CMA improvements for large systems" from Frank
van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The series "selftests/mm: Some cleanups from trying to run them" from
Brendan Jackman fixes a pile of unrelated issues which Brendan
encountered while runnimg our selftests.
- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply
wasn't being effective.
- The series "mm: cleanups for device-exclusive entries (hmm)" from
David Hildenbrand implements a number of unrelated cleanups in this
code.
- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
implements a number of preparatoty cleanups to the GENERIC_PTDUMP
Kconfig logic.
- The series "mm/damon: auto-tune aggregation interval" from SeongJae
Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the
code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel
Butt cleans up the page_counter code and fixes a size increase which
we accidentally added late last year.
- The series "Add a command line option that enables control of how
many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The series "mm/damon: make allow filters after reject filters useful
and intuitive" from SeongJae Park improves the handling of allow and
reject filters. Behaviour is made more consistent and the documention
is updated accordingly.
- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
updates zswap to the new object read/write APIs and thus permits the
removal of some legacy code from zpool and zsmalloc.
- The series "Some trivial cleanups for shmem" from Baolin Wang does as
it claims.
- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
preparatoty refactoring and cleanup of the mremap() code.
- The series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
on handling layers" from SeongJae Park adds a couple of new sysfs
directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from
Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The series "mm: page_ext: Introduce new iteration API" from Luiz
Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The series "Buddy allocator like (or non-uniform) folio split" from
Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios
are generated.
- The series "Minimize xa_node allocation during xarry split" from Zi
Yan reduces the number of xarray xa_nodes which are generated during
an xarray split.
- The series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The series "Add tracepoints for lowmem reserves, watermarks and
totalreserve_pages" from Martin Liu adds some more tracepoints to the
page allocator code.
- The series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which
SeongJae observed during his earlier madvise work.
- The series "mm/hwpoison: Fix regressions in memory failure handling"
from Shuai Xue addresses two quite serious regressions which Shuai
has observed in the memory-failure implementation.
- The series "mm: reliable huge page allocator" from Johannes Weiner
makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The series "Minor memcg cleanups & prep for memdescs" from Matthew
Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache
introduces a way to track memory used by our various balloon drivers.
- The series "mm/damon: introduce DAMOS filter type for active pages"
from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
separates the proactive reclaim statistics from the direct reclaim
statistics.
- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
code.
* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
x86/mm: restore early initialization of high_memory for 32-bits
mm/vmscan: don't try to reclaim hwpoison folio
mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
selftests/mm: speed up split_huge_page_test
selftests/mm: uffd-unit-tests support for hugepages > 2M
docs/mm/damon/design: document active DAMOS filter type
mm/damon: implement a new DAMOS filter type for active pages
fs/dax: don't disassociate zero page entries
MM documentation: add "Unaccepted" meminfo entry
selftests/mm: add commentary about 9pfs bugs
fork: use __vmalloc_node() for stack allocation
docs/mm: Physical Memory: Populate the "Zones" section
xen: balloon: update the NR_BALLOON_PAGES state
hv_balloon: update the NR_BALLOON_PAGES state
balloon_compaction: update the NR_BALLOON_PAGES state
meminfo: add a per node counter for balloon drivers
mm: remove references to folio in __memcg_kmem_uncharge_page()
...
|
|
An erroneous message is written to the kernel log when either of the
following actions are taken by a user:
1. Assign an adapter or domain to a vfio_ap mediated device via its sysfs
assign_adapter or assign_domain attributes that would result in one or
more AP queues being assigned that are already assigned to a different
mediated device. Sharing of queues between mdevs is not allowed.
2. Reserve an adapter or domain for the host device driver via the AP bus
driver's sysfs apmask or aqmask attribute that would result in providing
host access to an AP queue that is in use by a vfio_ap mediated device.
Reserving a queue for a host driver that is in use by an mdev is not
allowed.
In both cases, the assignment will return an error; however, a message like
the following is written to the kernel log:
vfio_ap_mdev e1839397-51a0-4e3c-91e0-c3b9c3d3047d: Userspace may not
re-assign queue 00.0028 already assigned to \
e1839397-51a0-4e3c-91e0-c3b9c3d3047d
Notice the mdev reporting the error is the same as the mdev identified
in the message as the one to which the queue is being assigned.
It is perfectly okay to assign a queue to an mdev to which it is
already assigned; the assignment is simply ignored by the vfio_ap device
driver.
This patch logs more descriptive and accurate messages for both 1 and 2
above to the kernel log:
Example for 1:
vfio_ap_mdev 0fe903a0-a323-44db-9daf-134c68627d61: Userspace may not assign
queue 00.0033 to mdev: already assigned to \
62177883-f1bb-47f0-914d-32a22e3a8804
Example for 2:
vfio_ap_mdev 62177883-f1bb-47f0-914d-32a22e3a8804: Can not reserve queue
00.0033 for host driver: in use by mdev
Signed-off-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250311103304.1539188-1-akrowiak@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add sorting of mcount locations at build time
- Rework uaccess functions with C exception handling to shorten inline
assembly size and enable full inlining. This yields near-optimal code
for small constant copies with a ~40kb kernel size increase
- Add support for a configurable STRICT_MM_TYPECHECKS which allows to
generate better code, but also allows to have type checking for debug
builds
- Optimize get_lowcore() for common callers with alternatives that
nearly revert to the pre-relocated lowcore code, while also slightly
reducing syscall entry and exit time
- Convert MACHINE_HAS_* checks for single facility tests into cpu_has_*
style macros that call test_facility(), and for features with
additional conditions, add a new ALT_TYPE_FEATURE alternative to
provide a static branch via alternative patching. Also, move machine
feature detection to the decompressor for early patching and add
debugging functionality to easily show which alternatives are patched
- Add exception table support to early boot / startup code to get rid
of the open coded exception handling
- Use asm_inline for all inline assemblies with EX_TABLE or ALTERNATIVE
to ensure correct inlining and unrolling decisions
- Remove 2k page table leftovers now that s390 has been switched to
always allocate 4k page tables
- Split kfence pool into 4k mappings in arch_kfence_init_pool() and
remove the architecture-specific kfence_split_mapping()
- Use READ_ONCE_NOCHECK() in regs_get_kernel_stack_nth() to silence
spurious KASAN warnings from opportunistic ftrace argument tracing
- Force __atomic_add_const() variants on s390 to always return void,
ensuring compile errors for improper usage
- Remove s390's ioremap_wt() and pgprot_writethrough() due to
mismatched semantics and lack of known users, relying on asm-generic
fallbacks
- Signal eventfd in vfio-ap to notify userspace when the guest AP
configuration changes, including during mdev removal
- Convert mdev_types from an array to a pointer in vfio-ccw and vfio-ap
drivers to avoid fake flex array confusion
- Cleanup trap code
- Remove references to the outdated linux390@de.ibm.com address
- Other various small fixes and improvements all over the code
* tag 's390-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (78 commits)
s390: Use inline qualifier for all EX_TABLE and ALTERNATIVE inline assemblies
s390/kfence: Split kfence pool into 4k mappings in arch_kfence_init_pool()
s390/ptrace: Avoid KASAN false positives in regs_get_kernel_stack_nth()
s390/boot: Ignore vmlinux.map
s390/sysctl: Remove "vm/allocate_pgste" sysctl
s390: Remove 2k vs 4k page table leftovers
s390/tlb: Use mm_has_pgste() instead of mm_alloc_pgste()
s390/lowcore: Use lghi instead llilh to clear register
s390/syscall: Merge __do_syscall() and do_syscall()
s390/spinlock: Implement SPINLOCK_LOCKVAL with inline assembly
s390/smp: Implement raw_smp_processor_id() with inline assembly
s390/current: Implement current with inline assembly
s390/lowcore: Use inline qualifier for get_lowcore() inline assembly
s390: Move s390 sysctls into their own file under arch/s390
s390/syscall: Simplify syscall_get_arguments()
s390/vfio-ap: Notify userspace that guest's AP config changed when mdev removed
s390: Remove ioremap_wt() and pgprot_writethrough()
s390/mm: Add configurable STRICT_MM_TYPECHECKS
s390/mm: Convert pgste_val() into function
s390/mm: Convert pgprot_val() into function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Enable Configuration RRS SV, which makes device readiness visible,
early instead of during child bus scanning (Bjorn Helgaas)
- Log debug messages about reset methods being used (Bjorn Helgaas)
- Avoid reset when it has been disabled via sysfs (Nishanth
Aravamudan)
- Add common pci-ep-bus.yaml schema for exporting several peripherals
of a single PCI function via devicetree (Andrea della Porta)
- Create DT nodes for PCI host bridges to enable loading device tree
overlays to create platform devices for PCI devices that have
several features that require multiple drivers (Herve Codina)
Resource management:
- Enlarge devres table[] to accommodate bridge windows, ROM, IOV
BARs, etc., and validate BAR index in devres interfaces (Philipp
Stanner)
- Fix typo that repeatedly distributed resources to a bridge instead
of iterating over subordinate bridges, which resulted in too little
space to assign some BARs (Kai-Heng Feng)
- Relax bridge window tail sizing for optional resources, e.g., IOV
BARs, to avoid failures when removing and re-adding devices (Ilpo
Järvinen)
- Allow drivers to enable devices even if we haven't assigned
optional IOV resources to them (Ilpo Järvinen)
- Rework handling of optional resources (IOV BARs, ROMs) to reduce
failures if we can't allocate them (Ilpo Järvinen)
- Fix a NULL dereference in the SR-IOV VF creation error path (Shay
Drory)
- Fix s390 mmio_read/write syscalls, which didn't cause page faults
in some cases, which broke vfio-pci lazy mapping on first access
(Niklas Schnelle)
- Add pdev->non_mappable_bars to replace CONFIG_VFIO_PCI_MMAP, which
was disabled only for s390 (Niklas Schnelle)
- Support mmap of PCI resources on s390 except for ISM devices
(Niklas Schnelle)
ASPM:
- Delay pcie_link_state deallocation to avoid dangling pointers that
cause invalid references during hot-unplug (Daniel Stodden)
Power management:
- Allow PCI bridges to go to D3Hot when suspending on all non-x86
systems (Manivannan Sadhasivam)
Power control:
- Create pwrctrl devices in pci_scan_device() to make it more
symmetric with pci_pwrctrl_unregister() and make pwrctrl devices
for PCI bridges possible (Manivannan Sadhasivam)
- Unregister pwrctrl devices in pci_destroy_dev() so DOE, ASPM, etc.
can still access devices after pci_stop_dev() (Manivannan
Sadhasivam)
- If there's a pwrctrl device for a PCI device, skip scanning it
because the pwrctrl core will rescan the bus after the device is
powered on (Manivannan Sadhasivam)
- Add a pwrctrl driver for PCI slots based on voltage regulators
described via devicetree (Manivannan Sadhasivam)
Bandwidth control:
- Add set_pcie_speed.sh to TEST_PROGS to fix issue when executing the
set_pcie_cooling_state.sh test case (Yi Lai)
- Avoid a NULL pointer dereference when we run out of bus numbers to
assign for a bridge secondary bus (Lukas Wunner)
Hotplug:
- Drop superfluous pci_hotplug_slot_list, try_module_get() calls, and
NULL pointer checks (Lukas Wunner)
- Drop shpchp module init/exit logging, replace shpchp dbg() with
ctrl_dbg(), and remove unused dbg(), err(), info(), warn() wrappers
(Ilpo Järvinen)
- Drop 'shpchp_debug' module parameter in favor of standard dynamic
debugging (Ilpo Järvinen)
- Drop unused cpcihp .get_power(), .set_power() function pointers
(Guilherme Giacomo Simoes)
- Disable hotplug interrupts in portdrv only when pciehp is not
enabled to avoid issuing two hotplug commands too close together
(Feng Tang)
- Skip pciehp 'device replaced' check if the device has been removed
to address a deadlock when resuming after a device was removed
during system sleep (Lukas Wunner)
- Don't enable pciehp hotplug interupt when resuming in poll mode
(Ilpo Järvinen)
Virtualization:
- Fix bugs in 'pci=config_acs=' kernel command line parameter (Tushar
Dave)
DOE:
- Expose supported DOE features via sysfs (Alistair Francis)
- Allow DOE support to be enabled even if CXL isn't enabled (Alistair
Francis)
Endpoint framework:
- Convert PCI device data so pci-epf-test works correctly on
big-endian endpoint systems (Niklas Cassel)
- Add BAR_RESIZABLE type to endpoint framework and add DWC core
support for EPF drivers to set BAR_RESIZABLE type and size (Niklas
Cassel)
- Fix pci-epf-test double free that causes an oops if the host
reboots and PERST# deassertion restarts endpoint BAR allocation
(Christian Bruel)
- Fix endpoint BAR testing so tests can skip disabled BARs instead of
reporting them as failures (Niklas Cassel)
- Widen endpoint test BAR size variable to accommodate BARs larger
than INT_MAX (Niklas Cassel)
- Remove unused tools 'pci' build target left over after moving tests
to tools/testing/selftests/pci_endpoint (Jianfeng Liu)
Altera PCIe controller driver:
- Add DT binding and driver support for Agilex family (P-Tile,
F-Tile, R-Tile) (Matthew Gerlach and D M, Sharath Kumar)
AMD MDB PCIe controller driver:
- Add DT binding and driver for AMD MDB (Multimedia DMA Bridge)
(Thippeswamy Havalige)
Broadcom STB PCIe controller driver:
- Add BCM2712 MSI-X DT binding and interrupt controller drivers and
add softdep on irq_bcm2712_mip driver to ensure that it is loaded
first (Stanimir Varbanov)
- Expand inbound window map to 64GB so it can accommodate BCM2712
(Stanimir Varbanov)
- Add BCM2712 support and DT updates (Stanimir Varbanov)
- Apply link speed restriction before bringing link up, not after
(Jim Quinlan)
- Update Max Link Speed in Link Capabilities via the internal
writable register, not the read-only config register (Jim Quinlan)
- Handle regulator_bulk_get() error to avoid panic when we call
regulator_bulk_free() later (Jim Quinlan)
- Disable regulators only when removing the bus immediately below a
Root Port because we don't support regulators deeper in the
hierarchy (Jim Quinlan)
- Make const read-only arrays static (Colin Ian King)
Cadence PCIe endpoint driver:
- Correct MSG TLP generation so endpoints can generate INTx messages
(Hans Zhang)
Freescale i.MX6 PCIe controller driver:
- Identify the second controller on i.MX8MQ based on devicetree
'linux,pci-domain' instead of DBI 'reg' address (Richard Zhu)
- Remove imx_pcie_cpu_addr_fixup() since dwc core can now derive the
ATU input address (using parent_bus_offset) from devicetree (Frank
Li)
Freescale Layerscape PCIe controller driver:
- Drop deprecated 'num-ib-windows' and 'num-ob-windows' and
unnecessary 'status' from example (Krzysztof Kozlowski)
- Correct the syscon_regmap_lookup_by_phandle_args("fsl,pcie-scfg")
arg_count to fix probe failure on LS1043A (Ioana Ciornei)
HiSilicon STB PCIe controller driver:
- Call phy_exit() to clean up if histb_pcie_probe() fails (Christophe
JAILLET)
Intel Gateway PCIe controller driver:
- Remove intel_pcie_cpu_addr() since dwc core can now derive the ATU
input address (using parent_bus_offset) from devicetree (Frank Li)
Intel VMD host bridge driver:
- Convert vmd_dev.cfg_lock from spinlock_t to raw_spinlock_t so
pci_ops.read() will never sleep, even on PREEMPT_RT where
spinlock_t becomes a sleepable lock, to avoid calling a sleeping
function from invalid context (Ryo Takakura)
MediaTek PCIe Gen3 controller driver:
- Remove leftover mac_reset assert for Airoha EN7581 SoC (Lorenzo
Bianconi)
- Add EN7581 PBUS controller 'mediatek,pbus-csr' DT property and
program host bridge memory aperture to this syscon node (Lorenzo
Bianconi)
Qualcomm PCIe controller driver:
- Add qcom,pcie-ipq5332 binding (Varadarajan Narayanan)
- Add qcom i.MX8QM and i.MX8QXP/DXP optional DMA interrupt (Alexander
Stein)
- Add optional dma-coherent DT property for Qualcomm SA8775P (Dmitry
Baryshkov)
- Make DT iommu property required for SA8775P and prohibited for
SDX55 (Dmitry Baryshkov)
- Add DT IOMMU and DMA-related properties for Qualcomm SM8450 (Dmitry
Baryshkov)
- Add endpoint DT properties for SAR2130P and enable endpoint mode in
driver (Dmitry Baryshkov)
- Describe endpoint BAR0 and BAR2 as 64-bit only and BAR1 and BAR3 as
RESERVED (Manivannan Sadhasivam)
Rockchip DesignWare PCIe controller driver:
- Describe rk3568 and rk3588 BARs as Resizable, not Fixed (Niklas
Cassel)
Synopsys DesignWare PCIe controller driver:
- Add debugfs-based Silicon Debug, Error Injection, Statistical
Counter support for DWC (Shradha Todi)
- Add debugfs property to expose LTSSM status of DWC PCIe link (Hans
Zhang)
- Add Rockchip support for DWC debugfs features (Niklas Cassel)
- Add dw_pcie_parent_bus_offset() to look up the parent bus address
of a specified 'reg' property and return the offset from the CPU
physical address (Frank Li)
- Use dw_pcie_parent_bus_offset() to derive CPU -> ATU addr offset
via 'reg[config]' for host controllers and 'reg[addr_space]' for
endpoint controllers (Frank Li)
- Apply struct dw_pcie.parent_bus_offset in ATU users to remove use
of .cpu_addr_fixup() when programming ATU (Frank Li)
TI J721E PCIe driver:
- Correct the 'link down' interrupt bit for J784S4 (Siddharth
Vadapalli)
TI Keystone PCIe controller driver:
- Describe AM65x BARs 2 and 5 as Resizable (not Fixed) and reduce
alignment requirement from 1MB to 64KB (Niklas Cassel)
Xilinx Versal CPM PCIe controller driver:
- Free IRQ domain in probe error path to avoid leaking it
(Thippeswamy Havalige)
- Add DT .compatible "xlnx,versal-cpm5nc-host" and driver support for
Versal Net CPM5NC Root Port controller (Thippeswamy Havalige)
- Add driver support for CPM5_HOST1 (Thippeswamy Havalige)
Miscellaneous:
- Convert fsl,mpc83xx-pcie binding to YAML (J. Neuschäfer)
- Use for_each_available_child_of_node_scoped() to simplify apple,
kirin, mediatek, mt7621, tegra drivers (Zhang Zekun)"
* tag 'pci-v6.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (197 commits)
PCI: layerscape: Fix arg_count to syscon_regmap_lookup_by_phandle_args()
PCI: j721e: Fix the value of .linkdown_irq_regfield for J784S4
misc: pci_endpoint_test: Add support for PCITEST_IRQ_TYPE_AUTO
PCI: endpoint: pci-epf-test: Expose supported IRQ types in CAPS register
PCI: dw-rockchip: Endpoint mode cannot raise INTx interrupts
PCI: endpoint: Add intx_capable to epc_features struct
dt-bindings: PCI: Add common schema for devices accessible through PCI BARs
PCI: intel-gw: Remove intel_pcie_cpu_addr()
PCI: imx6: Remove imx_pcie_cpu_addr_fixup()
PCI: dwc: Use parent_bus_offset to remove need for .cpu_addr_fixup()
PCI: dwc: ep: Ensure proper iteration over outbound map windows
PCI: dwc: ep: Use devicetree 'reg[addr_space]' to derive CPU -> ATU addr offset
PCI: dwc: ep: Consolidate devicetree handling in dw_pcie_ep_get_resources()
PCI: dwc: ep: Call epc_create() early in dw_pcie_ep_init()
PCI: dwc: Use devicetree 'reg[config]' to derive CPU -> ATU addr offset
PCI: dwc: Add dw_pcie_parent_bus_offset() checking and debug
PCI: dwc: Add dw_pcie_parent_bus_offset()
PCI/bwctrl: Fix NULL pointer dereference on bus number exhaustion
PCI: xilinx-cpm: Add cpm_csr register mapping for CPM5_HOST1 variant
PCI: brcmstb: Make const read-only arrays static
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Continue Netlink conversions to per-namespace RTNL lock
(IPv4 routing, routing rules, routing next hops, ARP ioctls)
- Continue extending the use of netdev instance locks. As a driver
opt-in protect queue operations and (in due course) ethtool
operations with the instance lock and not RTNL lock.
- Support collecting TCP timestamps (data submitted, sent, acked) in
BPF, allowing for transparent (to the application) and lower
overhead tracking of TCP RPC performance.
- Tweak existing networking Rx zero-copy infra to support zero-copy
Rx via io_uring.
- Optimize MPTCP performance in single subflow mode by 29%.
- Enable GRO on packets which went thru XDP CPU redirect (were queued
for processing on a different CPU). Improving TCP stream
performance up to 2x.
- Improve performance of contended connect() by 200% by searching for
an available 4-tuple under RCU rather than a spin lock. Bring an
additional 229% improvement by tweaking hash distribution.
- Avoid unconditionally touching sk_tsflags on RX, improving
performance under UDP flood by as much as 10%.
- Avoid skb_clone() dance in ping_rcv() to improve performance under
ping flood.
- Avoid FIB lookup in netfilter if socket is available, 20% perf win.
- Rework network device creation (in-kernel) API to more clearly
identify network namespaces and their roles. There are up to 4
namespace roles but we used to have just 2 netns pointer arguments,
interpreted differently based on context.
- Use sysfs_break_active_protection() instead of trylock to avoid
deadlocks between unregistering objects and sysfs access.
- Add a new sysctl and sockopt for capping max retransmit timeout in
TCP.
- Support masking port and DSCP in routing rule matches.
- Support dumping IPv4 multicast addresses with RTM_GETMULTICAST.
- Support specifying at what time packet should be sent on AF_XDP
sockets.
- Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin
users.
- Add Netlink YAML spec for WiFi (nl80211) and conntrack.
- Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols
which only need to be exported when IPv6 support is built as a
module.
- Age FDB entries based on Rx not Tx traffic in VxLAN, similar to
normal bridging.
- Allow users to specify source port range for GENEVE tunnels.
- netconsole: allow attaching kernel release, CPU ID and task name to
messages as metadata
Driver API:
- Continue rework / fixing of Energy Efficient Ethernet (EEE) across
the SW layers. Delegate the responsibilities to phylink where
possible. Improve its handling in phylib.
- Support symmetric OR-XOR RSS hashing algorithm.
- Support tracking and preserving IRQ affinity by NAPI itself.
- Support loopback mode speed selection for interface selftests.
Device drivers:
- Remove the IBM LCS driver for s390
- Remove the sb1000 cable modem driver
- Add support for SFP module access over SMBus
- Add MCTP transport driver for MCTP-over-USB
- Enable XDP metadata support in multiple drivers
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- add PCIe TLP Processing Hints (TPH) support for new AMD
platforms
- support dumping RoCE queue state for debug
- opt into instance locking
- Intel (100G, ice, idpf):
- ice: rework MSI-X IRQ management and distribution
- ice: support for E830 devices
- iavf: add support for Rx timestamping
- iavf: opt into instance locking
- nVidia/Mellanox:
- mlx4: use page pool memory allocator for Rx
- mlx5: support for one PTP device per hardware clock
- mlx5: support for 200Gbps per-lane link modes
- mlx5: move IPSec policy check after decryption
- AMD/Solarflare:
- support FW flashing via devlink
- Cisco (enic):
- use page pool memory allocator for Rx
- enable 32, 64 byte CQEs
- get max rx/tx ring size from the device
- Meta (fbnic):
- support flow steering and RSS configuration
- report queue stats
- support TCP segmentation
- support IRQ coalescing
- support ring size configuration
- Marvell/Cavium:
- support AF_XDP
- Wangxun:
- support for PTP clock and timestamping
- Huawei (hibmcge):
- checksum offload
- add more statistics
- Ethernet virtual:
- VirtIO net:
- aggressively suppress Tx completions, improve perf by 96%
with 1 CPU and 55% with 2 CPUs
- expose NAPI to IRQ mapping and persist NAPI settings
- Google (gve):
- support XDP in DQO RDA Queue Format
- opt into instance locking
- Microsoft vNIC:
- support BIG TCP
- Ethernet NICs consumer, and embedded:
- Synopsys (stmmac):
- cleanup Tx and Tx clock setting and other link-focused
cleanups
- enable SGMII and 2500BASEX mode switching for Intel platforms
- support Sophgo SG2044
- Broadcom switches (b53):
- support for BCM53101
- TI:
- iep: add perout configuration support
- icssg: support XDP
- Cadence (macb):
- implement BQL
- Xilinx (axinet):
- support dynamic IRQ moderation and changing coalescing at
runtime
- implement BQL
- report standard stats
- MediaTek:
- support phylink managed EEE
- Intel:
- igc: don't restart the interface on every XDP program change
- RealTek (r8169):
- support reading registers of internal PHYs directly
- increase max jumbo packet size on RTL8125/RTL8126
- Airoha:
- support for RISC-V NPU packet processing unit
- enable scatter-gather and support MTU up to 9kB
- Tehuti (tn40xx):
- support cards with TN4010 MAC and an Aquantia AQR105 PHY
- Ethernet PHYs:
- support for TJA1102S, TJA1121
- dp83tg720: add randomized polling intervals for link detection
- dp83822: support changing the transmit amplitude voltage
- support for LEDs on 88q2xxx
- CAN:
- canxl: support Remote Request Substitution bit access
- flexcan: add S32G2/S32G3 SoC
- WiFi:
- remove cooked monitor support
- strict mode for better AP testing
- basic EPCS support
- OMI RX bandwidth reduction support
- batman-adv: add support for jumbo frames
- WiFi drivers:
- RealTek (rtw88):
- support RTL8814AE and RTL8814AU
- RealTek (rtw89):
- switch using wiphy_lock and wiphy_work
- add BB context to manipulate two PHY as preparation of MLO
- improve BT-coexistence mechanism to play A2DP smoothly
- Intel (iwlwifi):
- add new iwlmld sub-driver for latest HW/FW combinations
- MediaTek (mt76):
- preparation for mt7996 Multi-Link Operation (MLO) support
- Qualcomm/Atheros (ath12k):
- continued work on MLO
- Silabs (wfx):
- Wake-on-WLAN support
- Bluetooth:
- add support for skb TX SND/COMPLETION timestamping
- hci_core: enable buffer flow control for SCO/eSCO
- coredump: log devcd dumps into the monitor
- Bluetooth drivers:
- intel: add support to configure TX power
- nxp: handle bootloader error during cmd5 and cmd7"
* tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits)
unix: fix up for "apparmor: add fine grained af_unix mediation"
mctp: Fix incorrect tx flow invalidation condition in mctp-i2c
net: usb: asix: ax88772: Increase phy_name size
net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string
net: libwx: fix Tx L4 checksum
net: libwx: fix Tx descriptor content for some tunnel packets
atm: Fix NULL pointer dereference
net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards
net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card
net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus
net: phy: aquantia: add essential functions to aqr105 driver
net: phy: aquantia: search for firmware-name in fwnode
net: phy: aquantia: add probe function to aqr105 for firmware loading
net: phy: Add swnode support to mdiobus_scan
gve: add XDP DROP and PASS support for DQ
gve: update XDP allocation path support RX buffer posting
gve: merge packet buffer size fields
gve: update GQ RX to use buf_size
gve: introduce config-based allocation for XDP
gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer cleanups from Thomas Gleixner:
"A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to
the callback pointer. This turned out to be suboptimal for the
upcoming Rust integration and is obviously a silly implementation to
begin with.
This cleanup replaces the hrtimer_init(T); T->function = cb; sequence
with hrtimer_setup(T, cb);
The conversion was done with Coccinelle and a few manual fixups.
Once the conversion has completely landed in mainline, hrtimer_init()
will be removed and the hrtimer::function becomes a private member"
* tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
wifi: rt2x00: Switch to use hrtimer_update_function()
io_uring: Use helper function hrtimer_update_function()
serial: xilinx_uartps: Use helper function hrtimer_update_function()
ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()
RDMA: Switch to use hrtimer_setup()
virtio: mem: Switch to use hrtimer_setup()
drm/vmwgfx: Switch to use hrtimer_setup()
drm/xe/oa: Switch to use hrtimer_setup()
drm/vkms: Switch to use hrtimer_setup()
drm/msm: Switch to use hrtimer_setup()
drm/i915/request: Switch to use hrtimer_setup()
drm/i915/uncore: Switch to use hrtimer_setup()
drm/i915/pmu: Switch to use hrtimer_setup()
drm/i915/perf: Switch to use hrtimer_setup()
drm/i915/gvt: Switch to use hrtimer_setup()
drm/i915/huc: Switch to use hrtimer_setup()
drm/amdgpu: Switch to use hrtimer_setup()
stm class: heartbeat: Switch to use hrtimer_setup()
i2c: Switch to use hrtimer_setup()
iio: Switch to use hrtimer_setup()
...
|
|
So far s390 does not allow mmap() of PCI resources to user-space via the
usual mechanisms, though it does use it for RDMA. For the PCI sysfs
resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor
ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on
disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars
for all devices.
This is partly because access to mapped PCI resources from user-space
requires special PCI load/store memory-I/O (MIO) instructions, or the
special MMIO syscalls when these are not available. Still, such access is
possible and useful not just for RDMA, in fact not being able to mmap() PCI
resources has previously caused extra work when testing devices.
One thing that doesn't work with PCI resources mapped to user-space though
is the s390 specific virtual ISM device. Not only because the BAR size of
256 TiB prevents mapping the whole BAR but also because access requires use
of the legacy PCI instructions which are not accessible to user-space on
systems with the newer MIO PCI instructions.
Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping
its resources while making this functionality available for all other PCI
devices. To this end introduce a minimal implementation of PCI_QUIRKS and
use that to set pdev->non_mappable_bars for ISM devices only. Then also set
ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic
implementation of pci_mmap_resource_range() enabling only the newer sysfs
mmap() interface. This follows the recommendation in
Documentation/PCI/sysfs-pci.rst.
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Use asm_inline for all inline assemblies which make use of the EX_TABLE or
ALTERNATIVE macros.
These macros expand to many lines and the compiler assumes the number of
lines within an inline assembly is the same as the number of instructions
within an inline assembly. This has an effect on inlining and loop
unrolling decisions.
In order to avoid incorrect assumptions use asm_inline, which tells the
compiler that an inline assembly has the smallest possible size.
In order to avoid confusion when asm_inline should be used or not, since a
couple of inline assemblies are quite large: the rule is to always use
asm_inline whenever the EX_TABLE or ALTERNATIVE macro is used. In specific
cases there may be reasons to not follow this guideline, but that should
be documented with the corresponding code.
Using the inline qualifier everywhere has only a small effect on the kernel
image size:
add/remove: 0/10 grow/shrink: 19/8 up/down: 1492/-1858 (-366)
The only location where this seems to matter is load_unaligned_zeropad()
from word-at-a-time.h where the compiler inlines more functions within the
dcache code, which is indeed code where performance matters.
Suggested-by: Juergen Christ <jchrist@linux.ibm.com>
Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The dcssblk driver has long needed special case supoprt to enable limited
dax operation, so called CONFIG_FS_DAX_LIMITED. This mode works around
the incomplete support for ZONE_DEVICE on s390 by forgoing the ability of
dax-mapped pages to support GUP.
Now, pending cleanups to fsdax that fix its reference counting [1] depend
on the ability of all dax drivers to supply ZONE_DEVICE pages.
To allow that work to move forward, dax support needs to be paused for
dcssblk until ZONE_DEVICE support arrives. That work has been known for a
few years [2], and the removal of "pte_devmap" requirements [3] makes the
conversion easier.
For now, place the support behind CONFIG_BROKEN, and remove PFN_SPECIAL
(dcssblk was the only user).
Link: http://lore.kernel.org/cover.9f0e45d52f5cff58807831b6b867084d0b14b61c.1725941415.git-series.apopple@nvidia.com [1]
Link: http://lore.kernel.org/20210820210318.187742e8@thinkpad/ [2]
Link: http://lore.kernel.org/4511465a4f8429f45e2ac70d2e65dc5e1df1eb47.1725941415.git-series.apopple@nvidia.com [3]
Link: https://lkml.kernel.org/r/33eef2379c0d240f40cc15453fad2df1a4ae34c8.1740713401.git-series.apopple@nvidia.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Asahi Lina <lina@asahilina.net>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linmiaohe <linmiaohe@huawei.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Michael "Camp Drill Sergeant" Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The guest's AP configuration is cleared when the mdev is removed, so
userspace must be notified that the AP configuration has changed. To this
end, this patch:
* Removes call to 'signal_guest_ap_cfg_changed()' function from the
'vfio_ap_mdev_unset_kvm()' function because it has no affect given it is
called after the mdev fd is closed.
* Adds call to 'signal_guest_ap_cfg_changed()' function to the
'vfio_ap_mdev_request()' function to notify userspace that the guest's
AP configuration has changed before signaling the request to remove the
mdev.
Minor change - Fixed an indentation issue in function
'signal_guest_ap_cfg_changed()'
Fixes: 07d89045bffe ("s390/vfio-ap: Signal eventfd when guest AP configuration is changed")
Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250304200812.54556-1-rreyes@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The check for get_zeroed_page() leads to a direct return
and overlooked the memory leak caused by loop allocation.
Add a free helper to free spaces allocated by get_zeroed_page().
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250218034104.2436469-1-haoxiang_li2024@163.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Add check for the return value of get_zeroed_page() in
sclp_console_init() to prevent null pointer dereference.
Furthermore, to solve the memory leak caused by the loop
allocation, add a free helper to do the free job.
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20250218025216.2421548-1-haoxiang_li2024@163.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Move machine type detection to the decompressor and use static branches
to implement and use machine_is_[lpar|vm|kvm]() instead of a runtime check
via MACHINE_IS_[LPAR|VM|KVM].
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use static branch(es) to implement and use machine_has_esop() instead
of a runtime check via MACHINE_HAS_ESOP.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Use static branch(es) to implement and use machine_has_tlb_guest()
instead of a runtime check via MACHINE_HAS_TLB_GUEST.
Also add sclp_early_detect_machine_features() in order to allow for
feature detection from the decompressor.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Convert MACHINE_HAS_... to cpu_has_...() which uses test_facility() instead
of testing the machine_flags lowcore member if the feature is present.
test_facility() generates better code since it results in a static branch
without accessing memory. The branch is patched via alternatives by the
decompressor depending on the availability of the required facility.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The linux390@de.ibm.com email address is outdated. It has been removed from
the MAINTAINERS file nearly 10 years ago with commit 64776820b4ff ("s390:
remove generic email address from maintainers file").
Change MODULE_AUTHOR() accordingly and also use the more common
IBM Corporation company name.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc4).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Smaller than usual with no fixes from any subtree.
Current release - regressions:
- core: fix race of rtnl_net_lock(dev_net(dev))
Previous releases - regressions:
- core: remove the single page frag cache for good
- flow_dissector: fix handling of mixed port and port-range keys
- sched: cls_api: fix error handling causing NULL dereference
- tcp:
- adjust rcvq_space after updating scaling ratio
- drop secpath at the same time as we currently drop dst
- eth: gtp: suppress list corruption splat in gtp_net_exit_batch_rtnl().
Previous releases - always broken:
- vsock:
- fix variables initialization during resuming
- for connectible sockets allow only connected
- eth:
- geneve: fix use-after-free in geneve_find_dev()
- ibmvnic: don't reference skb after sending to VIOS"
* tag 'net-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
Revert "net: skb: introduce and use a single page frag cache"
net: allow small head cache usage with large MAX_SKB_FRAGS values
nfp: bpf: Add check for nfp_app_ctrl_msg_alloc()
tcp: drop secpath at the same time as we currently drop dst
net: axienet: Set mac_managed_pm
arp: switch to dev_getbyhwaddr() in arp_req_set_public()
net: Add non-RCU dev_getbyhwaddr() helper
sctp: Fix undefined behavior in left shift operation
selftests/bpf: Add a specific dst port matching
flow_dissector: Fix port range key handling in BPF conversion
selftests/net/forwarding: Add a test case for tc-flower of mixed port and port-range
flow_dissector: Fix handling of mixed port and port-range keys
geneve: Suppress list corruption splat in geneve_destroy_tunnels().
gtp: Suppress list corruption splat in gtp_net_exit_batch_rtnl().
dev: Use rtnl_net_dev_lock() in unregister_netdev().
net: Fix dev_net(dev) race in unregister_netdevice_notifier_dev_net().
net: Add net_passive_inc() and net_passive_dec().
net: pse-pd: pd692x0: Fix power limit retrieval
MAINTAINERS: trim the GVE entry
gve: set xdp redirect target only when it is available
...
|
|
In this patch, an eventfd object is created by the vfio_ap device driver
and used to notify userspace when a guests's AP configuration is
dynamically changed. Such changes may occur whenever:
* An adapter, domain or control domain is assigned to or unassigned from a
mediated device that is attached to the guest.
* A queue assigned to the mediated device that is attached to a guest is
bound to or unbound from the vfio_ap device driver. This can occur
either by manually binding/unbinding the queue via the vfio_ap driver's
sysfs bind/unbind attribute interfaces, or because an adapter, domain or
control domain assigned to the mediated device is added to or removed
from the host's AP configuration via an SE/HMC
The purpose of this patch is to provide immediate notification of changes
made to a guest's AP configuration by the vfio_ap driver. This will enable
the guest to take immediate action rather than relying on polling or some
other inefficient mechanism to detect changes to its AP configuration.
Note that there are corresponding QEMU patches that will be shipped along
with this patch (see vfio-ap: Report vfio-ap configuration changes) that
will pick up the eventfd signal.
Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250107183645.90082-1-rreyes@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The vfio-ccw driver and the vfio parent device provided by it (parent)
support just a single mdev_type, and this is not likely to change any
time soon. To match the mdev interfaces nicely initially the choice was
made that mdev_types (which gets passed into mdev_register_parent())
shall be an array of pointers to struct mdev_type with a single element,
and to make things worse it ended up being the last member.
Now the problem with that is that before C99 the usual way to get
something similar to a flexible array member was to use a trailing array
of size 0 or 1. This is what I called fake flex array. For a while now
the community is trying to get rid of fake flex arrays. And while
mdev_types was not a fake flex array but an array of size one, because
it can easily be and probably was mistaken for a fake flex array it got
converted into a real C99 flex array with a compile time known constant
size of one.
As per [1] it was established that "only fake flexible arrays should be
transformed into C99 flex-array members". Since IMHO the entire point of
flex arrays is being flexible about the array size at run time, a C99
flex array is a poor fit for mdev_types. But an array of a size one is
a poor fit as well for the reason stated above, let us try to get rid of
the flex array without introducing back the one sized array.
So, lets make mdev_types a pointer to struct mdev_type and pass in the
address of that pointer as the 4th formal parameter of
mdev_register_parent().
[1] https://lore.kernel.org/lkml/85863d7a-2d8b-4c1b-b76a-e2f40834a7a8@embeddedor.com/
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Link: https://lore.kernel.org/r/20250217100614.3043620-3-pasic@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The vfio-ap driver and the vfio parent device provided by it
(matrix_dev) support just a single mdev_type, and this is not likely to
change any time soon. Despite that matrix_dev->mdev_types started out
as a C99 flexible array presumably as a typo, and since the typo messed
up the allocation, commit e2c8cee9f489 ("s390/vfio-ap: Fix memory
allocation for mdev_types array") changed it to an array of size 1. And
to make things worse mdev_types happens to be the last member of struct
ap_matrix_dev.
Now the problem with that is that before C99 the usual way to get
something similar to a flexible array member was to use a trailing array of
size 0 or 1. This is what I called fake flex array. For a while now the
community is trying to get rid of fake flex arrays. And while mdev_types
is not a fake flex array but an array of size one (to match the mdev
interfaces nicer), it can easily be and was mistaken for a fake flex
array.
So, let us make mdev_types a pointer to struct mdev_type and pass in the
address of that pointer as the 4th formal parameter of
mdev_register_parent().
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250217100614.3043620-2-pasic@linux.ibm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The linux390@de.ibm.com email address is outdated. It has been removed
from the MAINTAINERS file nearly 10 years ago with commit 64776820b4ff
("s390: remove generic email address from maintainers file"); remove
one of the last remaining occurrences.
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
Remove any extra indentation to improve the code's readability.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Link: https://lore.kernel.org/r/20250116131146.105439-2-thorsten.blum@linux.dev
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
hrtimer_setup() takes the callback function pointer as argument and
initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of
hrtimer::function with the new setup mechanism.
Patch was created by using Coccinelle.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/d8a3919c09caa671e64d9f9bbf726aa8a885a844.1738746821.git.namcao@linutronix.de
|
|
According to device_release() in /drivers/base/core.c,
a device without a release function is a broken device
and must be fixed.
The current code directly frees the device after calling device_add()
without waiting for other kernel parts to release their references.
Thus, a reference could still be held to a struct device,
e.g., by sysfs, leading to potential use-after-free
issues if a proper release function is not set.
Fixes: 8c81ba20349d ("net/smc: De-tangle ism and smc device initialization")
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250214120137.563409-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix isolated VFs handling by verifying that a VF’s parent PF is
locally owned before registering it in an existing PCI domain
- Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES to
workaround gcc failure in handling __builtin_constant_p() in this
case
- Fix CHPID "configure" attribute caching in CIO by not updating the
cache when SCLP returns no data, ensuring consistent sysfs output
- Remove CONFIG_LSM from default configs and rely on defaults, which
enables BPF LSM hook
* tag 's390-6.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/pci: Fix handling of isolated VFs
s390/pci: Pull search for parent PF out of zpci_iov_setup_virtfn()
s390/bitops: Disable arch_test_bit() optimization for PROFILE_ALL_BRANCHES
s390/cio: Fix CHPID "configure" attribute caching
s390/configs: Remove CONFIG_LSM
|
|
Cross-merge networking fixes after downstream PR (net-6.14-rc3).
No conflicts or adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Like other drivers qeth is calling local_bh_enable() after napi_schedule()
to kick-start softirqs [0].
Since netif_napi_add_tx() and napi_enable() now take the netdev_lock()
mutex [1], move them out from under the BH protection. Same solution as in
commit a60558644e20 ("wifi: mt76: move napi_enable() from under BH")
Fixes: 1b23cdbd2bbc ("net: protect netdev->napi_list with netdev_lock()")
Link: https://lore.kernel.org/netdev/20240612181900.4d9d18d0@kernel.org/ [0]
Link: https://lore.kernel.org/netdev/20250115035319.559603-1-kuba@kernel.org/ [1]
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Joe Damato <jdamato@fastly.com>
Link: https://patch.msgid.link/20250212163659.2287292-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In some environments, the SCLP firmware interface used to query a
CHPID's configured state is not supported. On these environments,
rapidly reading the corresponding sysfs attribute produces inconsistent
results:
$ cat /sys/devices/css0/chp0.00/configure
cat: /sys/devices/css0/chp0.00/configure: Operation not supported
$ cat /sys/devices/css0/chp0.00/configure
3
This occurs for example when Linux is run as a KVM guest. The
inconsistency is a result of CIO using cached results for generating
the value of the "configure" attribute while failing to handle the
situation where no data was returned by SCLP.
Fix this by not updating the cache-expiration timestamp when SCLP
returns no data. With the fix applied, the system response is
consistent:
$ cat /sys/devices/css0/chp0.00/configure
cat: /sys/devices/css0/chp0.00/configure: Operation not supported
$ cat /sys/devices/css0/chp0.00/configure
cat: /sys/devices/css0/chp0.00/configure: Operation not supported
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Tested-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
The original Open Systems Adapter (OSA) was introduced by IBM in the
mid-90s. These were then superseded by OSA-Express in 1999 which used
Queued Direct IO to greatly improve throughput. The newer cards
retained the older, slower non-QDIO (OSE) modes for compatibility with
older systems. In Linux, the lcs driver was responsible for cards
operating in the older OSE mode and the qeth driver was introduced to
allow the OSA-Express cards to operate in the newer QDIO (OSD) mode.
For an S390 machine from 1998 or later, there is no reason to use the
OSE mode and lcs driver as all OSA cards since 1999 provide the faster
OSD mode. As a result, it's been years since we have heard of a
customer configuration involving the lcs driver.
This patch removes the lcs driver. The technology it supports has been
obsolete for past 25+ years and is irrelevant for current use cases.
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Aswin Karuvally <aswin@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250204103135.1619097-1-wintera@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Alexander Gordeev:
- The rework that uncoupled physical and virtual address spaces
inadvertently prevented KASAN shadow mappings from using large pages.
Restore large page mappings for KASAN shadows
- Add decompressor routine physmem_alloc() that may fail, unlike
physmem_alloc_or_die(). This allows callers to implement fallback
paths
- Allow falling back from large pages to smaller pages (1MB or 4KB) if
the allocation of 2GB pages in the decompressor can not be fulfilled
- Add to the decompressor boot print support of "%%" format string,
width and padding hadnling, length modifiers and decimal conversion
specifiers
- Add to the decompressor message severity levels similar to kernel
ones. Support command-line options that control console output
verbosity
- Replaces boot_printk() calls with appropriate loglevel- specific
helpers such as boot_emerg(), boot_warn(), and boot_debug().
- Collect all boot messages into a ring buffer independent of the
current log level. This is particularly useful for early crash
analysis
- If 'earlyprintk' command line parameter is not specified, store
decompressor boot messages in a ring buffer to be printed later by
the kernel, once the console driver is registered
- Add 'bootdebug' command line parameter to enable printing of
decompressor debug messages when needed. That parameters allows
message suppressing and filtering
- Dump boot messages on a decompressor crash, but only if 'bootdebug'
command line parameter is enabled
- When CONFIG_PRINTK_TIME is enabled, add timestamps to boot messages
in the same format as regular printk()
- Dump physical memory tracking information on boot: online ranges,
reserved areas and vmem allocations
- Dump virtual memory layout and randomization details
- Improve decompression error reporting and dump the message ring
buffer in case the boot failed and system halted
- Add an exception handler which handles exceptions when FPU control
register is attempted to be set to an invalid value. Remove '.fixup'
section as result of this change
- Use 'A', 'O', and 'R' inline assembly format flags, which allows
recent Clang compilers to generate better FPU code
- Rework uaccess code so it reads better and generates more efficient
code
- Cleanup futex inline assembly code
- Disable KMSAN instrumention for futex inline assemblies, which
contain dereferenced user pointers. Otherwise, shadows for the user
pointers would be accessed
- PFs which are not initially configured but in standby create only a
single-function PCI domain. If they are configured later on, sibling
PFs and their child VFs will not be added to their PCI domain
breaking SR-IOV expectations.
Fix that by allowing initially configured but in standby PFs create
multi-function PCI domains
- Add '-std=gnu11' to decompressor and purgatory CFLAGS to avoid
compile errors caused by kernel's own definitions of 'bool', 'false',
and 'true' conflicting with the C23 reserved keywords
- Fix sclp subsystem failure when a sclp console is not present
- Fix misuse of non-NULL terminated strings in vmlogrdr driver
- Various other small improvements, cleanups and fixes
* tag 's390-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (53 commits)
s390/vmlogrdr: Use array instead of string initializer
s390/vmlogrdr: Use internal_name for error messages
s390/sclp: Initialize sclp subsystem via arch_cpu_finalize_init()
s390/tools: Use array instead of string initializer
s390/vmem: Fix null-pointer-arithmetic warning in vmem_map_init()
s390: Add '-std=gnu11' to decompressor and purgatory CFLAGS
s390/bitops: Use correct constraint for arch_test_bit() inline assembly
s390/pci: Fix SR-IOV for PFs initially in standby
s390/futex: Avoid KMSAN instrumention for user pointers
s390/uaccess: Rename get_put_user_noinstr_attributes to uaccess_kmsan_or_inline
s390/futex: Cleanup futex_atomic_cmpxchg_inatomic()
s390/futex: Generate futex atomic op functions
s390/uaccess: Remove INLINE_COPY_FROM_USER and INLINE_COPY_TO_USER
s390/uaccess: Use asm goto for put_user()/get_user()
s390/uaccess: Remove usage of the oac specifier
s390/uaccess: Replace EX_TABLE_UA_LOAD_MEM exception handling
s390/uaccess: Cleanup noinstr __put_user()/__get_user() inline assembly constraints
s390/uaccess: Remove __put_user_fn()/__get_user_fn() wrappers
s390/uaccess: Move put_user() / __put_user() close to put_user() asm code
s390/uaccess: Use asm goto for __mvc_kernel_nofault()
...
|
|
Compiling vmlogrdr with GCC 15 generates this warning:
CC [M] drivers/s390/char/vmlogrdr.o
drivers/s390/char/vmlogrdr.c:126:29: error: initializer-string for array
of ‘char’ is too long [-Werror=unterminated-string-initialization]
126 | { .system_service = "*LOGREC ",
Given that the system_service array intentionally contains a non-null
terminated string use an array initializer, instead of string
initializer to get rid of this warning.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Use the internal_name member of vmlogrdr_priv_t to print error messages
instead of the system_service member. The system_service member is not a
string, but a non-null terminated eight byte character array, which
contains the ASCII representation of a z/VM system service.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core and debugfs updates from Greg KH:
"Here is the big set of driver core and debugfs updates for 6.14-rc1.
Included in here is a bunch of driver core, PCI, OF, and platform rust
bindings (all acked by the different subsystem maintainers), hence the
merge conflict with the rust tree, and some driver core api updates to
mark things as const, which will also require some fixups due to new
stuff coming in through other trees in this merge window.
There are also a bunch of debugfs updates from Al, and there is at
least one user that does have a regression with these, but Al is
working on tracking down the fix for it. In my use (and everyone
else's linux-next use), it does not seem like a big issue at the
moment.
Here's a short list of the things in here:
- driver core rust bindings for PCI, platform, OF, and some i/o
functions.
We are almost at the "write a real driver in rust" stage now,
depending on what you want to do.
- misc device rust bindings and a sample driver to show how to use
them
- debugfs cleanups in the fs as well as the users of the fs api for
places where drivers got it wrong or were unnecessarily doing
things in complex ways.
- driver core const work, making more of the api take const * for
different parameters to make the rust bindings easier overall.
- other small fixes and updates
All of these have been in linux-next with all of the aforementioned
merge conflicts, and the one debugfs issue, which looks to be resolved
"soon""
* tag 'driver-core-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (95 commits)
rust: device: Use as_char_ptr() to avoid explicit cast
rust: device: Replace CString with CStr in property_present()
devcoredump: Constify 'struct bin_attribute'
devcoredump: Define 'struct bin_attribute' through macro
rust: device: Add property_present()
saner replacement for debugfs_rename()
orangefs-debugfs: don't mess with ->d_name
octeontx2: don't mess with ->d_parent or ->d_parent->d_name
arm_scmi: don't mess with ->d_parent->d_name
slub: don't mess with ->d_name
sof-client-ipc-flood-test: don't mess with ->d_name
qat: don't mess with ->d_name
xhci: don't mess with ->d_iname
mtu3: don't mess wiht ->d_iname
greybus/camera - stop messing with ->d_iname
mediatek: stop messing with ->d_iname
netdevsim: don't embed file_operations into your structs
b43legacy: make use of debugfs_get_aux()
b43: stop embedding struct file_operations into their objects
carl9170: stop embedding file_operations into their objects
...
|
|
With the switch to GENERIC_CPU_DEVICES an early call to the sclp subsystem
was added to smp_prepare_cpus(). This will usually succeed since the sclp
subsystem is implicitly initialized early enough if an sclp based console
is present.
If no such console is present the initialization happens with an
arch_initcall(); in such cases calls to the sclp subsystem will fail.
For CPU detection this means that the fallback sigp loop will be used
permanently to detect CPUs instead of the preferred READ_CPU_INFO sclp
request.
Fix this by adding an explicit early sclp_init() call via
arch_cpu_finalize_init().
Reported-by: Sheshu Ramanandan <sheshu.ramanandan@ibm.com>
Fixes: 4a39f12e753d ("s390/smp: Switch to GENERIC_CPU_DEVICES")
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, lpfc, fnic, qla2xx, mpi3mr).
The major core change is the renaming of the slave_ methods plus a bit
of constification. The rest are minor updates and fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (103 commits)
scsi: fnic: Propagate SCSI error code from fnic_scsi_drv_init()
scsi: fnic: Test for memory allocation failure and return error code
scsi: fnic: Return appropriate error code from failure of scsi drv init
scsi: fnic: Return appropriate error code for mem alloc failure
scsi: fnic: Remove always-true IS_FNIC_FCP_INITIATOR macro
scsi: fnic: Fix use of uninitialized value in debug message
scsi: fnic: Delete incorrect debugfs error handling
scsi: fnic: Remove unnecessary else to fix warning in FDLS FIP
scsi: fnic: Remove extern definition from .c files
scsi: fnic: Remove unnecessary else and unnecessary break in FDLS
scsi: mpi3mr: Fix possible crash when setting up bsg fails
scsi: ufs: bsg: Set bsg_queue to NULL after removal
scsi: ufs: bsg: Delete bsg_dev when setting up bsg fails
scsi: st: Don't set pos_unknown just after device recognition
scsi: aic7xxx: Fix build 'aicasm' warning
scsi: Revert "scsi: ufs: core: Probe for EXT_IID support"
scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
scsi: scsi_debug: Constify sdebug_driver_template
scsi: documentation: Corrections for struct updates
scsi: driver-api: documentation: Change what is added to docbook
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Select config option KASAN_VMALLOC if KASAN is enabled
- Select config option VMAP_STACK unconditionally
- Implement arch_atomic_inc() / arch_atomic_dec() functions which
result in a single instruction if compiled for z196 or newer
architectures
- Make layering between atomic.h and atomic_ops.h consistent
- Comment s390 preempt_count implementation
- Remove pre MARCH_HAS_Z196_FEATURES preempt count implementation
- GCC uses the number of lines of an inline assembly to calculate
number of instructions and decide on inlining. Therefore remove
superfluous new lines from a couple of inline assemblies.
- Provide arch_atomic_*_and_test() implementations that allow the
compiler to generate slightly better code.
- Optimize __preempt_count_dec_and_test()
- Remove __bootdata annotations from declarations in header files
- Add missing include of <linux/smp.h> in abs_lowcore.h to provide
declarations for get_cpu() and put_cpu() used in the code
- Fix suboptimal kernel image base when running make kasan.config
- Remove huge_pte_none() and huge_pte_none_mostly() as are identical to
the generic variants
- Remove unused PAGE_KERNEL_EXEC, SEGMENT_KERNEL_EXEC, and
REGION3_KERNEL_EXEC defines
- Simplify noexec page protection handling and change the page, segment
and region3 protection definitions automatically if the instruction
execution-protection facility is not available
- Save one instruction and prefer EXRL instruction over EX in string,
xor_*(), amode31 and other functions
- Create /dev/diag misc device to fetch diagnose specific information
from the kernel and provide it to userspace
- Retrieve electrical power readings using DIAGNOSE 0x324 ioctl
- Make ccw_device_get_ciw() consistent and use array indices instead of
pointer arithmetic
- s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into
one place
- The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that in s390 code
- Add missing TLB range adjustment in pud_free_tlb()
- Improve topology setup by adding early polarization detection
- Fix length checks in codepage_convert() function
- The generic bitops implementation is nearly identical to the s390
one. Switch to the generic variant and decrease a bit the kernel
image size
- Provide an optimized arch_test_bit() implementation which makes use
of flag output constraint. This generates slightly better code
- Provide memory topology information obtanied with DIAGNOSE 0x310
using ioctl.
- Various other small improvements, fixes, and cleanups
Also, some changes came in through a merge of 'pci-device-recovery'
branch:
- Add PCI error recovery status mechanism
- Simplify and document debug_next_entry() logic
- Split private data allocation and freeing out of debug file open()
and close() operations
- Add debug_dump() function that gets a textual representation of a
debug info (e.g. PCI recovery hardware error logs)
- Add formatted content of pci_debug_msg_id to the PCI report
* tag 's390-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits)
s390/futex: Fix FUTEX_OP_ANDN implementation
s390/diag: Add memory topology information via diag310
s390/bitops: Provide optimized arch_test_bit()
s390/bitops: Switch to generic bitops
s390/ebcdic: Fix length decrement in codepage_convert()
s390/ebcdic: Fix length check in codepage_convert()
s390/ebcdic: Use exrl instead of ex
s390/amode31: Use exrl instead of ex
s390/stackleak: Use exrl instead of ex in __stackleak_poison()
s390/lib: Use exrl instead of ex in xor functions
s390/topology: Improve topology detection
s390/tlb: Add missing TLB range adjustment
s390/pkey: Constify 'struct bin_attribute'
s390/sclp: Constify 'struct bin_attribute'
s390/pci: Constify 'struct bin_attribute'
s390/ipl: Constify 'struct bin_attribute'
s390/crypto/cpacf: Constify 'struct bin_attribute'
s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into one place
s390/cio: Use array indices instead of pointer arithmetic
s390/qdio: Rename feature flag aif_osa to aif_qdio
...
|
|
Introduce diag310 and memory topology related subcodes.
Provide memory topology information obtanied from diag310 to userspace
via diag ioctl.
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
As there is only one in-tree user, avoid a transition phase and switch
that user in the same commit. As there are some interdependencies
between the constness of the different symbols in the s390 driver,
covert the whole driver at once.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20241205-sysfs-const-bin_attr-groups_macro-v1-1-ac5e855031e8@weissschuh.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Tested-by: Holger Dengler <dengler@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Link: https://lore.kernel.org/r/20241211-sysfs-const-bin_attr-s390-v1-5-be01f66bfcf7@weissschuh.net
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
The sysfs core now allows instances of 'struct bin_attribute' to be
moved into read-only memory. Make use of that to protect them against
accidental or malicious modifications.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20241211-sysfs-const-bin_attr-s390-v1-4-be01f66bfcf7@weissschuh.net
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
Instead of distributing the memory allocation and pointer arithmetic to
place slib and sl on the page that is allocated for them over multiple
functions and comments, move both into the same context directly next to
each other, so that the knowledge of how this is done is immediately
visible.
The actual layout in memory doesn't change with this, just the structure
of the code to achieve it.
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
ccw_device_get_ciw() already uses array indices to iterate over the vector
of CIWs, but then switches to pointer arithmetic when returning the one it
found. Change this to make it more consistent.
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
This feature is not only utilized by OSA, but by QDIO in general. Clear
up possible confusions.
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
process passthrough commands (bsg-lib, ufs tmf, various nvme admin
queues) and thus don't even check the flag. Remove it to simplify the
driver interface.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Niklas Schnelle says:
===================
This patch series enhances the introspectability of the PCI device
recovery for firmware. Until now when Linux performs recovery in
response to a firmware error report. For example, until now firmware
debug data would have no indication if the recovery was successfull or
if it failed, for example due to KVM pass-through.
Improve on this by reporting recovery status as well as some debug
information such as device driver name and s390dbf/pci_msg/sprintf logs
via the SCLP Write Event Data Action Qualifier 2 (Log Data provided)
mechanism.
===================
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|