Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- four fixes for the netdev per-instance locking
Current release - new code bugs:
- consolidate more code between existing Rx zero-copy and uring so
that the latter doesn't miss / have to duplicate the safety checks
Previous releases - regressions:
- ipv6: fix omitted Netlink attributes when using SKIP_STATS
Previous releases - always broken:
- net: fix geneve_opt length integer overflow
- udp: fix multiple wrap arounds of sk->sk_rmem_alloc when it
approaches INT_MAX
- dsa: mvpp2: add a lock to avoid corruption of the shared TCAM
- dsa: airoha: fix issues with traffic QoS configuration / offload,
and flow table offload
Misc:
- touch up the Netlink YAML specs of old families to make them usable
for user space C codegen"
* tag 'net-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
selftests: net: amt: indicate progress in the stress test
netlink: specs: rt_route: pull the ifa- prefix out of the names
netlink: specs: rt_addr: pull the ifa- prefix out of the names
netlink: specs: rt_addr: fix get multi command name
netlink: specs: rt_addr: fix the spec format / schema failures
net: avoid false positive warnings in __net_mp_close_rxq()
net: move mp dev config validation to __net_mp_open_rxq()
net: ibmveth: make veth_pool_store stop hanging
arcnet: Add NULL check in com20020pci_probe()
ipv6: Do not consider link down nexthops in path selection
ipv6: Start path selection from the first nexthop
usbnet:fix NPE during rx_complete
net: octeontx2: Handle XDP_ABORTED and XDP invalid as XDP_DROP
net: fix geneve_opt length integer overflow
io_uring/zcrx: fix selftests w/ updated netdev Python helpers
selftests: net: use netdevsim in netns test
docs: net: document netdev notifier expectations
net: dummy: request ops lock
netdevsim: add dummy device notifiers
net: rename rtnl_net_debug to lock_debug
...
|
|
Our CI expects output from the test at least once every 10 minutes.
The AMT test when running on debug kernel is just on the edge
of that time for the stress test. Improve the output:
- print the name of the test first, before starting it,
- output a dot every 10% of the way.
Output after:
TEST: amt discovery [ OK ]
TEST: IPv4 amt multicast forwarding [ OK ]
TEST: IPv6 amt multicast forwarding [ OK ]
TEST: IPv4 amt traffic forwarding torture .......... [ OK ]
TEST: IPv6 amt traffic forwarding torture .......... [ OK ]
Reviewed-by: Taehee Yoo <ap420073@gmail.com>
Link: https://patch.msgid.link/20250403145636.2891166-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
YAML specs don't normally include the C prefix name in the name
of the YAML attr. Remove the ifa- prefix from all attributes
in addr-attrs and specify name-prefix instead.
This is a bit risky, hopefully there aren't many users out there.
Fixes: dfb0f7d9d979 ("doc/netlink: Add spec for rt addr messages")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Command names should match C defines, codegens may depend on it.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 4f280376e531 ("selftests/net: Add selftest for IPv4 RTM_GETMULTICAST support")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250403013706.2828322-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull more io_uring updates from Jens Axboe:
"Set of fixes/updates for io_uring that should go into this release.
The ublk bits could've gone via either tree - usually I put them in
block, but they got a bit mixed this series with the zero-copy
supported that ended up dipping into both trees.
This contains:
- Fix for sendmsg zc, include in pinned pages accounting like we do
for the other zc types
- Series for ublk fixing request aborting, doing various little
cleanups, fixing some zc issues, and adding queue_rqs support
- Another ublk series doing some code cleanups
- Series cleaning up the io_uring send path, mostly in preparation
for registered buffers
- Series doing little MSG_RING cleanups
- Fix for the newly added zc rx, fixing len being 0 for the last
invocation of the callback
- Add vectored registered buffer support for ublk. With that, then
ublk also supports this feature in the kernel revision where it
could generically introduced for rw/net
- A bunch of selftest additions for ublk. This is the majority of the
diffstat
- Silence a KCSAN data race warning for io-wq
- Various little cleanups and fixes"
* tag 'io_uring-6.15-20250403' of git://git.kernel.dk/linux: (44 commits)
io_uring: always do atomic put from iowq
selftests: ublk: enable zero copy for stripe target
io_uring: support vectored kernel fixed buffer
block: add for_each_mp_bvec()
io_uring: add validate_fixed_range() for validate fixed buffer
selftests: ublk: kublk: fix an error log line
selftests: ublk: kublk: use ioctl-encoded opcodes
io_uring/zcrx: return early from io_zcrx_recv_skb if readlen is 0
io_uring/net: avoid import_ubuf for regvec send
io_uring/rsrc: check size when importing reg buffer
io_uring: cleanup {g,s]etsockopt sqe reading
io_uring: hide caches sqes from drivers
io_uring: make zcrx depend on CONFIG_IO_URING
io_uring: add req flag invariant build assertion
Documentation: ublk: remove dead footnote
selftests: ublk: specify io_cmd_buf pointer type
ublk: specify io_cmd_buf pointer type
io_uring: don't pass ctx to tw add remote helper
io_uring/msg: initialise msg request opcode
io_uring/msg: rename io_double_lock_ctx()
...
|
|
Fix io_uring zero copy rx selftest with updated netdev Python helpers.
Signed-off-by: David Wei <dw@davidwei.uk>
Link: https://patch.msgid.link/20250402172414.895276-1-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Netdevsim has extra register_netdevice_notifier_dev_net notifiers,
use netdevim instead of dummy device to test them out.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250401163452.622454-9-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"We see a net reduction of the number of lines of code thanks to the
removal of a now unused driver and a testing tool that is not used
anymore. Apart from this, the max31335 driver gets support for a new
part number and pm8xxx gets UEFI support.
Core:
- setdate is removed as it has better replacements
- skip alarms with a second resolution when we know the RTC doesn't
support those.
Subsystem:
- remove unnecessary private struct members
- use devm_pm_set_wake_irq were relevant
Drivers:
- ds1307: stop disabling alarms on probe for DS1337, DS1339, DS1341
and DS3231
- max31335: add max31331 support
- pcf50633 is removed as support for the related SoC has been removed
- pcf85063: properly handle POR failures"
* tag 'rtc-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (50 commits)
rtc: remove 'setdate' test program
selftest: rtc: skip some tests if the alarm only supports minutes
rtc: mt6397: drop unused defines
rtc: pcf85063: replace dev_err+return with return dev_err_probe
rtc: pcf85063: do a SW reset if POR failed
rtc: max31335: Add driver support for max31331
dt-bindings: rtc: max31335: Add max31331 support
rtc: cros-ec: Avoid a couple of -Wflex-array-member-not-at-end warnings
dt-bindings: rtc: pcf2127: Reference spi-peripheral-props.yaml
rtc: rzn1: implement one-second accuracy for alarms
rtc: pcf50633: Remove
rtc: pm8xxx: implement qcom,no-alarm flag for non-HLOS owned alarm
rtc: pm8xxx: mitigate flash wear
rtc: pm8xxx: add support for uefi offset
dt-bindings: rtc: qcom-pm8xxx: document qcom,no-alarm flag
rtc: rv3032: drop WADA
rtc: rv3032: fix EERD location
rtc: pm8xxx: switch to devm_device_init_wakeup
rtc: pm8xxx: fix possible race condition
rtc: mpfs: switch to devm_device_init_wakeup
...
|
|
self-connect-ipv6 got slightly flaky on netdev:
> # timeout set to 120
> # selftests: net/tcp_ao: self-connect_ipv6
> # 1..5
> # # 708[lib/setup.c:250] rand seed 1742872572
> # TAP version 13
> # # 708[lib/proc.c:213] Snmp6 Ip6OutNoRoutes: 0 => 1
> # not ok 1 # error 708[self-connect.c:70] failed to connect()
> # ok 2 No unexpected trace events during the test run
> # # Planned tests != run tests (5 != 2)
> # # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:1
> ok 1 selftests: net/tcp_ao: self-connect_ipv6
I can not reproduce it on my machines, but judging by "Ip6OutNoRoutes"
there is no route to the local_addr (::1).
Looking at the kernel code, I see that kernel does add link-local
address automatically in init_loopback(), but that is called from
ipv6 notifier block. So, in turn the userspace that brought up
the loopback interface may see rtnetlink ACK earlier than
addrconf_notify() does it's job (at least, on a slow VM such as netdev).
Probably, for ipv4 it's the same, judging by inetdev_event().
The fix is quite simple: set the link-local route straight after
bringing the loopback interface. That will make it synchronous.
Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
Link: https://patch.msgid.link/20250402-tcp-ao-selfconnect-flake-v1-1-8388d629ef3d@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull bpf fixes from Alexei Starovoitov:
- Fix BPF selftests expectations of assembler output and struct layout
(Song Liu and Yonghong Song)
- Fix XSK error code when queue is full (Wang Liang)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix verifier_private_stack test failure
selftests/bpf: Fix verifier_bpf_fastcall test
selftests/bpf: Fix tests after fields reorder in struct file
xsk: Fix __xsk_generic_xmit() error code when cq is full
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- The series "mm: fixes for fallouts from mem_init() cleanup" from Mike
Rapoport fixes a couple of issues with the just-merged "arch, mm:
reduce code duplication in mem_init()" series
- The series "MAINTAINERS: add my isub-entries to MM part." from Mike
Rapoport does some maintenance on MAINTAINERS
- The series "remove tlb_remove_page_ptdesc()" from Qi Zheng does some
cleanup work to the page mapping code
- The series "mseal system mappings" from Jeff Xu permits sealing of
"system mappings", such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode)
- Plus the usual shower of singleton patches
* tag 'mm-stable-2025-04-02-22-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (31 commits)
mseal sysmap: add arch-support txt
mseal sysmap: enable s390
selftest: test system mappings are sealed
mseal sysmap: update mseal.rst
mseal sysmap: uprobe mapping
mseal sysmap: enable arm64
mseal sysmap: enable x86-64
mseal sysmap: generic vdso vvar mapping
selftests: x86: test_mremap_vdso: skip if vdso is msealed
mseal sysmap: kernel config and header change
mm: pgtable: remove tlb_remove_page_ptdesc()
x86: pgtable: convert to use tlb_remove_ptdesc()
riscv: pgtable: unconditionally use tlb_remove_ptdesc()
mm: pgtable: convert some architectures to use tlb_remove_ptdesc()
mm: pgtable: change pt parameter of tlb_remove_ptdesc() to struct ptdesc*
mm: pgtable: make generic tlb_remove_table() use struct ptdesc
microblaze/mm: put mm_cmdline_setup() in .init.text section
mm/memory_hotplug: fix call folio_test_large with tail page in do_migrate_range
MAINTAINERS: mm: add entry for secretmem
MAINTAINERS: mm: add entry for numa memblocks and numa emulation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Calling scx_bpf_create_dsq() with the same ID would succeed creating
duplicate DSQs. Fix it to return -EEXIST.
- scx_select_cpu_dfl() fixes and cleanups.
- Synchronize tool/sched_ext with external scheduler repo. While this
isn't a fix. There's no risk to the kernel and it's better if they
stay synced closer.
* tag 'sched_ext-for-6.15-rc0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
tools/sched_ext: Sync with scx repo
sched_ext: initialize built-in idle state before ops.init()
sched_ext: create_dsq: Return -EEXIST on duplicate request
sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
sched_ext: idle: Fix return code of scx_select_cpu_dfl()
|
|
Several verifier_private_stack tests failed with latest bpf-next.
For example, for 'Private stack, single prog' subtest, the
jitted code:
func #0:
0: f3 0f 1e fa endbr64
4: 0f 1f 44 00 00 nopl (%rax,%rax)
9: 0f 1f 00 nopl (%rax)
c: 55 pushq %rbp
d: 48 89 e5 movq %rsp, %rbp
10: f3 0f 1e fa endbr64
14: 49 b9 58 74 8a 8f 7d 60 00 00 movabsq $0x607d8f8a7458, %r9
1e: 65 4c 03 0c 25 28 c0 48 87 addq %gs:-0x78b73fd8, %r9
27: bf 2a 00 00 00 movl $0x2a, %edi
2c: 49 89 b9 00 ff ff ff movq %rdi, -0x100(%r9)
33: 31 c0 xorl %eax, %eax
35: c9 leave
36: e9 20 5d 0f e1 jmp 0xffffffffe10f5d5b
The insn 'addq %gs:-0x78b73fd8, %r9' does not match the expected
regex 'addq %gs:0x{{.*}}, %r9' and this caused test failure.
Fix it by changing '%gs:0x{{.*}}' to '%gs:{{.*}}' to accommodate the
possible negative offset. A few other subtests are fixed in a similar way.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250331033828.365077-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit [1] moves percpu data on x86 from address 0x000... to address
0xfff...
Before [1]:
159020: 0000000000030700 0 OBJECT GLOBAL DEFAULT 23 pcpu_hot
After [1]:
152602: ffffffff83a3e034 4 OBJECT GLOBAL DEFAULT 35 pcpu_hot
As a result, verifier_bpf_fastcall tests should now expect a negative
value for pcpu_hot, IOW, the disassemble should show "r=" instead of
"w=".
Fix this in the test.
Note that, a later change created a new variable "cpu_number" for
bpf_get_smp_processor_id() [2]. The inlining logic is updated properly
as part of this change, so there is no need to fix anything on the
kernel side.
[1] commit 9d7de2aa8b41 ("x86/percpu/64: Use relative percpu offsets")
[2] commit 01c7bc5198e9 ("x86/smp: Move cpu number to percpu hot section")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250328193124.808784-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The change in struct file [1] moved f_ref to the 3rd cache line.
It made *(u64 *)file dereference invalid from the verifier point of view,
because btf_struct_walk() walks into f_lock field, which is 4-byte long.
Fix the selftests to deference the file pointer as a 4-byte access.
[1] commit e249056c91a2 ("fs: place f_ref to 3rd cache line in struct file to resolve false sharing")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20250327185528.1740787-1-song@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:
- Add support for Global Persistent Flush (GPF)
- Cleanup of DPA partition metadata handling:
- Remove the CXL_DECODER_MIXED enum that's not needed anymore
- Introduce helpers to access resource and perf meta data
- Introduce 'struct cxl_dpa_partition' and 'struct cxl_range_info'
- Make cxl_dpa_alloc() DPA partition number agnostic
- Remove cxl_decoder_mode
- Cleanup partition size and perf helpers
- Remove unused CXL partition values
- Add logging support for CXL CPER endpoint and port protocol errors:
- Prefix protocol error struct and function names with cxl_
- Move protocol error definitions and structures to a common location
- Remove drivers/firmware/efi/cper_cxl.h to include/linux/cper.h
- Add support in GHES to process CXL CPER protocol errors
- Process CXL CPER protocol errors
- Add trace logging for CXL PCIe port RAS errors
- Remove redundant gp_port init
- Add validation of cxl device serial number
- CXL ABI documentation updates/fixups
- A series that uses guard() to clean up open coded mutex lockings and
remove gotos for error handling.
- Some followup patches to support dirty shutdown accounting:
- Add helper to retrieve DVSEC offset for dirty shutdown registers
- Rename cxl_get_dirty_shutdown() to cxl_arm_dirty_shutdown()
- Add support for dirty shutdown count via sysfs
- cxl_test support for dirty shutdown
- A series to support CXL mailbox Features commands.
Mostly in preparation for CXL EDAC code to utilize the Features
commands. It's also in preparation for CXL fwctl support to utilize
the CXL Features. The commands include "Get Supported Features", "Get
Feature", and "Set Feature".
- A series to support extended linear cache support described by the
ACPI HMAT table.
The addition helps enumerate the cache and also provides additional
RAS reporting support for configuration with extended linear cache.
(and related fixes for the series).
- An update to cxl_test to support a 3-way capable CFMWS
- A documentation fix to remove unused "mixed mode"
* tag 'cxl-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (39 commits)
cxl/region: Fix the first aliased address miscalculation
cxl/region: Quiet some dev_warn()s in extended linear cache setup
cxl/Documentation: Remove 'mixed' from sysfs mode doc
cxl: Fix warning from emitting resource_size_t as long long int on 32bit systems
cxl/test: Define a CFMWS capable of a 3 way HB interleave
cxl/mem: Do not return error if CONFIG_CXL_MCE unset
tools/testing/cxl: Set Shutdown State support
cxl/pmem: Export dirty shutdown count via sysfs
cxl/pmem: Rename cxl_dirty_shutdown_state()
cxl/pci: Introduce cxl_gpf_get_dvsec()
cxl/pci: Support Global Persistent Flush (GPF)
cxl: Document missing sysfs files
cxl: Plug typos in ABI doc
cxl/pmem: debug invalid serial number data
cxl/cdat: Remove redundant gp_port initialization
cxl/memdev: Remove unused partition values
cxl/region: Drop goto pattern of construct_region()
cxl/region: Drop goto pattern in cxl_dax_region_alloc()
cxl/core: Use guard() to drop goto pattern of cxl_dpa_alloc()
cxl/core: Use guard() to drop the goto pattern of cxl_dpa_free()
...
|
|
In iproute 6.14, the nat ip mask logic was fixed to remove an undefined
behaviour[1]. So now instead of reporting '0.0.0.0/32' on x86 and potentially
'0.0.0.0/0' in other platforms, it reports '0.0.0.0/0' in all platforms.
[1] https://lore.kernel.org/netdev/20250306112520.188728-1-torben.nielsen@prevas.dk/
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://patch.msgid.link/20250401144908.568140-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Add a new maintainer for configfs
- Fix exportfs module description
- Place flexible array memeber at the end of an internal struct in the
mount code
- Add new maintainer for netfslib as Jeff Layton is stepping down as
current co-maintainer
- Fix error handling in cachefiles_get_directory()
- Cleanup do_notify_pidfd()
- Fix syscall number definitions in pidfd selftests
- Fix racy usage of fs_struct->in exec during multi-threaded exec
- Ensure correct exit code is reported when pidfs_exit() is called from
release_task() for a delayed thread-group leader exit
- Fix conflicting iomap flag definitions
* tag 'vfs-6.15-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: Fix conflicting values of iomap flags
fs: namespace: Avoid -Wflex-array-member-not-at-end warning
MAINTAINERS: configfs: add Andreas Hindborg as maintainer
exportfs: add module description
exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
netfs: add Paulo as maintainer and remove myself as Reviewer
cachefiles: Fix oops in vfs_mkdir from cachefiles_get_directory
exec: fix the racy usage of fs_struct->in_exec
selftests/pidfd: fixes syscall number defines
pidfs: cleanup the usage of do_notify_pidfd()
|
|
Add a test case to validate the interaction between TBF and SKBPRIO queueing
disciplines, specifically targeting queue length accounting corner cases.
This test complements the fix for the queue length accounting issue in the
SKBPRIO qdisc. This is still best-effort, as timing and manipulating enqueue
and dequeue from user-space is very hard.
Cc: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/20250329222536.696204-3-xiyou.wangcong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Synchronize with https://github.com/sched-ext/scx at dc44584874f0 ("kernel:
Synchronize with kernel tools/sched_ext").
- READ/WRITE_ONCE() is made more proper and READA_ONCE_ARENA() is dropped.
- scale_by_task_weight[_inverse]() helpers added.
- Enum defs expanded to cover more and new enums.
- Don't trigger fatal error when some enums are missing from kernel BTF.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"These are objtool fixes and updates by Josh Poimboeuf, centered around
the fallout from the new CONFIG_OBJTOOL_WERROR=y feature, which,
despite its default-off nature, increased the profile/impact of
objtool warnings:
- Improve error handling and the presentation of warnings/errors
- Revert the new summary warning line that some test-bot tools
interpreted as new regressions
- Fix a number of objtool warnings in various drivers, core kernel
code and architecture code. About half of them are potential
problems related to out-of-bounds accesses or potential undefined
behavior, the other half are additional objtool annotations
- Update objtool to latest (known) compiler quirks and objtool bugs
triggered by compiler code generation
- Misc fixes"
* tag 'objtool-urgent-2025-04-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
objtool/loongarch: Add unwind hints in prepare_frametrace()
rcu-tasks: Always inline rcu_irq_work_resched()
context_tracking: Always inline ct_{nmi,irq}_{enter,exit}()
sched/smt: Always inline sched_smt_active()
objtool: Fix verbose disassembly if CROSS_COMPILE isn't set
objtool: Change "warning:" to "error: " for fatal errors
objtool: Always fail on fatal errors
Revert "objtool: Increase per-function WARN_FUNC() rate limit"
objtool: Append "()" to function name in "unexpected end of section" warning
objtool: Ignore end-of-section jumps for KCOV/GCOV
objtool: Silence more KCOV warnings, part 2
objtool, drm/vmwgfx: Don't ignore vmw_send_msg() for ORC
objtool: Fix STACK_FRAME_NON_STANDARD for cold subfunctions
objtool: Fix segfault in ignore_unreachable_insn()
objtool: Fix NULL printf() '%s' argument in builtin-check.c:save_argv()
objtool, lkdtm: Obfuscate the do_nothing() pointer
objtool, regulator: rk808: Remove potential undefined behavior in rk806_set_mode_dcdc()
objtool, ASoC: codecs: wcd934x: Remove potential undefined behavior in wcd934x_slim_irq_handler()
objtool, Input: cyapa - Remove undefined behavior in cyapa_update_fw_store()
objtool, panic: Disable SMAP in __stack_chk_fail()
...
|
|
Use io_uring vectored fixed kernel buffer for handling stripe IO.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250325135155.935398-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Rather tiny pull request, mostly so that we can get into our trees
your fix to the x86 Makefile.
Current release - regressions:
- Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
queue accounting was missed
Current release - new code bugs:
- 5 fixes for the netdevice instance locking work
Previous releases - regressions:
- usbnet: restore usb%d name exception for local mac addresses
Previous releases - always broken:
- rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
spurious GET_LINK failures
- eth: mana: Switch to page pool for jumbo frames
- phy: broadcom: Correct BCM5221 PHY model detection
Misc:
- selftests: drv-net: replace helpers for referring to other files"
* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
bnxt_en: bring back rtnl lock in bnxt_shutdown
eth: gve: add missing netdev locks on reset and shutdown paths
selftests: mptcp: ignore mptcp_diag binary
selftests: mptcp: close fd_in before returning in main_loop
selftests: mptcp: fix incorrect fd checks in main_loop
mptcp: fix NULL pointer in can_accept_new_subflow
octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
octeontx2-af: Fix mbox INTR handler when num VFs > 64
net: fix use-after-free in the netdev_nl_sock_priv_destroy()
selftests: net: use Path helpers in ping
selftests: net: use the dummy bpf from net/lib
selftests: drv-net: replace the rpath helper with Path objects
net: lapbether: use netdev_lockdep_set_classes() helper
net: phy: broadcom: Correct BCM5221 PHY model detection
net: usb: usbnet: restore usb%d name exception for local mac addresses
net/mlx5e: SHAMPO, Make reserved size independent of page size
net: mana: Switch to page pool for jumbo frames
MAINTAINERS: Add dedicated entries for phy_link_topology
net: move replay logic to tc_modify_qdisc
...
|
|
Pull virtio updates from Michael Tsirkin:
"A small number of improvements all over the place:
- shutdown has been reworked to reset devices
- virtio fs is now allowed in vduse
- vhost-scsi memory use has been reduced
- cleanups, fixes all over the place
A couple more fixes are being tested and will be merged after rc1"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-scsi: Reduce response iov mem use
vhost-scsi: Allocate iov_iter used for unaligned copies when needed
vhost-scsi: Stop duplicating se_cmd fields
vhost-scsi: Dynamically allocate scatterlists
vhost-scsi: Return queue full for page alloc failures during copy
vhost-scsi: Add better resource allocation failure handling
vhost-scsi: Allocate T10 PI structs only when enabled
vhost-scsi: Reduce mem use by moving upages to per queue
vduse: add virtio_fs to allowed dev id
sound/virtio: Fix cancel_sync warnings on uninitialized work_structs
vdpa/mlx5: Fix oversized null mkey longer than 32bit
vdpa/mlx5: Fix mlx5_vdpa_get_config() endianness on big-endian machines
vhost-scsi: Fix handling of multiple calls to vhost_scsi_set_endpoint
tools: virtio/linux/module.h add MODULE_DESCRIPTION() define.
tools: virtio/linux/compiler.h: Add data_race() define.
tools/virtio: Add DMA_MAPPING_ERROR and sg_dma_len api define for virtio test
virtio: break and reset virtio devices on device_shutdown()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Two significant new items:
- Allow reporting IOMMU HW events to userspace when the events are
clearly linked to a device.
This is linked to the VIOMMU object and is intended to be used by a
VMM to forward HW events to the virtual machine as part of
emulating a vIOMMU. ARM SMMUv3 is the first driver to use this
mechanism. Like the existing fault events the data is delivered
through a simple FD returning event records on read().
- PASID support in VFIO.
The "Process Address Space ID" is a PCI feature that allows the
device to tag all PCI DMA operations with an ID. The IOMMU will
then use the ID to select a unique translation for those DMAs. This
is part of Intel's vIOMMU support as VT-D HW requires the
hypervisor to manage each PASID entry.
The support is generic so any VFIO user could attach any
translation to a PASID, and the support should work on ARM SMMUv3
as well. AMD requires additional driver work.
Some minor updates, along with fixes:
- Prevent using nested parents with fault's, no driver support today
- Put a single "cookie_type" value in the iommu_domain to indicate
what owns the various opaque owner fields"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (49 commits)
iommufd: Test attach before detaching pasid
iommufd: Fix iommu_vevent_header tables markup
iommu: Convert unreachable() to BUG()
iommufd: Balance veventq->num_events inc/dec
iommufd: Initialize the flags of vevent in iommufd_viommu_report_event()
iommufd/selftest: Add coverage for reporting max_pasid_log2 via IOMMU_HW_INFO
iommufd: Extend IOMMU_GET_HW_INFO to report PASID capability
vfio: VFIO_DEVICE_[AT|DE]TACH_IOMMUFD_PT support pasid
vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices
ida: Add ida_find_first_range()
iommufd/selftest: Add coverage for iommufd pasid attach/detach
iommufd/selftest: Add test ops to test pasid attach/detach
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add set_dev_pasid in mock iommu
iommufd: Allow allocating PASID-compatible domain
iommu/vt-d: Add IOMMU_HWPT_ALLOC_PASID support
iommufd: Enforce PASID-compatible domain for RID
iommufd: Support pasid attach/replace
iommufd: Enforce PASID-compatible domain in PASID path
iommufd/device: Add pasid_attach array to track per-PASID attach
...
|
|
Add sysmap_is_sealed.c to test system mappings are sealed.
Note: CONFIG_MSEAL_SYSTEM_MAPPINGS must be set, as indicated in
config file.
Link: https://lkml.kernel.org/r/20250305021711.3867874-8-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add code to detect if the vdso is memory sealed, skip the test if it is.
Link: https://lkml.kernel.org/r/20250305021711.3867874-3-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add PPC64 Radix MMU support to the va_high_addr_switch.sh by introducing
check_supported_ppc64(). The function verifies:
- 5-level paging (PGTABLE_LEVELS >= 5) enable in kernel config
- Radix MMU (required for PPC64 5-level translation)
- HugePages availability (needed for some tests)
If any check fails, the test is skipped (ksft_skip). This ensures
compatibility with Power9/Power10 systems running in Radix MMU mode.
Avoid failures on 4-level paging system:
# mmap(NULL, MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(LOW_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(HIGH_ADDR, MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(HIGH_ADDR, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
# mmap(HIGH_ADDR, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(-1, MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(-1, MAP_HUGETLB) again: 0xffffffffffffffff - FAILED
# mmap(ADDR_SWITCH_HINT - PAGE_SIZE, 2*HUGETLB_SIZE, MAP_HUGETLB): 0xffffffffffffffff - FAILED
# mmap(ADDR_SWITCH_HINT , 2*HUGETLB_SIZE, MAP_FIXED | MAP_HUGETLB): 0xffffffffffffffff - FAILED
Link: https://lkml.kernel.org/r/20250327114813.25980-1-liwang@redhat.com
Signed-off-by: Li Wang <liwang@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When doing io_uring operations using liburing, errno is not used to
indicate errors, so the %m format specifier does not provide any
relevant information for failed io_uring commands. Fix a log line
emitted on get_params failure to translate the error code returned in
the cqe->res field instead.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250401-ublk_selftests-v1-2-98129c9bc8bb@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There are a couple of places in the kublk selftests ublk server which
use the legacy ublk opcodes. These operations fail (with -EOPNOTSUPP) on
a kernel compiled without CONFIG_BLKDEV_UBLK_LEGACY_OPCODES set. We
could easily require it to be set as a prerequisite for these selftests,
but since new applications should not be using the legacy opcodes, use
the ioctl-encoded opcodes everywhere in kublk.
Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20250401-ublk_selftests-v1-1-98129c9bc8bb@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO driver updates from Greg KH:
"Here is the big set of char, misc, iio, and other smaller driver
subsystems for 6.15-rc1. Lots of stuff in here, including:
- loads of IIO changes and driver updates
- counter driver updates
- w1 driver updates
- faux conversions for some drivers that were abusing the platform
bus interface
- coresight driver updates
- rust miscdevice binding updates based on real-world-use
- other minor driver updates
All of these have been in linux-next with no reported issues for quite
a while"
* tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
samples: rust_misc_device: fix markup in top-level docs
Coresight: Fix a NULL vs IS_ERR() bug in probe
misc: lis3lv02d: convert to use faux_device
tlclk: convert to use faux_device
regulator: dummy: convert to use the faux device interface
bus: mhi: host: Fix race between unprepare and queue_buf
coresight: configfs: Constify struct config_item_type
doc: iio: ad7380: describe offload support
iio: ad7380: add support for SPI offload
iio: light: Add check for array bounds in veml6075_read_int_time_ms
iio: adc: ti-ads7924 Drop unnecessary function parameters
staging: iio: ad9834: Use devm_regulator_get_enable()
staging: iio: ad9832: Use devm_regulator_get_enable()
iio: gyro: bmg160_spi: add of_match_table
dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
iio: adc: ad7768-1: remove unnecessary locking
Documentation: ABI: add wideband filter type to sysfs-bus-iio
iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
iio: adc: ad7768-1: Fix conversion result sign
iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- The series "powerpc/crash: use generic crashkernel reservation" from
Sourabh Jain changes powerpc's kexec code to use more of the generic
layers.
- The series "get_maintainer: report subsystem status separately" from
Vlastimil Babka makes some long-requested improvements to the
get_maintainer output.
- The series "ucount: Simplify refcounting with rcuref_t" from
Sebastian Siewior cleans up and optimizing the refcounting in the
ucount code.
- The series "reboot: support runtime configuration of emergency
hw_protection action" from Ahmad Fatoum improves the ability for a
driver to perform an emergency system shutdown or reboot.
- The series "Converge on using secs_to_jiffies() part two" from Easwar
Hariharan performs further migrations from msecs_to_jiffies() to
secs_to_jiffies().
- The series "lib/interval_tree: add some test cases and cleanup" from
Wei Yang permits more userspace testing of kernel library code, adds
some more tests and performs some cleanups.
- The series "hung_task: Dump the blocking task stacktrace" from Masami
Hiramatsu arranges for the hung_task detector to dump the stack of
the blocking task and not just that of the blocked task.
- The series "resource: Split and use DEFINE_RES*() macros" from Andy
Shevchenko provides some cleanups to the resource definition macros.
- Plus the usual shower of singleton patches - please see the
individual changelogs for details.
* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
mailmap: consolidate email addresses of Alexander Sverdlin
fs/procfs: fix the comment above proc_pid_wchan()
relay: use kasprintf() instead of fixed buffer formatting
resource: replace open coded variant of DEFINE_RES()
resource: replace open coded variants of DEFINE_RES_*_NAMED()
resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
samples: add hung_task detector mutex blocking sample
hung_task: show the blocker task if the task is hung on mutex
kexec_core: accept unaccepted kexec segments' destination addresses
watchdog/perf: optimize bytes copied and remove manual NUL-termination
lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
lib/interval_tree: skip the check before go to the right subtree
lib/interval_tree: add test case for span iteration
lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
lib/rbtree: add random seed
lib/rbtree: split tests
lib/rbtree: enable userland test suite for rbtree related data structure
checkpatch: describe --min-conf-desc-length
scripts/gdb/symbols: determine KASLR offset on s390
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "Enable strict percpu address space checks" from Uros
Bizjak uses x86 named address space qualifiers to provide
compile-time checking of percpu area accesses.
This has caused a small amount of fallout - two or three issues were
reported. In all cases the calling code was found to be incorrect.
- The series "Some cleanup for memcg" from Chen Ridong implements some
relatively monir cleanups for the memcontrol code.
- The series "mm: fixes for device-exclusive entries (hmm)" from David
Hildenbrand fixes a boatload of issues which David found then using
device-exclusive PTE entries when THP is enabled. More work is
needed, but this makes thins better - our own HMM selftests now
succeed.
- The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed
remove the z3fold and zbud implementations. They have been deprecated
for half a year and nobody has complained.
- The series "mm: further simplify VMA merge operation" from Lorenzo
Stoakes implements numerous simplifications in this area. No runtime
effects are anticipated.
- The series "mm/madvise: remove redundant mmap_lock operations from
process_madvise()" from SeongJae Park rationalizes the locking in the
madvise() implementation. Performance gains of 20-25% were observed
in one MADV_DONTNEED microbenchmark.
- The series "Tiny cleanup and improvements about SWAP code" from
Baoquan He contains a number of touchups to issues which Baoquan
noticed when working on the swap code.
- The series "mm: kmemleak: Usability improvements" from Catalin
Marinas implements a couple of improvements to the kmemleak
user-visible output.
- The series "mm/damon/paddr: fix large folios access and schemes
handling" from Usama Arif provides a couple of fixes for DAMON's
handling of large folios.
- The series "mm/damon/core: fix wrong and/or useless damos_walk()
behaviors" from SeongJae Park fixes a few issues with the accuracy of
kdamond's walking of DAMON regions.
- The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo
Stoakes changes the interaction between framebuffer deferred-io and
core MM. No functional changes are anticipated - this is preparatory
work for the future removal of page structure fields.
- The series "mm/damon: add support for hugepage_size DAMOS filter"
from Usama Arif adds a DAMOS filter which permits the filtering by
huge page sizes.
- The series "mm: permit guard regions for file-backed/shmem mappings"
from Lorenzo Stoakes extends the guard region feature from its
present "anon mappings only" state. The feature now covers shmem and
file-backed mappings.
- The series "mm: batched unmap lazyfree large folios during
reclamation" from Barry Song cleans up and speeds up the unmapping
for pte-mapped large folios.
- The series "reimplement per-vma lock as a refcount" from Suren
Baghdasaryan puts the vm_lock back into the vma. Our reasons for
pulling it out were largely bogus and that change made the code more
messy. This patchset provides small (0-10%) improvements on one
microbenchmark.
- The series "Docs/mm/damon: misc DAMOS filters documentation fixes and
improves" from SeongJae Park does some maintenance work on the DAMON
docs.
- The series "hugetlb/CMA improvements for large systems" from Frank
van der Linden addresses a pile of issues which have been observed
when using CMA on large machines.
- The series "mm/damon: introduce DAMOS filter type for unmapped pages"
from SeongJae Park enables users of DMAON/DAMOS to filter my the
page's mapped/unmapped status.
- The series "zsmalloc/zram: there be preemption" from Sergey
Senozhatsky teaches zram to run its compression and decompression
operations preemptibly.
- The series "selftests/mm: Some cleanups from trying to run them" from
Brendan Jackman fixes a pile of unrelated issues which Brendan
encountered while runnimg our selftests.
- The series "fs/proc/task_mmu: add guard region bit to pagemap" from
Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to
determine whether a particular page is a guard page.
- The series "mm, swap: remove swap slot cache" from Kairui Song
removes the swap slot cache from the allocation path - it simply
wasn't being effective.
- The series "mm: cleanups for device-exclusive entries (hmm)" from
David Hildenbrand implements a number of unrelated cleanups in this
code.
- The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual
implements a number of preparatoty cleanups to the GENERIC_PTDUMP
Kconfig logic.
- The series "mm/damon: auto-tune aggregation interval" from SeongJae
Park implements a feedback-driven automatic tuning feature for
DAMON's aggregation interval tuning.
- The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in
powerpc, sparc and x86 lazy MMU implementations. Ryan did this in
preparation for implementing lazy mmu mode for arm64 to optimize
vmalloc.
- The series "mm/page_alloc: Some clarifications for migratetype
fallback" from Brendan Jackman reworks some commentary to make the
code easier to follow.
- The series "page_counter cleanup and size reduction" from Shakeel
Butt cleans up the page_counter code and fixes a size increase which
we accidentally added late last year.
- The series "Add a command line option that enables control of how
many threads should be used to allocate huge pages" from Thomas
Prescher does that. It allows the careful operator to significantly
reduce boot time by tuning the parallalization of huge page
initialization.
- The series "Fix calculations in trace_balance_dirty_pages() for cgwb"
from Tang Yizhou fixes the tracing output from the dirty page
balancing code.
- The series "mm/damon: make allow filters after reject filters useful
and intuitive" from SeongJae Park improves the handling of allow and
reject filters. Behaviour is made more consistent and the documention
is updated accordingly.
- The series "Switch zswap to object read/write APIs" from Yosry Ahmed
updates zswap to the new object read/write APIs and thus permits the
removal of some legacy code from zpool and zsmalloc.
- The series "Some trivial cleanups for shmem" from Baolin Wang does as
it claims.
- The series "fs/dax: Fix ZONE_DEVICE page reference counts" from
Alistair Popple regularizes the weird ZONE_DEVICE page refcount
handling in DAX, permittig the removal of a number of special-case
checks.
- The series "refactor mremap and fix bug" from Lorenzo Stoakes is a
preparatoty refactoring and cleanup of the mremap() code.
- The series "mm: MM owner tracking for large folios (!hugetlb) +
CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in
which we determine whether a large folio is known to be mapped
exclusively into a single MM.
- The series "mm/damon: add sysfs dirs for managing DAMOS filters based
on handling layers" from SeongJae Park adds a couple of new sysfs
directories to ease the management of DAMON/DAMOS filters.
- The series "arch, mm: reduce code duplication in mem_init()" from
Mike Rapoport consolidates many per-arch implementations of
mem_init() into code generic code, where that is practical.
- The series "mm/damon/sysfs: commit parameters online via
damon_call()" from SeongJae Park continues the cleaning up of sysfs
access to DAMON internal data.
- The series "mm: page_ext: Introduce new iteration API" from Luiz
Capitulino reworks the page_ext initialization to fix a boot-time
crash which was observed with an unusual combination of compile and
cmdline options.
- The series "Buddy allocator like (or non-uniform) folio split" from
Zi Yan reworks the code to split a folio into smaller folios. The
main benefit is lessened memory consumption: fewer post-split folios
are generated.
- The series "Minimize xa_node allocation during xarry split" from Zi
Yan reduces the number of xarray xa_nodes which are generated during
an xarray split.
- The series "drivers/base/memory: Two cleanups" from Gavin Shan
performs some maintenance work on the drivers/base/memory code.
- The series "Add tracepoints for lowmem reserves, watermarks and
totalreserve_pages" from Martin Liu adds some more tracepoints to the
page allocator code.
- The series "mm/madvise: cleanup requests validations and
classifications" from SeongJae Park cleans up some warts which
SeongJae observed during his earlier madvise work.
- The series "mm/hwpoison: Fix regressions in memory failure handling"
from Shuai Xue addresses two quite serious regressions which Shuai
has observed in the memory-failure implementation.
- The series "mm: reliable huge page allocator" from Johannes Weiner
makes huge page allocations cheaper and more reliable by reducing
fragmentation.
- The series "Minor memcg cleanups & prep for memdescs" from Matthew
Wilcox is preparatory work for the future implementation of memdescs.
- The series "track memory used by balloon drivers" from Nico Pache
introduces a way to track memory used by our various balloon drivers.
- The series "mm/damon: introduce DAMOS filter type for active pages"
from Nhat Pham permits users to filter for active/inactive pages,
separately for file and anon pages.
- The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia
separates the proactive reclaim statistics from the direct reclaim
statistics.
- The series "mm/vmscan: don't try to reclaim hwpoison folio" from
Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim
code.
* tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits)
mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex()
x86/mm: restore early initialization of high_memory for 32-bits
mm/vmscan: don't try to reclaim hwpoison folio
mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper
cgroup: docs: add pswpin and pswpout items in cgroup v2 doc
mm: vmscan: split proactive reclaim statistics from direct reclaim statistics
selftests/mm: speed up split_huge_page_test
selftests/mm: uffd-unit-tests support for hugepages > 2M
docs/mm/damon/design: document active DAMOS filter type
mm/damon: implement a new DAMOS filter type for active pages
fs/dax: don't disassociate zero page entries
MM documentation: add "Unaccepted" meminfo entry
selftests/mm: add commentary about 9pfs bugs
fork: use __vmalloc_node() for stack allocation
docs/mm: Physical Memory: Populate the "Zones" section
xen: balloon: update the NR_BALLOON_PAGES state
hv_balloon: update the NR_BALLOON_PAGES state
balloon_compaction: update the NR_BALLOON_PAGES state
meminfo: add a per node counter for balloon drivers
mm: remove references to folio in __memcg_kmem_uncharge_page()
...
|
|
The tool is not embedded in the testing framework. 'rtc' from rtc-tools
serves as a much better programming example. No need to carry this tool
in the kernel tree.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20250320103433.11673-1-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
There are alarms which have only minute-granularity. The RTC core
already has a flag to describe them. Use this flag to skip tests which
require the alarm to support seconds.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Link: https://lore.kernel.org/r/20250218101548.6514-1-wsa+renesas@sang-engineering.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
In verbose mode, when printing the disassembly of affected functions, if
CROSS_COMPILE isn't set, the objdump command string gets prefixed with
"(null)".
Somehow this worked before. Maybe some versions of glibc return an
empty string instead of NULL. Fix it regardless.
[ jpoimboe: Rewrite commit log. ]
Fixes: ca653464dd097 ("objtool: Add verbose option for disassembling affected functions")
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250215142321.14081-1-david.laight.linux@gmail.com
Link: https://lore.kernel.org/r/b931a4786bc0127aa4c94e8b35ed617dcbd3d3da.1743481539.git.jpoimboe@kernel.org
|
|
This is similar to GCC's behavior and makes it more obvious why the
build failed.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/0ea76f4b0e7a370711ed9f75fd0792bb5979c2bf.1743481539.git.jpoimboe@kernel.org
|
|
Objtool writes several object annotations which are used to enable
critical kernel runtime functionalities like static calls and
retpoline/rethunk patching.
In the rare case where it fails to read or write an object, the
annotations don't get written, causing runtime code patching to fail and
code to become corrupted.
Due to the catastrophic nature of such warnings, convert them to errors
which fail the build regardless of CONFIG_OBJTOOL_WERROR.
Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/7d35684ca61eac56eb2424f300ca43c5d257b170.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/SJ1PR11MB61295789E25C2F5197EFF2F6B9A72@SJ1PR11MB6129.namprd11.prod.outlook.com
|
|
This reverts commit 0a7fb6f07e3ad497d31ae9a2082d2cacab43d54a.
The "skipping duplicate warnings" warning is technically not an actual
warning, which can cause confusion. This feature isn't all that useful
anyway. It's exceedingly rare for a function to have more than one
unrelated warning.
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/e5abe5e858acf1a9207a5dfa0f37d17ac9dca872.1743481539.git.jpoimboe@kernel.org
|
|
Append with "()" to clarify it's a function.
Before:
vmlinux.o: warning: objtool: cdns_mrvl_xspi_setup_clock: unexpected end of section .text.cdns_mrvl_xspi_setup_clock
After:
vmlinux.o: warning: objtool: cdns_mrvl_xspi_setup_clock(): unexpected end of section .text.cdns_mrvl_xspi_setup_clock
Fixes: c5995abe1547 ("objtool: Improve error handling")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/692e1e0d0b15a71bd35c6b4b87f3c75cd5a57358.1743481539.git.jpoimboe@kernel.org
|
|
When KCOV or GCOV is enabled, dead code can be left behind, in which
case objtool silences unreachable and undefined behavior (fallthrough)
warnings.
Fallthrough warnings, and their variant "end of section" warnings, were
silenced with the following commit:
6b023c784204 ("objtool: Silence more KCOV warnings")
Another variant of a fallthrough warning is a jump to the end of a
function. If that function happens to be at the end of a section, the
jump destination doesn't actually exist.
Normally that would be a fatal objtool error, but for KCOV/GCOV it's
just another undefined behavior fallthrough. Silence it like the
others.
Fixes the following warning:
drivers/iommu/dma-iommu.o: warning: objtool: iommu_dma_sw_msi+0x92: can't find jump dest instruction at .text+0x54d5
Fixes: 6b023c784204 ("objtool: Silence more KCOV warnings")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/08fbe7d7e1e20612206f1df253077b94f178d93e.1743481539.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/314f8809-cd59-479b-97d7-49356bf1c8d1@infradead.org/
|
|
A new binary is now generated by the MPTCP selftests: mptcp_diag.
Like the other binaries from this directory, there is no need to track
this in Git, it should then be ignored.
Fixes: 00f5e338cf7e ("selftests: mptcp: Add a tool to get specific msk_info")
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-4-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The file descriptor 'fd_in' is opened when cfg_input is configured, but
not closed in main_loop(), this patch fixes it.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Co-developed-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-3-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fix a bug where the code was checking the wrong file descriptors
when opening the input files. The code was checking 'fd' instead
of 'fd_in', which could lead to incorrect error handling.
Fixes: 05be5e273c84 ("selftests: mptcp: add disconnect tests")
Cc: stable@vger.kernel.org
Fixes: ca7ae8916043 ("selftests: mptcp: mptfo Initiator/Listener")
Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Cong Liu <liucong2@kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250328-net-mptcp-misc-fixes-6-15-v1-2-34161a482a7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that net and net-next have converged we can use the Path
helpers in the ping test without conflicts.
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-4-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and
the ioctl path") added an sample XDP_PASS prog in net/lib, so
that we can reuse it in various sub-directories. Delete the old
sample and use the one from the lib in existing tests.
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The single letter + "path" helpers do not have many fans (see Link).
Use a Path object with a better name. test_dir is the replacement
for rpath(), net_lib_dir is a new path of the $ksft/net/lib directory.
The Path() class overloads the "/" operator and can be cast to string
automatically, so to get a path to a file tests can do:
path = env.test_dir / "binary"
Link: https://lore.kernel.org/CA+FuTSemTNVZ5MxXkq8T9P=DYm=nSXcJnL7CJBPZNAT_9UFisQ@mail.gmail.com
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250327222315.1098596-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Introduce latency profiling using scheduler information.
The latency profiling is to show impacts on wall-time rather than
cpu-time. By tracking context switches, it can weight samples and
find which part of the code contributed more to the execution
latency.
The value (period) of the sample is weighted by dividing it by the
number of parallel execution at the moment. The parallelism is
tracked in perf report with sched-switch records. This will reduce
the portion that are run in parallel and in turn increase the
portion of serial executions.
For now, it's limited to profile processes, IOW system-wide
profiling is not supported. You can add --latency option to enable
this.
$ perf record --latency -- make -C tools/perf
I've run the above command for perf build which adds -j option to
make with the number of CPUs in the system internally. Normally
it'd show something like below:
$ perf report -F overhead,comm
...
#
# Overhead Command
# ........ ...............
#
78.97% cc1
6.54% python3
4.21% shellcheck
3.28% ld
1.80% as
1.37% cc1plus
0.80% sh
0.62% clang
0.56% gcc
0.44% perl
0.39% make
...
The cc1 takes around 80% of the overhead as it's the actual
compiler. However it runs in parallel so its contribution to
latency may be less than that. Now, perf report will show both
overhead and latency (if --latency was given at record time) like
below:
$ perf report -s comm
...
#
# Overhead Latency Command
# ........ ........ ...............
#
78.97% 48.66% cc1
6.54% 25.68% python3
4.21% 0.39% shellcheck
3.28% 13.70% ld
1.80% 2.56% as
1.37% 3.08% cc1plus
0.80% 0.98% sh
0.62% 0.61% clang
0.56% 0.33% gcc
0.44% 1.71% perl
0.39% 0.83% make
...
You can see latency of cc1 goes down to around 50% and python3 and
ld contribute a lot more than their overhead. You can use --latency
option in perf report to get the same result but ordered by
latency.
$ perf report --latency -s comm
perf report:
- As a side effect of the latency profiling work, it adds a new
output field 'latency' and a sort key 'parallelism'. The below is a
result from my system with 64 CPUs. The build was well-parallelized
but contained some serial portions.
$ perf report -s parallelism
...
#
# Overhead Latency Parallelism
# ........ ........ ...........
#
16.95% 1.54% 62
13.38% 1.24% 61
12.50% 70.47% 1
11.81% 1.06% 63
7.59% 0.71% 60
4.33% 12.20% 2
3.41% 0.33% 59
2.05% 0.18% 64
1.75% 1.09% 9
1.64% 1.85% 5
...
- Support Feodra mini-debuginfo which is a LZMA compressed symbol
table inside ".gnu_debugdata" ELF section.
perf annotate:
- Add --code-with-type option to enable data-type profiling with the
usual annotate output.
Instead of focusing on data structure, it shows code annotation
together with data type it accesses in case the instruction refers
to a memory location (and it was able to resolve the target data
type). Currently it only works with --stdio.
$ perf annotate --stdio --code-with-type
...
Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
----------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff81050610 <__fdget>:
0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation)
0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation)
0.00 : ffffffff81050616: movq %rsp, %rbp
0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation)
0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation)
0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation)
0.00 : ffffffff8105061e: subq $0x10, %rsp
0.00 : ffffffff81050622: movl %edi, %ebx
0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0
0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files)
0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter)
0.00 : ffffffff81050636: cmpl $0x1, %eax
0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99>
0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt)
0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds)
0.00 : ffffffff81050641: cmpl %ebx, %eax
0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf>
0.00 : ffffffff81050649: movl %ebx, %r15d
5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd)
...
The "# data-type:" part was added with this change. The first few
entries are not very interesting. But later you can it accesses a
couple of fields in the task_struct, files_struct and fdtable.
perf trace:
- Support syscall tracing for different ABI. For example it can trace
system calls for 32-bit applications on 64-bit kernel
transparently.
- Add --summary-mode=total option to show global syscall summary. The
default is 'thread' to show per-thread syscall summary.
Python support:
- Add more interfaces to 'perf' module to parse events, and config,
enable or disable the event list properly so that it can implement
basic functionalities purely in Python. There is an example code
for these new interfaces in python/tracepoint.py.
- Add mypy and pylint support to enable build time checking. Fix some
code based on the findings from these tools.
Internals:
- Introduce io_dir__readdir() API to make directory traveral (usually
for proc or sysfs) efficient with less memory footprint.
JSON vendor events:
- Add events and metrics for ARM Neoverse N3 and V3
- Update events and metrics on various Intel CPUs
- Add/update events for a number of SiFive processors"
* tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
perf bpf-filter: Fix a parsing error with comma
perf report: Fix a memory leak for perf_env on AMD
perf trace: Fix wrong size to bpf_map__update_elem call
perf tools: annotate asm_pure_loop.S
perf python: Fix setup.py mypy errors
perf test: Address attr.py mypy error
perf build: Add pylint build tests
perf build: Add mypy build tests
perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
tools/build: Don't pass test log files to linker
perf bench sched pipe: fix enforced blocking reads in worker_thread
perf tools: Fix is_compat_mode build break in ppc64
perf build: filter all combinations of -flto for libperl
perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
perf trace: Fix evlist memory leak
perf trace: Fix BTF memory leak
perf trace: Make syscall table stable
perf syscalltbl: Mask off ABI type for MIPS system calls
perf build: Remove Makefile.syscalls
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf relisient spinlock support from Alexei Starovoitov:
"This patch set introduces Resilient Queued Spin Lock (or rqspinlock
with res_spin_lock() and res_spin_unlock() APIs).
This is a qspinlock variant which recovers the kernel from a stalled
state when the lock acquisition path cannot make forward progress.
This can occur when a lock acquisition attempt enters a deadlock
situation (e.g. AA, or ABBA), or more generally, when the owner of the
lock (which we’re trying to acquire) isn’t making forward progress.
Deadlock detection is the main mechanism used to provide instant
recovery, with the timeout mechanism acting as a final line of
defense. Detection is triggered immediately when beginning the waiting
loop of a lock slow path.
Additionally, BPF programs attached to different parts of the kernel
can introduce new control flow into the kernel, which increases the
likelihood of deadlocks in code not written to handle reentrancy.
There have been multiple syzbot reports surfacing deadlocks in
internal kernel code due to the diverse ways in which BPF programs can
be attached to different parts of the kernel. By switching the BPF
subsystem’s lock usage to rqspinlock, all of these issues are
mitigated at runtime.
This spin lock implementation allows BPF maps to become safer and
remove mechanisms that have fallen short in assuring safety when
nesting programs in arbitrary ways in the same context or across
different contexts.
We run benchmarks that stress locking scalability and perform
comparison against the baseline (qspinlock). For the rqspinlock case,
we replace the default qspinlock with it in the kernel, such that all
spin locks in the kernel use the rqspinlock slow path. As such,
benchmarks that stress kernel spin locks end up exercising rqspinlock.
More details in the cover letter in commit 6ffb9017e932 ("Merge branch
'resilient-queued-spin-lock'")"
* tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits)
selftests/bpf: Add tests for rqspinlock
bpf: Maintain FIFO property for rqspinlock unlock
bpf: Implement verifier support for rqspinlock
bpf: Introduce rqspinlock kfuncs
bpf: Convert lpm_trie.c to rqspinlock
bpf: Convert percpu_freelist.c to rqspinlock
bpf: Convert hashtab.c to rqspinlock
rqspinlock: Add locktorture support
rqspinlock: Add entry to Makefile, MAINTAINERS
rqspinlock: Add macros for rqspinlock usage
rqspinlock: Add basic support for CONFIG_PARAVIRT
rqspinlock: Add a test-and-set fallback
rqspinlock: Add deadlock detection and recovery
rqspinlock: Protect waiters in trylock fallback from stalls
rqspinlock: Protect waiters in queue from stalls
rqspinlock: Protect pending bit owners from stalls
rqspinlock: Hardcode cond_acquire loops for arm64
rqspinlock: Add support for timeouts
rqspinlock: Drop PV and virtualization support
rqspinlock: Add rqspinlock.h header
...
|