Age | Commit message (Collapse) | Author |
|
syzbot reported a memory leak [0] related to IPV6_ADDRFORM.
The scenario is that while one thread is converting an IPv6 socket into
IPv4 with IPV6_ADDRFORM, another thread calls do_ipv6_setsockopt() and
allocates memory to inet6_sk(sk)->XXX after conversion.
Then, the converted sk with (tcp|udp)_prot never frees the IPv6 resources,
which inet6_destroy_sock() should have cleaned up.
setsockopt(IPV6_ADDRFORM) setsockopt(IPV6_DSTOPTS)
+-----------------------+ +----------------------+
- do_ipv6_setsockopt(sk, ...)
- sockopt_lock_sock(sk) - do_ipv6_setsockopt(sk, ...)
- lock_sock(sk) ^._ called via tcpv6_prot
- WRITE_ONCE(sk->sk_prot, &tcp_prot) before WRITE_ONCE()
- xchg(&np->opt, NULL)
- txopt_put(opt)
- sockopt_release_sock(sk)
- release_sock(sk) - sockopt_lock_sock(sk)
- lock_sock(sk)
- ipv6_set_opt_hdr(sk, ...)
- ipv6_update_options(sk, opt)
- xchg(&inet6_sk(sk)->opt, opt)
^._ opt is never freed.
- sockopt_release_sock(sk)
- release_sock(sk)
Since IPV6_DSTOPTS allocates options under lock_sock(), we can avoid this
memory leak by testing whether sk_family is changed by IPV6_ADDRFORM after
acquiring the lock.
This issue exists from the initial commit between IPV6_ADDRFORM and
IPV6_PKTOPTIONS.
[0]:
BUG: memory leak
unreferenced object 0xffff888009ab9f80 (size 96):
comm "syz-executor583", pid 328, jiffies 4294916198 (age 13.034s)
hex dump (first 32 bytes):
01 00 00 00 48 00 00 00 08 00 00 00 00 00 00 00 ....H...........
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<000000002ee98ae1>] kmalloc include/linux/slab.h:605 [inline]
[<000000002ee98ae1>] sock_kmalloc+0xb3/0x100 net/core/sock.c:2566
[<0000000065d7b698>] ipv6_renew_options+0x21e/0x10b0 net/ipv6/exthdrs.c:1318
[<00000000a8c756d7>] ipv6_set_opt_hdr net/ipv6/ipv6_sockglue.c:354 [inline]
[<00000000a8c756d7>] do_ipv6_setsockopt.constprop.0+0x28b7/0x4350 net/ipv6/ipv6_sockglue.c:668
[<000000002854d204>] ipv6_setsockopt+0xdf/0x190 net/ipv6/ipv6_sockglue.c:1021
[<00000000e69fdcf8>] tcp_setsockopt+0x13b/0x2620 net/ipv4/tcp.c:3789
[<0000000090da4b9b>] __sys_setsockopt+0x239/0x620 net/socket.c:2252
[<00000000b10d192f>] __do_sys_setsockopt net/socket.c:2263 [inline]
[<00000000b10d192f>] __se_sys_setsockopt net/socket.c:2260 [inline]
[<00000000b10d192f>] __x64_sys_setsockopt+0xbe/0x160 net/socket.c:2260
[<000000000a80d7aa>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
[<000000000a80d7aa>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
[<000000004562b5c6>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more KUnit updates from Shuah Khan:
"Features and fixes:
- simplify resource use
- make kunit_malloc() and kunit_free() allocations and frees
consistent. kunit_free() frees only the memory allocated by
kunit_malloc()
- stop downloading risc-v opensbi binaries using wget
- other fixes and improvements to tool and KUnit framework"
* tag 'linux-kselftest-kunit-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
Documentation: kunit: Update description of --alltests option
kunit: declare kunit_assert structs as const
kunit: rename base KUNIT_ASSERTION macro to _KUNIT_FAILED
kunit: remove format func from struct kunit_assert, get it to 0 bytes
kunit: tool: Don't download risc-v opensbi firmware with wget
kunit: make kunit_kfree(NULL) a no-op to match kfree()
kunit: make kunit_kfree() not segfault on invalid inputs
kunit: make kunit_kfree() only work on pointers from kunit_malloc() and friends
kunit: drop test pointer in string_stream_fragment
kunit: string-stream: Simplify resource use
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull more Kselftest updates from Shuah Khan:
"This consists of fixes and improvements to memory-hotplug test and a
minor spelling fix to ftrace test"
* tag 'linux-kselftest-next-6.1-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
docs: notifier-error-inject: Correct test's name
selftests/memory-hotplug: Adjust log info for maintainability
selftests/memory-hotplug: Restore memory before exit
selftests/memory-hotplug: Add checking after online or offline
selftests/ftrace: func_event_triggers: fix typo in user message
|
|
Pull VFIO updates from Alex Williamson:
- Prune private items from vfio_pci_core.h to a new internal header,
fix missed function rename, and refactor vfio-pci interrupt defines
(Jason Gunthorpe)
- Create consistent naming and handling of ioctls with a function per
ioctl for vfio-pci and vfio group handling, use proper type args
where available (Jason Gunthorpe)
- Implement a set of low power device feature ioctls allowing userspace
to make use of power states such as D3cold where supported (Abhishek
Sahu)
- Remove device counter on vfio groups, which had restricted the page
pinning interface to singleton groups to account for limitations in
the type1 IOMMU backend. Document usage as limited to emulated IOMMU
devices, ie. traditional mdev devices where this restriction is
consistent (Jason Gunthorpe)
- Correct function prefix in hisi_acc driver incurred during previous
refactoring (Shameer Kolothum)
- Correct typo and remove redundant warning triggers in vfio-fsl driver
(Christophe JAILLET)
- Introduce device level DMA dirty tracking uAPI and implementation in
the mlx5 variant driver (Yishai Hadas & Joao Martins)
- Move much of the vfio_device life cycle management into vfio core,
simplifying and avoiding duplication across drivers. This also
facilitates adding a struct device to vfio_device which begins the
introduction of device rather than group level user support and fills
a gap allowing userspace identify devices as vfio capable without
implicit knowledge of the driver (Kevin Tian & Yi Liu)
- Split vfio container handling to a separate file, creating a more
well defined API between the core and container code, masking IOMMU
backend implementation from the core, allowing for an easier future
transition to an iommufd based implementation of the same (Jason
Gunthorpe)
- Attempt to resolve race accessing the iommu_group for a device
between vfio releasing DMA ownership and removal of the device from
the IOMMU driver. Follow-up with support to allow vfio_group to exist
with NULL iommu_group pointer to support existing userspace use cases
of holding the group file open (Jason Gunthorpe)
- Fix error code and hi/lo register manipulation issues in the hisi_acc
variant driver, along with various code cleanups (Longfang Liu)
- Fix a prior regression in GVT-g group teardown, resulting in
unreleased resources (Jason Gunthorpe)
- A significant cleanup and simplification of the mdev interface,
consolidating much of the open coded per driver sysfs interface
support into the mdev core (Christoph Hellwig)
- Simplification of tracking and locking around vfio_groups that fall
out from previous refactoring (Jason Gunthorpe)
- Replace trivial open coded f_ops tests with new helper (Alex
Williamson)
* tag 'vfio-v6.1-rc1' of https://github.com/awilliam/linux-vfio: (77 commits)
vfio: More vfio_file_is_group() use cases
vfio: Make the group FD disassociate from the iommu_group
vfio: Hold a reference to the iommu_group in kvm for SPAPR
vfio: Add vfio_file_is_group()
vfio: Change vfio_group->group_rwsem to a mutex
vfio: Remove the vfio_group->users and users_comp
vfio/mdev: add mdev available instance checking to the core
vfio/mdev: consolidate all the description sysfs into the core code
vfio/mdev: consolidate all the available_instance sysfs into the core code
vfio/mdev: consolidate all the name sysfs into the core code
vfio/mdev: consolidate all the device_api sysfs into the core code
vfio/mdev: remove mtype_get_parent_dev
vfio/mdev: remove mdev_parent_dev
vfio/mdev: unexport mdev_bus_type
vfio/mdev: remove mdev_from_dev
vfio/mdev: simplify mdev_type handling
vfio/mdev: embedd struct mdev_parent in the parent data structure
vfio/mdev: make mdev.h standalone includable
drm/i915/gvt: simplify vgpu configuration management
drm/i915/gvt: fix a memory leak in intel_gvt_init_vgpu_types
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- Some minor typo fixes
- A fix of the Xen pcifront driver for supporting the device model to
run in a Linux stub domain
- A cleanup of the pcifront driver
- A series to enable grant-based virtio with Xen on x86
- A cleanup of Xen PV guests to distinguish between safe and faulting
MSR accesses
- Two fixes of the Xen gntdev driver
- Two fixes of the new xen grant DMA driver
* tag 'for-linus-6.1-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Kconfig: Fix spelling mistake "Maxmium" -> "Maximum"
xen/pv: support selecting safe/unsafe msr accesses
xen/pv: refactor msr access functions to support safe and unsafe accesses
xen/pv: fix vendor checks for pmu emulation
xen/pv: add fault recovery control to pmu msr accesses
xen/virtio: enable grant based virtio on x86
xen/virtio: use dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT
xen/virtio: restructure xen grant dma setup
xen/pcifront: move xenstore config scanning into sub-function
xen/gntdev: Accommodate VMA splitting
xen/gntdev: Prevent leaking grants
xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
xen/xenbus: Fix spelling mistake "hardward" -> "hardware"
xen-pcifront: Handle missed Connected state
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"Five hotfixes - three for nilfs2, two for MM. For are cc:stable, one
is not"
* tag 'mm-hotfixes-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
nilfs2: fix leak of nilfs_root in case of writer thread creation failure
nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
nilfs2: fix use-after-free bug of struct nilfs_root
mm/damon/core: initialize damon_target->list in damon_new_target()
mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- hfs and hfsplus kmap API modernization (Fabio Francesco)
- make crash-kexec work properly when invoked from an NMI-time panic
(Valentin Schneider)
- ntfs bugfixes (Hawkins Jiawei)
- improve IPC msg scalability by replacing atomic_t's with percpu
counters (Jiebin Sun)
- nilfs2 cleanups (Minghao Chi)
- lots of other single patches all over the tree!
* tag 'mm-nonmm-stable-2022-10-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits)
include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype
proc: test how it holds up with mapping'less process
mailmap: update Frank Rowand email address
ia64: mca: use strscpy() is more robust and safer
init/Kconfig: fix unmet direct dependencies
ia64: update config files
nilfs2: replace WARN_ONs by nilfs_error for checkpoint acquisition failure
fork: remove duplicate included header files
init/main.c: remove unnecessary (void*) conversions
proc: mark more files as permanent
nilfs2: remove the unneeded result variable
nilfs2: delete unnecessary checks before brelse()
checkpatch: warn for non-standard fixes tag style
usr/gen_init_cpio.c: remove unnecessary -1 values from int file
ipc/msg: mitigate the lock contention with percpu counter
percpu: add percpu_counter_add_local and percpu_counter_sub_local
fs/ocfs2: fix repeated words in comments
relay: use kvcalloc to alloc page array in relay_alloc_page_array
proc: make config PROC_CHILDREN depend on PROC_FS
fs: uninline inode_maybe_inc_iversion()
...
|
|
The follow commands caused a crash:
# cd /sys/kernel/tracing
# echo 's:open char file[]' > dynamic_events
# echo 'hist:keys=common_pid:file=filename:onchange($file).trace(open,$file)' > events/syscalls/sys_enter_openat/trigger'
# echo 1 > events/synthetic/open/enable
BOOM!
The problem is that the synthetic event field "char file[]" will read
the value given to it as a string without any memory checks to make sure
the address is valid. The above example will pass in the user space
address and the sythetic event code will happily call strlen() on it
and then strscpy() where either one will cause an oops when accessing
user space addresses.
Use the helper functions from trace_kprobe and trace_eprobe that can
read strings safely (and actually succeed when the address is from user
space and the memory is mapped in).
Now the above can show:
packagekitd-1721 [000] ...2. 104.597170: open: file=/usr/lib/rpm/fileattrs/cmake.attr
in:imjournal-978 [006] ...2. 104.599642: open: file=/var/lib/rsyslog/imjournal.state.tmp
packagekitd-1721 [000] ...2. 104.626308: open: file=/usr/lib/rpm/fileattrs/debuginfo.attr
Link: https://lkml.kernel.org/r/20221012104534.826549315@goodmis.org
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Have the specific functions for kernel probes that read strings to inject
the "(fault)" name directly. trace_probes.c does this too (for uprobes)
but as the code to read strings are going to be used by synthetic events
(and perhaps other utilities), it simplifies the code by making sure those
other uses do not need to implement the "(fault)" name injection as well.
Link: https://lkml.kernel.org/r/20221012104534.644803645@goodmis.org
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The functions:
fetch_store_strlen_user()
fetch_store_strlen()
fetch_store_string_user()
fetch_store_string()
are identical in both trace_kprobe.c and trace_eprobe.c. Move them into
a new header file trace_probe_kernel.h to share it. This code will later
be used by the synthetic events as well.
Marked for stable as a fix for a crash in synthetic events requires it.
Link: https://lkml.kernel.org/r/20221012104534.467668078@goodmis.org
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Fixes: bd82631d7ccdc ("tracing: Add support for dynamic strings to synthetic events")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Use EXPLICIT_RELOCS (ABIv2.0)
- Use generic BUG() handler
- Refactor TLB/Cache operations
- Add qspinlock support
- Add perf events support
- Add kexec/kdump support
- Add BPF JIT support
- Add ACPI-based laptop driver
- Update the default config file
* tag 'loongarch-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add ACPI-based generic laptop driver
LoongArch: Add BPF JIT support
LoongArch: Add some instruction opcodes and formats
LoongArch: Move {signed,unsigned}_imm_check() to inst.h
LoongArch: Add kdump support
LoongArch: Add kexec support
LoongArch: Use generic BUG() handler
LoongArch: Add SysRq-x (TLB Dump) support
LoongArch: Add perf events support
LoongArch: Add qspinlock support
LoongArch: Use TLB for ioremap()
LoongArch: Support access filter to /dev/mem interface
LoongArch: Refactor cache probe and flush methods
LoongArch: mm: Refactor TLB exception handlers
LoongArch: Support R_LARCH_GOT_PC_{LO12,HI20} in modules
LoongArch: Support PC-relative relocations in modules
LoongArch: Define ELF relocation types added in ABIv2.0
LoongArch: Adjust symbol addressing for AS_HAS_EXPLICIT_RELOCS
LoongArch: Add Kconfig option AS_HAS_EXPLICIT_RELOCS
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt updates from Thomas Gleixner:
"Core code:
- Provide a generic wrapper which can be utilized in drivers to
handle the problem of force threaded demultiplex interrupts on RT
enabled kernels. This avoids conditionals and horrible quirks in
drivers all over the place
- Fix up affected pinctrl and GPIO drivers to make them cleanly RT
safe
Interrupt drivers:
- A new driver for the FSL MU platform specific MSI implementation
- Make irqchip_init() available for pure ACPI based systems
- Provide a functional DT binding for the Realtek RTL interrupt chip
- The usual DT updates and small code improvements all over the
place"
* tag 'irq-core-2022-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
irqchip: IMX_MU_MSI should depend on ARCH_MXC
irqchip/imx-mu-msi: Fix wrong register offset for 8ulp
irqchip/ls-extirq: Fix invalid wait context by avoiding to use regmap
dt-bindings: irqchip: Describe the IMX MU block as a MSI controller
irqchip: Add IMX MU MSI controller driver
dt-bindings: irqchip: renesas,irqc: Add r8a779g0 support
irqchip/gic-v3: Fix typo in comment
dt-bindings: interrupt-controller: ti,sci-intr: Fix missing reg property in the binding
dt-bindings: irqchip: ti,sci-inta: Fix warning for missing #interrupt-cells
irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END
platform-msi: Export symbol platform_msi_create_irq_domain()
irqchip/realtek-rtl: use parent interrupts
dt-bindings: interrupt-controller: realtek,rtl-intc: require parents
irqchip/realtek-rtl: use irq_domain_add_linear()
irqchip: Make irqchip_init() usable on pure ACPI systems
bcma: gpio: Use generic_handle_irq_safe()
gpio: mlxbf2: Use generic_handle_irq_safe()
platform/x86: intel_int0002_vgpio: Use generic_handle_irq_safe()
ssb: gpio: Use generic_handle_irq_safe()
pinctrl: amd: Use generic_handle_irq_safe()
...
|
|
kernel/trace/ring_buffer.c:895: warning: expecting prototype for ring_buffer_nr_pages_dirty(). Prototype was for ring_buffer_nr_dirty_pages() instead.
kernel/trace/ring_buffer.c:5313: warning: expecting prototype for ring_buffer_reset_cpu(). Prototype was for ring_buffer_reset_online_cpus() instead.
kernel/trace/ring_buffer.c:5382: warning: expecting prototype for rind_buffer_empty(). Prototype was for ring_buffer_empty() instead.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2340
Link: https://lkml.kernel.org/r/20221009020642.12506-1-jiapeng.chong@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Currently, we have a bug where a simultaneous DROPTAG ioctl and socket
close may race, as we attempt to remove a key from lists twice, and
perform an unref for each removal operation. This may result in a uaf
when we attempt the second unref.
This change fixes the race by making __mctp_key_remove tolerant to being
called on a key that has already been removed from the socket/net lists,
and only performs the unref when we do the actual remove. We also need
to hold the list lock on the ioctl cleanup path.
This fix is based on a bug report and comprehensive analysis from
butt3rflyh4ck <butterflyhuangxx@gmail.com>, found via syzkaller.
Cc: stable@vger.kernel.org
Fixes: 63ed1aab3d40 ("mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control")
Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Florian Westphal says:
====================
netfilter fixes for net
This series from Phil Sutter for the *net* tree fixes a problem with a change
from the 6.1 development phase: the change to nft_fib should have used
the more recent flowic_l3mdev field. Pointed out by Guillaume Nault.
This also makes the older iptables module follow the same pattern.
Also add selftest case and avoid test failure in nft_fib.sh when the
host environment has set rp_filter=1.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If net.ipv4.conf.all.rp_filter is set, it overrides the per-interface
setting and thus defeats the fix from bbe4c0896d250 ("selftests:
netfilter: disable rp_filter on router"). Unset it as well to cover that
case.
Fixes: bbe4c0896d250 ("selftests: netfilter: disable rp_filter on router")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Use the introduced field for correct operation with VRF devices instead
of conditionally overwriting flowic_oif. This is a partial revert of
commit b575b24b8eee3 ("netfilter: Fix rpfilter dropping vrf packets by
mistake"), implementing a simpler solution.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
Test reverse path (filter) matches in iptables, ip6tables and nftables.
Both with a regular interface and a VRF.
Signed-off-by: Phil Sutter <phil@nwl.cc>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
When ftrace bug happened, following log shows every hex data in
problematic ip address:
actual: ffffffe8:6b:ffffffd9:01:21
But so many 'f's seem a little confusing, and that is because format
'%x' being used to print signed chars in array 'ins'. As suggested
by Joe, change to use format "%*phC" to print array 'ins'.
After this patch, the log is like:
actual: e8:6b:d9:01:21
Link: https://lkml.kernel.org/r/20221011120352.1878494-1-zhengyejian1@huawei.com
Fixes: 6c14133d2d3f ("ftrace: Do not blindly read the ip address in ftrace_bug()")
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
1, Enable ZBOOT, KEXEC and BPF_JIT;
2, Add more patition types;
3, Add some USB Type-C options;
4, Add some common network options;
5, Add some Bluetooth device drivers;
6, Remove obsolete config options (for some detailed information, see
Link).
Link: https://lore.kernel.org/kernel-janitors/20220929090645.1389-1-lukas.bulwahn@gmail.com/
Co-developed-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Co-developed-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
This add ACPI-based generic laptop driver for Loongson-3. Some of the
codes are derived from drivers/platform/x86/thinkpad_acpi.c.
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
BPF programs are normally handled by a BPF interpreter, add BPF JIT
support for LoongArch to allow the kernel to generate native code when
a program is loaded into the kernel. This will significantly speed-up
processing of BPF programs.
Co-developed-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
According to the "Table of Instruction Encoding" in LoongArch Reference
Manual [1], add some instruction opcodes and formats which are used in
the BPF JIT for LoongArch.
[1] https://loongson.github.io/LoongArch-Documentation/LoongArch-Vol1-EN.html#table-of-instruction-encoding
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
{signed,unsigned}_imm_check() will also be used in the bpf jit, so move
them from module.c to inst.h, this is preparation for later patches.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
This patch adds support for kdump. In kdump case the normal kernel will
reserve a region for the crash kernel and jump there on panic.
Arch-specific functions are added to allow for implementing a crash dump
file interface, /proc/vmcore, which can be viewed as a ELF file.
A user-space tool, such as kexec-tools, is responsible for allocating a
separate region for the core's ELF header within the crash kdump kernel
memory and filling it in when executing kexec_load().
Then, its location will be advertised to the crash dump kernel via a
command line argument "elfcorehdr=", and the crash dump kernel will
preserve this region for later use with arch_reserve_vmcore() at boot
time.
At the same time, the crash kdump kernel is also limited within the
"crashkernel" area via a command line argument "mem=", so as not to
destroy the original kernel dump data.
In the crash dump kernel environment, /proc/vmcore is used to access the
primary kernel's memory with copy_oldmem_page().
I tested kdump on LoongArch machines (Loongson-3A5000) and it works as
expected (suggested crashkernel parameter is "crashkernel=512M@2560M"),
you may test it by triggering a crash through /proc/sysrq-trigger:
$ sudo kexec -p /boot/vmlinux-kdump --reuse-cmdline --append="nr_cpus=1"
# echo c > /proc/sysrq-trigger
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add three new files, kexec.h, machine_kexec.c and relocate_kernel.S to
the LoongArch architecture, so as to add support for the kexec re-boot
mechanism (CONFIG_KEXEC) on LoongArch platforms.
Kexec supports loading vmlinux.elf in ELF format and vmlinux.efi in PE
format.
I tested kexec on LoongArch machines (Loongson-3A5000) and it works as
expected:
$ sudo kexec -l /boot/vmlinux.efi --reuse-cmdline
$ sudo kexec -e
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Inspired by commit 9fb7410f955("arm64/BUG: Use BRK instruction for
generic BUG traps"), do similar for LoongArch to use generic BUG()
handler.
This patch uses the BREAK software breakpoint instruction to generate
a trap instead, similarly to most other arches, with the generic BUG
code generating the dmesg boilerplate.
This allows bug metadata to be moved to a separate table and reduces
the amount of inline code at BUG() and WARN() sites. This also avoids
clobbering any registers before they can be dumped.
To mitigate the size of the bug table further, this patch makes use of
the existing infrastructure for encoding addresses within the bug table
as 32-bit relative pointers instead of absolute pointers.
(Note: this limits the max kernel size to 2GB.)
Before patch:
[ 3018.338013] lkdtm: Performing direct entry BUG
[ 3018.342445] Kernel bug detected[#5]:
[ 3018.345992] CPU: 2 PID: 865 Comm: cat Tainted: G D 6.0.0-rc6+ #35
After patch:
[ 125.585985] lkdtm: Performing direct entry BUG
[ 125.590433] ------------[ cut here ]------------
[ 125.595020] kernel BUG at drivers/misc/lkdtm/bugs.c:78!
[ 125.600211] Oops - BUG[#1]:
[ 125.602980] CPU: 3 PID: 410 Comm: cat Not tainted 6.0.0-rc6+ #36
Out-of-line file/line data information obtained compared to before.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Add SysRq-x (TLB Dump) support for LoongArch, which is useful for
debugging.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
The perf events infrastructure of LoongArch is very similar to old MIPS-
based Loongson, so most of the codes are derived from MIPS.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
On NUMA system, the performance of qspinlock is better than generic
spinlock. Below is the UnixBench test results on a 8 nodes (4 cores
per node, 32 cores in total) machine.
A. With generic spinlock:
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 449574022.5 38523.9
Double-Precision Whetstone 55.0 85190.4 15489.2
Execl Throughput 43.0 14696.2 3417.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 143157.8 361.5
File Copy 256 bufsize 500 maxblocks 1655.0 37631.8 227.4
File Copy 4096 bufsize 8000 maxblocks 5800.0 444814.2 766.9
Pipe Throughput 12440.0 5047490.7 4057.5
Pipe-based Context Switching 4000.0 2021545.7 5053.9
Process Creation 126.0 23829.8 1891.3
Shell Scripts (1 concurrent) 42.4 33756.7 7961.5
Shell Scripts (8 concurrent) 6.0 4062.9 6771.5
System Call Overhead 15000.0 2479748.6 1653.2
========
System Benchmarks Index Score 2955.6
B. With qspinlock:
System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 449467876.9 38514.8
Double-Precision Whetstone 55.0 85174.6 15486.3
Execl Throughput 43.0 14769.1 3434.7
File Copy 1024 bufsize 2000 maxblocks 3960.0 146150.5 369.1
File Copy 256 bufsize 500 maxblocks 1655.0 37496.8 226.6
File Copy 4096 bufsize 8000 maxblocks 5800.0 447527.0 771.6
Pipe Throughput 12440.0 5175989.2 4160.8
Pipe-based Context Switching 4000.0 2207747.8 5519.4
Process Creation 126.0 25125.5 1994.1
Shell Scripts (1 concurrent) 42.4 33461.2 7891.8
Shell Scripts (8 concurrent) 6.0 4024.7 6707.8
System Call Overhead 15000.0 2917278.6 1944.9
========
System Benchmarks Index Score 3040.1
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
We can support more cache attributes (e.g., CC, SUC and WUC) and page
protection when we use TLB for ioremap(). The implementation is based
on GENERIC_IOREMAP.
The existing simple ioremap() implementation has better performance so
we keep it and introduce ARCH_IOREMAP to control the selection.
We move pagetable_init() earlier to make early ioremap() works, and we
modify the PCI ecam mapping because the TLB-based version of ioremap()
will actually take the size into account.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Accidental access to /dev/mem is obviously disastrous, but specific
access can be used by people debugging the kernel. So select GENERIC_
LIB_DEVMEM_IS_ALLOWED, as well as define ARCH_HAS_VALID_PHYS_ADDR_RANGE
and related helpers, to support access filter to /dev/mem interface.
Signed-off-by: Weihao Li <liweihao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Current cache probe and flush methods have some drawbacks:
1, Assume there are 3 cache levels and only 3 levels;
2, Assume L1 = I + D, L2 = V, L3 = S, V is exclusive, S is inclusive.
However, the fact is I + D, I + D + V, I + D + S and I + D + V + S are
all valid. So, refactor the cache probe and flush methods to adapt more
types of cache hierarchy.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
This patch simplifies TLB load, store and modify exception handlers:
1. Reduce instructions, such as alu/csr and memory access;
2. Execute tlb search instruction only in the fast path;
3. Return directly from the fast path for both normal and huge pages;
4. Re-tab the assembly for better vertical alignment.
And fixes the concurrent modification issue of fast path for huge pages.
This issue will occur in the following steps:
CPU-1 (In TLB exception) CPU-2 (In THP splitting)
1: Load PMD entry (HUGE=1)
2: Goto huge path
3: Store PMD entry (HUGE=0)
4: Reload PMD entry (HUGE=0)
5: Fill TLB entry (PA is incorrect)
This patch also slightly improves the TLB processing performance:
* Normal pages: 2.15%, Huge pages: 1.70%.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
size_t page_size;
size_t mem_size;
size_t off;
void *base;
int flags;
int i;
if (argc < 2) {
fprintf(stderr, "%s MEM_SIZE [HUGE]\n", argv[0]);
return -1;
}
page_size = sysconf(_SC_PAGESIZE);
flags = MAP_PRIVATE | MAP_ANONYMOUS;
mem_size = strtoul(argv[1], NULL, 10);
if (argc > 2)
flags |= MAP_HUGETLB;
for (i = 0; i < 10; i++) {
base = mmap(NULL, mem_size, PROT_READ, flags, -1, 0);
if (base == MAP_FAILED) {
fprintf(stderr, "Map memory failed!\n");
return -1;
}
for (off = 0; off < mem_size; off += page_size)
*(volatile int *)(base + off);
munmap(base, mem_size);
}
return 0;
}
Signed-off-by: Rui Wang <wangrui@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
GCC >= 13 and GNU assembler >= 2.40 use these relocations to address
external symbols, so we need to add them.
Let the module loader emit GOT entries for data symbols so we would be
able to handle GOT relocations. The GOT entry is just the data's symbol
address.
In module.lds, emit a stub .got section for a section header entry. The
actual content of the section entry will be filled at runtime by module_
frob_arch_sections().
Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Binutils >= 2.40 uses R_LARCH_B26 instead of R_LARCH_SOP_PUSH_PLT_PCREL,
and R_LARCH_PCALA* instead of R_LARCH_SOP_PUSH_PCREL.
Handle R_LARCH_B26 and R_LARCH_PCALA* in the module loader. For R_LARCH_
B26, also create a PLT entry as needed.
Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
These relocation types are used by GNU binutils >= 2.40 and GCC >= 13.
Add their definitions so we will be able to use them in later patches.
Link: https://github.com/loongson/LoongArch-Documentation/pull/57
Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
If explicit relocation hints are used by the toolchain, -Wa,-mla-*
options will be useless for the C code. So only use them for the
!CONFIG_AS_HAS_EXPLICIT_RELOCS case.
Replace "la" with "la.pcrel" in head.S to keep the semantic consistent
with new and old toolchains for the low level startup code.
For per-CPU variables, the "address" of the symbol is actually an offset
from $r21. The value is near the loading address of main kernel image,
but far from the loading address of modules. So we use model("extreme")
attibute to tell the compiler that a PC-relative addressing with 32-bit
offset is not sufficient for local per-CPU variables.
The behavior with different assemblers and compilers are summarized in
the following table:
AS has CC has
explicit relocs explicit relocs * Behavior
==============================================================
No No Use la.* macros.
No change from Linux 6.0.
--------------------------------------------------------------
No Yes Disable explicit relocs.
No change from Linux 6.0.
--------------------------------------------------------------
Yes No Not supported.
--------------------------------------------------------------
Yes Yes Enable explicit relocs.
No -Wa,-mla* options used.
==============================================================
*: We assume CC must have model attribute if it has explicit relocs.
Both features are added in GCC 13 development cycle, so any GCC
release >= 13 should be OK. Using early GCC 13 development snapshots
may produce modules with unsupported relocations.
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f09482a
Link: https://gcc.gnu.org/r13-1834
Link: https://gcc.gnu.org/r13-2199
Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
GNU as >= 2.40 and GCC >= 13 will support using explicit relocation
hints in the assembly code, instead of la.* macros. The usage of
explicit relocation hints can improve code generation so it's enabled
by default by GCC >= 13.
Introduce a Kconfig option AS_HAS_EXPLICIT_RELOCS as the switch for
"use explicit relocation hints or not".
Tested-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
There is a spelling mistake in a commented section. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows compiler to uninline functions marked as 'inline'.
In case of __xchg()/__cmpxchg() this would cause to reference
BUILD_BUG(), which is an error case for catching bugs and will not
happen for correct code, if __xchg()/__cmpxchg() is inlined.
This bug can be produced with CONFIG_DEBUG_SECTION_MISMATCH enabled,
and the solution is similar to below commits:
46f1619500d0225 ("MIPS: include: Mark __xchg as __always_inline"),
88356d09904bc60 ("MIPS: include: Mark __cmpxchg as __always_inline").
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Move local_flush_tlb_all() earlier (just after setup_ptwalker() and
before page allocation). This can avoid stale TLB entries misguiding
the later page allocation. Without this patch the second kernel of
kexec/kdump fails to boot SMP.
BTW, move output_pgtable_bits_defines() into tlb_init() since it has
nothing to do with tlb handler setup.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Now io master CPUs are not hotpluggable on LoongArch, but in the current
code only /sys/devices/system/cpu/cpu0/online is not created. Let us set
the hotpluggable field of all the io master CPUs as 0, then prevent to
create sysfs control file for all the io master CPUs which confuses some
user space tools. This is similar with commit 9cce844abf07 ("MIPS: CPU#0
is not hotpluggable").
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Don't overwrite the SMBIOS-provided CPU name on coming back from CPU-
hotplug (including S3/S4) if it is already initialized.
Reviewed-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Poll CQ functions shouldn't sleep as they are called in atomic context.
The following splat appears once the mlx5_aso_poll_cq() is used in such
flow.
BUG: scheduling while atomic: swapper/17/0/0x00000100
Modules linked in: sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core fuse [last unloaded: mlx5_core]
CPU: 17 PID: 0 Comm: swapper/17 Tainted: G W 6.0.0-rc2+ #13
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x34/0x44
__schedule_bug.cold+0x47/0x53
__schedule+0x4b6/0x670
? hrtimer_start_range_ns+0x28d/0x360
schedule+0x50/0x90
schedule_hrtimeout_range_clock+0x98/0x120
? __hrtimer_init+0xb0/0xb0
usleep_range_state+0x60/0x90
mlx5_aso_poll_cq+0xad/0x190 [mlx5_core]
mlx5e_ipsec_aso_update_curlft+0x81/0xb0 [mlx5_core]
xfrm_timer_handler+0x6b/0x360
? xfrm_find_acq_byseq+0x50/0x50
__hrtimer_run_queues+0x139/0x290
hrtimer_run_softirq+0x7d/0xe0
__do_softirq+0xc7/0x272
irq_exit_rcu+0x87/0xb0
sysvec_apic_timer_interrupt+0x72/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x16/0x20
RIP: 0010:default_idle+0x18/0x20
Code: ae 7d ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 8b 05 b5 30 0d 01 85 c0 7e 07 0f 00 2d 0a e3 53 00 fb f4 <c3> 0f 1f 80 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 80 ad 01 00
RSP: 0018:ffff888100883ee0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffff888100849580 RCX: 4000000000000000
RDX: 0000000000000001 RSI: 0000000000000083 RDI: 000000000008863c
RBP: 0000000000000011 R08: 00000064e6977fa9 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
default_idle_call+0x37/0xb0
do_idle+0x1cd/0x1e0
cpu_startup_entry+0x19/0x20
start_secondary+0xfe/0x120
secondary_startup_64_no_verify+0xcd/0xdb
</TASK>
softirq: huh, entered softirq 8 HRTIMER 00000000a97c08cb with preempt_count 00000100, exited with 00000000?
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip fixes from Marc Zyngier:
- Fix IMX-MU Kconfig, keeping it private to IMX
- Fix a register offset for the same IMX-MU driver
- Fix the ls-extirq irqchip driver that would use the wrong
flavour of spinlocks
Link: https://lore.kernel.org/r/20221012075125.1244143-1-maz@kernel.org
|
|
Apparently, mptcp is able to call tcp_disconnect() on an already
disconnected flow. This is generally fine, unless current congestion
control is CDG, because it might trigger a double-free [1]
Instead of fixing MPTCP, and future bugs, we can make tcp_disconnect()
more resilient.
[1]
BUG: KASAN: double-free in slab_free mm/slub.c:3539 [inline]
BUG: KASAN: double-free in kfree+0xe2/0x580 mm/slub.c:4567
CPU: 0 PID: 3645 Comm: kworker/0:7 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
Workqueue: events mptcp_worker
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report_invalid_free+0x81/0x190 mm/kasan/report.c:462
____kasan_slab_free+0x18b/0x1c0 mm/kasan/common.c:356
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_disconnect+0x980/0x1e20 net/ipv4/tcp.c:3145
__mptcp_close_ssk+0x5ca/0x7e0 net/mptcp/protocol.c:2327
mptcp_do_fastclose net/mptcp/protocol.c:2592 [inline]
mptcp_worker+0x78c/0xff0 net/mptcp/protocol.c:2627
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
Allocated by task 3671:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_array include/linux/slab.h:640 [inline]
kcalloc include/linux/slab.h:671 [inline]
tcp_cdg_init+0x10d/0x170 net/ipv4/tcp_cdg.c:380
tcp_init_congestion_control+0xab/0x550 net/ipv4/tcp_cong.c:193
tcp_reinit_congestion_control net/ipv4/tcp_cong.c:217 [inline]
tcp_set_congestion_control+0x96c/0xaa0 net/ipv4/tcp_cong.c:391
do_tcp_setsockopt+0x505/0x2320 net/ipv4/tcp.c:3513
tcp_setsockopt+0xd4/0x100 net/ipv4/tcp.c:3801
mptcp_setsockopt+0x35f/0x2570 net/mptcp/sockopt.c:844
__sys_setsockopt+0x2d6/0x690 net/socket.c:2252
__do_sys_setsockopt net/socket.c:2263 [inline]
__se_sys_setsockopt net/socket.c:2260 [inline]
__x64_sys_setsockopt+0xba/0x150 net/socket.c:2260
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 16:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1759 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1785
slab_free mm/slub.c:3539 [inline]
kfree+0xe2/0x580 mm/slub.c:4567
tcp_cleanup_congestion_control+0x70/0x120 net/ipv4/tcp_cong.c:226
tcp_v4_destroy_sock+0xdd/0x750 net/ipv4/tcp_ipv4.c:2254
tcp_v6_destroy_sock+0x11/0x20 net/ipv6/tcp_ipv6.c:1969
inet_csk_destroy_sock+0x196/0x440 net/ipv4/inet_connection_sock.c:1157
tcp_done+0x23b/0x340 net/ipv4/tcp.c:4649
tcp_rcv_state_process+0x40e7/0x4990 net/ipv4/tcp_input.c:6624
tcp_v6_do_rcv+0x3fc/0x13c0 net/ipv6/tcp_ipv6.c:1525
tcp_v6_rcv+0x2e8e/0x3830 net/ipv6/tcp_ipv6.c:1759
ip6_protocol_deliver_rcu+0x2db/0x1950 net/ipv6/ip6_input.c:439
ip6_input_finish+0x14c/0x2c0 net/ipv6/ip6_input.c:484
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:493
dst_input include/net/dst.h:455 [inline]
ip6_rcv_finish+0x193/0x2c0 net/ipv6/ip6_input.c:79
ip_sabotage_in net/bridge/br_netfilter_hooks.c:874 [inline]
ip_sabotage_in+0x1fa/0x260 net/bridge/br_netfilter_hooks.c:865
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_slow+0xc5/0x1f0 net/netfilter/core.c:614
nf_hook.constprop.0+0x3ac/0x650 include/linux/netfilter.h:257
NF_HOOK include/linux/netfilter.h:300 [inline]
ipv6_rcv+0x9e/0x380 net/ipv6/ip6_input.c:309
__netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5485
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
netif_receive_skb_internal net/core/dev.c:5685 [inline]
netif_receive_skb+0x12f/0x8d0 net/core/dev.c:5744
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
br_pass_frame_up+0x303/0x410 net/bridge/br_input.c:68
br_handle_frame_finish+0x909/0x1aa0 net/bridge/br_input.c:199
br_nf_hook_thresh+0x2f8/0x3d0 net/bridge/br_netfilter_hooks.c:1041
br_nf_pre_routing_finish_ipv6+0x695/0xef0 net/bridge/br_netfilter_ipv6.c:207
NF_HOOK include/linux/netfilter.h:302 [inline]
br_nf_pre_routing_ipv6+0x417/0x7c0 net/bridge/br_netfilter_ipv6.c:237
br_nf_pre_routing+0x1496/0x1fe0 net/bridge/br_netfilter_hooks.c:507
nf_hook_entry_hookfn include/linux/netfilter.h:142 [inline]
nf_hook_bridge_pre net/bridge/br_input.c:255 [inline]
br_handle_frame+0x9c9/0x12d0 net/bridge/br_input.c:399
__netif_receive_skb_core+0x9fe/0x38f0 net/core/dev.c:5379
__netif_receive_skb_one_core+0xae/0x180 net/core/dev.c:5483
__netif_receive_skb+0x1f/0x1c0 net/core/dev.c:5599
process_backlog+0x3a0/0x7c0 net/core/dev.c:5927
__napi_poll+0xb3/0x6d0 net/core/dev.c:6494
napi_poll net/core/dev.c:6561 [inline]
net_rx_action+0x9c1/0xd90 net/core/dev.c:6672
__do_softirq+0x1d0/0x9c8 kernel/softirq.c:571
Fixes: 2b0a8c9eee81 ("tcp: add CDG congestion control")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric Dumazet says:
====================
inet: ping: give ping some care
First patch fixes an ipv6 ping bug that has been there forever,
for large sizes.
Second patch fixes a recent and elusive bug, that can potentially
crash the host. This is what I mentioned privately to Paolo and
Jakub at LPC in Dublin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Blamed commit broke the assumption used by ping sendmsg() that
allocated skb would have MAX_HEADER bytes in skb->head.
This patch changes the way ping works, by making sure
the skb head contains space for the icmp header,
and adjusting ping_getfrag() which was desperate
about going past the icmp header :/
This is adopting what UDP does, mostly.
syzbot is able to crash a host using both kfence and following repro in a loop.
fd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6)
connect(fd, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0),
inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28
sendmsg(fd, {msg_name=NULL, msg_namelen=0, msg_iov=[
{iov_base="\200\0\0\0\23\0\0\0\0\0\0\0\0\0"..., iov_len=65496}],
msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0
When kfence triggers, skb->head only has 64 bytes, immediately followed
by struct skb_shared_info (no extra headroom based on ksize(ptr))
Then icmpv6_push_pending_frames() is overwriting first bytes
of skb_shinfo(skb), making nr_frags bigger than MAX_SKB_FRAGS,
and/or setting shinfo->gso_size to a non zero value.
If nr_frags is mangled, a crash happens in skb_release_data()
If gso_size is mangled, we have the following report:
lo: caps=(0x00000516401d7c69, 0x00000516401d7c69)
WARNING: CPU: 0 PID: 7548 at net/core/dev.c:3239 skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
Modules linked in:
CPU: 0 PID: 7548 Comm: syz-executor268 Not tainted 6.0.0-syzkaller-02754-g557f050166e5 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
RIP: 0010:skb_warn_bad_offload+0x119/0x230 net/core/dev.c:3239
Code: 70 03 00 00 e8 58 c3 24 fa 4c 8d a5 e8 00 00 00 e8 4c c3 24 fa 4c 89 e9 4c 89 e2 4c 89 f6 48 c7 c7 00 53 f5 8a e8 13 ac e7 01 <0f> 0b 5b 5d 41 5c 41 5d 41 5e e9 28 c3 24 fa e8 23 c3 24 fa 48 89
RSP: 0018:ffffc9000366f3e8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88807a9d9d00 RCX: 0000000000000000
RDX: ffff8880780c0000 RSI: ffffffff8160f6f8 RDI: fffff520006cde6f
RBP: ffff888079952000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000400 R11: 0000000000000000 R12: ffff8880799520e8
R13: ffff88807a9da070 R14: ffff888079952000 R15: 0000000000000000
FS: 0000555556be6300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020010000 CR3: 000000006eb7b000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
gso_features_check net/core/dev.c:3521 [inline]
netif_skb_features+0x83e/0xb90 net/core/dev.c:3554
validate_xmit_skb+0x2b/0xf10 net/core/dev.c:3659
__dev_queue_xmit+0x998/0x3ad0 net/core/dev.c:4248
dev_queue_xmit include/linux/netdevice.h:3008 [inline]
neigh_hh_output include/net/neighbour.h:530 [inline]
neigh_output include/net/neighbour.h:544 [inline]
ip6_finish_output2+0xf97/0x1520 net/ipv6/ip6_output.c:134
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:445 [inline]
ip6_local_out+0xaf/0x1a0 net/ipv6/output_core.c:161
ip6_send_skb+0xb7/0x340 net/ipv6/ip6_output.c:1966
ip6_push_pending_frames+0xdd/0x100 net/ipv6/ip6_output.c:1986
icmpv6_push_pending_frames+0x2af/0x490 net/ipv6/icmp.c:303
ping_v6_sendmsg+0xc44/0x1190 net/ipv6/ping.c:190
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f21aab42b89
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff1729d038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f21aab42b89
RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 000000000000000d R11: 0000000000000246 R12: 00007fff1729d050
R13: 00000000000f4240 R14: 0000000000021dd1 R15: 00007fff1729d044
</TASK>
Fixes: 47cf88993c91 ("net: unify alloclen calculation for paged requests")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
For a given ping datagram, ping_getfrag() is called once
per skb fragment.
A large datagram requiring more than one page fragment
is currently getting the checksum of the last fragment,
instead of the cumulative one.
After this patch, "ping -s 35000 ::1" is working correctly.
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|