Age | Commit message (Collapse) | Author |
|
The Spectre-BHB mitigations were inadvertently left disabled for
Cortex-A15, due to the fact that cpu_v7_bugs_init() is not called in
that case. So fix that.
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2022-05-17
This series provides bug fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since the recent introduction supporting the SM3 and SM4 hash algos for IPsec, the kernel
produces invalid pfkey acquire messages, when these encryption modules are disabled. This
happens because the availability of the algos wasn't checked in all necessary functions.
This patch adds these checks.
Signed-off-by: Thomas Bartschies <thomas.bartschies@cvk.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
If skb_clone() returns null pointer, pfkey_broadcast() will
return error.
Therefore, it should be better to check the return value of
pfkey_broadcast() and return error if fails.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
That way percpu_ref_exit() is safe after failing percpu_ref_init().
At least one user (cgroup_create()) had a double-free that way;
there might be other similar bugs. Easier to fix in percpu_ref_init(),
rather than playing whack-a-mole in sloppy users...
Usual symptoms look like a messed refcounting in one of subsystems
that use percpu allocations (might be percpu-refcount, might be
something else). Having refcounts for two different objects share
memory is Not Nice(tm)...
Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
In case fw sync reset is called in parallel to device removal, device
might stuck in the following deadlock:
CPU 0 CPU 1
----- -----
remove_one
uninit_one (locks intf_state_mutex)
mlx5_sync_reset_now_event()
work in fw_reset->wq.
mlx5_enter_error_state()
mutex_lock (intf_state_mutex)
cleanup_once
fw_reset_cleanup()
destroy_workqueue(fw_reset->wq)
Drain the fw_reset WQ, and make sure no new work is being queued, before
entering uninit_one().
The Drain is done before devlink_unregister() since fw_reset, in some
flows, is using devlink API devlink_remote_reload_actions_performed().
Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Cited patch sets flow_source to ANY overriding the provided spec
flow_source, avoiding the optimization done by commit c9c079b4deaa
("net/mlx5: CT: Set flow source hint from provided tuple device").
To fix the above, set the dr_rule flow_source from provided flow spec.
Fixes: 3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
cited commit removed support for GRE tuples when software steering was enabled.
To bring back support for GRE tuples, add GRE ipv4/ipv6 matchers.
Fixes: 3ee61ebb0df1 ("net/mlx5: CT: Add software steering ct flow steering provider")
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
We got reports of certain HW-GRO flows causing kernel call traces, which
might be related to firmware. To be on the safe side, disable the
feature for now and re-enable it once a driver/firmware fix is found.
Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
HW GRO is incompatible and mutually exclusive with XDP and XSK. However,
the needed checks are only made when enabling XDP. If HW GRO is enabled
when XDP is already active, the command will succeed, and XDP will be
skipped in the data path, although still enabled.
This commit fixes the bug by checking the XDP and XSK status in
mlx5e_fix_features and disabling HW GRO if XDP is enabled.
Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
LRO is incompatible and mutually exclusive with XDP. However, the needed
checks are only made when enabling XDP. If LRO is enabled when XDP is
already active, the command will succeed, and XDP will be skipped in the
data path, although still enabled.
This commit fixes the bug by checking the XDP status in
mlx5e_fix_features and disabling LRO if XDP is enabled.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When the driver is in switchdev mode and rx-gro-hw is set, the RQ needs
special CQE handling. Till then, block setting of rx-gro-hw feature in
switchdev mode, to avoid failure while setting the feature due to
failure while opening the RQ.
Fixes: f97d5c2a453e ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
The body of mlx5e_napi_poll is wrapped into rcu_read_lock to be able to
read the XDP program pointer using rcu_dereference. However, the trap RQ
NAPI doesn't use rcu_read_lock, because the trap RQ works only in the
non-linear mode, and mlx5e_skb_from_cqe_nonlinear, until recently,
didn't support XDP and didn't call rcu_dereference.
Starting from the cited commit, mlx5e_skb_from_cqe_nonlinear supports
XDP and calls rcu_dereference, but mlx5e_trap_napi_poll doesn't wrap it
into rcu_read_lock. It leads to RCU-lockdep warnings like this:
WARNING: suspicious RCU usage
This commit fixes the issue by adding an rcu_read_lock to
mlx5e_trap_napi_poll, similarly to mlx5e_napi_poll.
Fixes: ea5d49bdae8b ("net/mlx5e: Add XDP multi buffer support to the non-linear legacy RQ")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When modifying TTL, packet's csum has to be recalculated.
Due to HW issue in ConnectX-5, csum recalculation for modify
TTL on RX is supported through a work-around that is specifically
enabled by configuration.
If the work-around isn't enabled, rather than adding an unsupported
action the modify TTL action on RX should be ignored.
Ignoring modify TTL action might result in zero actions, so in such
cases we will not convert the match STE to modify STE, as it is done
by FW in DMFS.
This patch fixes an issue where modify TTL action was ignored both
on RX and TX instead of only on RX.
Fixes: 4ff725e1d4ad ("net/mlx5: DR, Ignore modify TTL if device doesn't support it")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Currently, software objects of flow steering are created and destroyed
during reload flow. In case a device is unloaded, the following error
is printed during grace period:
mlx5_core 0000:00:0b.0: mlx5_fw_fatal_reporter_err_work:690:(pid 95):
Driver is in error state. Unloading
As a solution to fix use-after-free bugs, where we try to access
these objects, when reading the value of flow_steering_mode devlink
param[1], let's split flow steering creation and destruction into two
routines:
* init and cleanup: memory, cache, and pools allocation/free.
* create and destroy: namespaces initialization and cleanup.
While at it, re-order the cleanup function to mirror the init function.
[1]
Kasan trace:
[ 385.119849 ] BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] Read of size 4 at addr ffff888104b79308 by task bash/291
[ 385.119849 ]
[ 385.119849 ] CPU: 1 PID: 291 Comm: bash Not tainted 5.17.0-rc1+ #2
[ 385.119849 ] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
[ 385.119849 ] Call Trace:
[ 385.119849 ] <TASK>
[ 385.119849 ] dump_stack_lvl+0x6e/0x91
[ 385.119849 ] print_address_description.constprop.0+0x1f/0x160
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] kasan_report.cold+0x83/0xdf
[ 385.119849 ] ? devlink_param_notify+0x20/0x190
[ 385.119849 ] ? mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] mlx5_devlink_fs_mode_get+0x3b/0xa0
[ 385.119849 ] devlink_nl_param_fill+0x18a/0xa50
[ 385.119849 ] ? _raw_spin_lock_irqsave+0x8d/0xe0
[ 385.119849 ] ? devlink_flash_update_timeout_notify+0xf0/0xf0
[ 385.119849 ] ? __wake_up_common+0x4b/0x1e0
[ 385.119849 ] ? preempt_count_sub+0x14/0xc0
[ 385.119849 ] ? _raw_spin_unlock_irqrestore+0x28/0x40
[ 385.119849 ] ? __wake_up_common_lock+0xe3/0x140
[ 385.119849 ] ? __wake_up_common+0x1e0/0x1e0
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp8+0x27/0x80
[ 385.119849 ] ? __rcu_read_unlock+0x48/0x70
[ 385.119849 ] ? kasan_unpoison+0x23/0x50
[ 385.119849 ] ? __kasan_slab_alloc+0x2c/0x80
[ 385.119849 ] ? memset+0x20/0x40
[ 385.119849 ] ? __sanitizer_cov_trace_const_cmp4+0x25/0x80
[ 385.119849 ] devlink_param_notify+0xce/0x190
[ 385.119849 ] devlink_unregister+0x92/0x2b0
[ 385.119849 ] remove_one+0x41/0x140
[ 385.119849 ] pci_device_remove+0x68/0x140
[ 385.119849 ] ? pcibios_free_irq+0x10/0x10
[ 385.119849 ] __device_release_driver+0x294/0x3f0
[ 385.119849 ] device_driver_detach+0x82/0x130
[ 385.119849 ] unbind_store+0x193/0x1b0
[ 385.119849 ] ? subsys_interface_unregister+0x270/0x270
[ 385.119849 ] drv_attr_store+0x4e/0x70
[ 385.119849 ] ? drv_attr_show+0x60/0x60
[ 385.119849 ] sysfs_kf_write+0xa7/0xc0
[ 385.119849 ] kernfs_fop_write_iter+0x23a/0x2f0
[ 385.119849 ] ? sysfs_kf_bin_read+0x160/0x160
[ 385.119849 ] new_sync_write+0x311/0x430
[ 385.119849 ] ? new_sync_read+0x480/0x480
[ 385.119849 ] ? _raw_spin_lock+0x87/0xe0
[ 385.119849 ] ? __sanitizer_cov_trace_cmp4+0x25/0x80
[ 385.119849 ] ? security_file_permission+0x94/0xa0
[ 385.119849 ] vfs_write+0x4c7/0x590
[ 385.119849 ] ksys_write+0xf6/0x1e0
[ 385.119849 ] ? __x64_sys_read+0x50/0x50
[ 385.119849 ] ? fpregs_assert_state_consistent+0x99/0xa0
[ 385.119849 ] do_syscall_64+0x3d/0x90
[ 385.119849 ] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 385.119849 ] RIP: 0033:0x7fc36ef38504
[ 385.119849 ] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f
80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[ 385.119849 ] RSP: 002b:00007ffde0ff3d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 385.119849 ] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc36ef38504
[ 385.119849 ] RDX: 000000000000000c RSI: 00007fc370521040 RDI: 0000000000000001
[ 385.119849 ] RBP: 00007fc370521040 R08: 00007fc36f00b8c0 R09: 00007fc36ee4b740
[ 385.119849 ] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fc36f00a760
[ 385.119849 ] R13: 000000000000000c R14: 00007fc36f005760 R15: 000000000000000c
[ 385.119849 ] </TASK>
[ 385.119849 ]
[ 385.119849 ] Allocated by task 65:
[ 385.119849 ] kasan_save_stack+0x1e/0x40
[ 385.119849 ] __kasan_kmalloc+0x81/0xa0
[ 385.119849 ] mlx5_init_fs+0x11b/0x1160
[ 385.119849 ] mlx5_load+0x13c/0x220
[ 385.119849 ] mlx5_load_one+0xda/0x160
[ 385.119849 ] mlx5_recover_device+0xb8/0x100
[ 385.119849 ] mlx5_health_try_recover+0x2f9/0x3a1
[ 385.119849 ] devlink_health_reporter_recover+0x75/0x100
[ 385.119849 ] devlink_health_report+0x26c/0x4b0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work+0x11e/0x1b0
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] Freed by task 65:
[ 385.275909 ] kasan_save_stack+0x1e/0x40
[ 385.275909 ] kasan_set_track+0x21/0x30
[ 385.275909 ] kasan_set_free_info+0x20/0x30
[ 385.275909 ] __kasan_slab_free+0xfc/0x140
[ 385.275909 ] kfree+0xa5/0x3b0
[ 385.275909 ] mlx5_unload+0x2e/0xb0
[ 385.275909 ] mlx5_unload_one+0x86/0xb0
[ 385.275909 ] mlx5_fw_fatal_reporter_err_work.cold+0xca/0xcf
[ 385.275909 ] process_one_work+0x520/0x970
[ 385.275909 ] worker_thread+0x378/0x950
[ 385.275909 ] kthread+0x1bb/0x200
[ 385.275909 ] ret_from_fork+0x1f/0x30
[ 385.275909 ]
[ 385.275909 ] The buggy address belongs to the object at ffff888104b79300
[ 385.275909 ] which belongs to the cache kmalloc-128 of size 128
[ 385.275909 ] The buggy address is located 8 bytes inside of
[ 385.275909 ] 128-byte region [ffff888104b79300, ffff888104b79380)
[ 385.275909 ] The buggy address belongs to the page:
[ 385.275909 ] page:00000000de44dd39 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x104b78
[ 385.275909 ] head:00000000de44dd39 order:1 compound_mapcount:0
[ 385.275909 ] flags: 0x8000000000010200(slab|head|zone=2)
[ 385.275909 ] raw: 8000000000010200 0000000000000000 dead000000000122 ffff8881000428c0
[ 385.275909 ] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[ 385.275909 ] page dumped because: kasan: bad access detected
[ 385.275909 ]
[ 385.275909 ] Memory state around the buggy address:
[ 385.275909 ] ffff888104b79200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc
[ 385.275909 ] ffff888104b79280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ] >ffff888104b79300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ] ^
[ 385.275909 ] ffff888104b79380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 385.275909 ] ffff888104b79400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 385.275909 ]]
Fixes: e890acd5ff18 ("net/mlx5: Add devlink flow_steering_mode parameter")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In order to support multiple destination FTEs with SW steering
FW table is created with single FTE with multiple actions and
SW steering rule forward to it. When creating this table, flow
source isn't set according to the original FTE.
Fix this by passing the original FTE flow source to the created
FW table.
Fixes: 34583beea4b7 ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
In commit d72d827f2f26, I used 'cpumask_t' incorrectly:
void iscsit_thread_get_cpumask(struct iscsi_conn *conn)
{
int ord, cpu;
cpumask_t conn_allowed_cpumask;
......
}
static ssize_t lio_target_wwn_cpus_allowed_list_store(
struct config_item *item, const char *page, size_t count)
{
int ret;
char *orig;
cpumask_t new_allowed_cpumask;
......
}
The correct pattern should be as follows:
cpumask_var_t mask;
if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
return -ENOMEM;
... use 'mask' here ...
free_cpumask_var(mask);
Link: https://lore.kernel.org/r/20220516054721.1548-1-mingzhe.zou@easystack.cn
Fixes: d72d827f2f26 ("scsi: target: Add iscsi/cpus_allowed_list in configfs")
Reported-by: Test Bot <zgrieee@gmail.com>
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There are sleep in atomic context bugs when the request to secure
element of st-nci is timeout. The root cause is that nci_skb_alloc
with GFP_KERNEL parameter is called in st_nci_se_wt_timeout which is
a timer handler. The call paths that could trigger bugs are shown below:
(interrupt context 1)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_skb_alloc(..., GFP_KERNEL) //may sleep
(interrupt context 2)
st_nci_se_wt_timeout
nci_hci_send_event
nci_hci_send_data
nci_send_data
nci_queue_tx_data_frags
nci_skb_alloc(..., GFP_KERNEL) //may sleep
This patch changes allocation mode of nci_skb_alloc from GFP_KERNEL to
GFP_ATOMIC in order to prevent atomic context sleeping. The GFP_ATOMIC
flag makes memory allocation operation could be used in atomic context.
Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci")
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20220517012530.75714-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
test_bit() tests if one bit is set or not.
Here the logic seems to check of bit QL_RESET_PER_SCSI (i.e. 4) OR bit
QL_RESET_START (i.e. 3) is set.
In fact, it checks if bit 7 (4 | 3 = 7) is set, that is to say
QL_ADAPTER_UP.
This looks harmless, because this bit is likely be set, and when the
ql_reset_work() delayed work is scheduled in ql3xxx_isr() (the only place
that schedule this work), QL_RESET_START or QL_RESET_PER_SCSI is set.
This has been spotted by smatch.
Fixes: 5a4faa873782 ("[PATCH] qla3xxx NIC driver")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/80e73e33f390001d9c0140ffa9baddf6466a41a2.1652637337.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- Avoid putting Elo i2 PCIe Ports in D3cold because downstream devices
are inaccessible after going back to D0 (Rafael J. Wysocki)
- Qualcomm SM8250 has a ddrss_sf_tbu clock but SC8180X does not; make a
SC8180X-specific config without the clock so it probes correctly
(Bjorn Andersson)
- Revert aardvark chained IRQ handler rewrite because it broke
interrupt affinity (Pali Rohár)
* tag 'pci-v5.18-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: aardvark: Rewrite IRQ code to chained IRQ handler"
PCI: qcom: Remove ddrss_sf_tbu clock from SC8180X
PCI/PM: Avoid putting Elo i2 PCIe Ports in D3cold
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Fix up a recent change in the int340x thermal driver that
inadvertently broke thermal zone handling on some systems
(Srinivas Pandruvada)"
* tag 'thermal-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: int340x: Mode setting with new OS handshake
|
|
The code attempts to free the 'new' pointer using kmem_cache_free(),
which is wrong because this function isn't responsible of freeing it.
Instead, the function should free new->htable and clear the contents of
*new (to prevent double-free).
Cc: stable@vger.kernel.org
Fixes: c7c556f1e81b ("selinux: refactor changing booleans")
Reported-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
drm_dp_mst_get_edid call kmemdup to create mst_edid. So mst_edid need to be
freed after use.
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20220516032042.13166-1-hbh25y@gmail.com
|
|
This change fixes the following:
1) The flags variable is not initialized. Always use raw_spin_lock_irqsave
and raw_spin_unlock_irqrestore to serialize patching.
2) flush_kernel_vmap_range is primarily intended for DMA flushes.
The whole cache flush in flush_kernel_vmap_range is only possible
when interrupts are enabled on SMP machines. Since __patch_text_multiple
calls flush_kernel_vmap_range with interrupts disabled, it is better
to directly call flush_kernel_dcache_range_asm and
flush_kernel_icache_range_asm.
3) The final call to flush_icache_range is unnecessary.
Tested with `[PATCH, V3] parisc: Rewrite cache flush code for
PA8800/PA8900' change on rp3440, c8000 and c3750 (32 and 64-bit).
Note by Helge:
This patch had been temporarily reverted shortly before v5.18-rc6 in order
to fix boot issues. Now it can be re-applied.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Originally, I was convinced that we needed to use tmpalias flushes
everwhere, for both user and kernel flushes. However, when I modified
flush_kernel_dcache_page_addr, to use a tmpalias flush, my c8000
would crash quite early when booting.
The PDC returns alias values of 0 for the icache and dcache. This
indicates that either the alias boundary is greater than 16MB or
equivalent aliasing doesn't work. I modified the tmpalias code to
make it easy to try alternate boundaries. I tried boundaries up to
128MB but still kernel tmpalias flushes didn't work on c8000.
This led me to conclude that tmpalias flushes don't work on PA8800
and PA8900 machines, and that we needed to flush directly using the
virtual address of user and kernel pages. This is likely the major
cause of instability on the c8000 and rp34xx machines.
Flushing user pages requires doing a temporary context switch as we
have to flush pages that don't belong to the current context. Further,
we have to deal with pages that aren't present. If a page isn't
present, the flush instructions fault on every line.
Other code has been rearranged and simplified based on testing. For
example, I introduced a flush_cache_dup_mm routine. flush_cache_mm
and flush_cache_dup_mm differ in that flush_cache_mm calls
purge_cache_pages and flush_cache_dup_mm calls flush_cache_pages.
In some implementations, pdc is more efficient than fdc. Based on
my testing, I don't believe there's any performance benefit on the
c8000.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cache_page_if_present.
The pte value extracted for the "from" page in copy_user_highpage is racy and
occasionally the pte is cleared before the flush is complete. I assume that
the page is simultaneously flushed by flush_cache_mm before the pte is cleared
as nullifying the fdc doesn't seem to cause problems.
I investigated various locking scenarios but I wasn't able to find a way to
sequence the flushes. This code is called for every COW break and locks impact
performance.
This patch is related to the bigger cache flush patch because we need the pte
on PA8800/PA8900 to flush using the vma context.
I have also seen this from copy_to_user_page and copy_from_user_page.
The messages appear infrequently when enabled.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
clk_generated_best_diff() helps in finding the parent and the divisor to
compute a rate closest to the required one. However, it doesn't take into
account the request's range for the new rate. Make sure the new rate
is within the required range.
Fixes: 8a8f4bf0c480 ("clk: at91: clk-generated: create function to find best_diff")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Link: https://lore.kernel.org/r/20220413071318.244912-1-codrin.ciubotariu@microchip.com
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <ju.orth@gmail.com>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
We gate whether to IOPOLL for a request on whether the opcode is allowed
on a ring setup for IOPOLL and if it's got a file assigned. MSG_RING
is the only one that allows a file yet isn't pollable, it's merely
supported to allow communication on an IOPOLL ring, not because we can
poll for completion of it.
Put the assigned file early and clear it, so we don't attempt to poll
for it.
Reported-by: syzbot+1a0a53300ce782f8b3ad@syzkaller.appspotmail.com
Fixes: 3f1d52abf098 ("io_uring: defer msg-ring file validity check until command issue")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Adaptive-rx and Adaptive-tx are interrupt moderation settings
that can be enabled/disabled using ethtool:
ethtool -C ethX adaptive-rx on/off adaptive-tx on/off
Unfortunately those settings are getting cleared after
changing number of queues, or in ethtool world 'channels':
ethtool -L ethX rx 1 tx 1
Clearing was happening due to introduction of bit fields
in ice_ring_container struct. This way only itr_setting
bits were rebuilt during ice_vsi_rebuild_set_coalesce().
Introduce an anonymous struct of bitfields and create a
union to refer to them as a single variable.
This way variable can be easily saved and restored.
Fixes: 61dc79ced7aa ("ice: Restore interrupt throttle settings after VSI rebuild")
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The hardware statistics counters are not cleared during resets so the
drivers first access is to initialize the baseline and then subsequent
reads are for reporting the counters. The statistics counters are read
during the watchdog subtask when the interface is up. If the baseline
is not initialized before the interface is up, then there can be a brief
window in which some traffic can be transmitted/received before the
initial baseline reading takes place.
Directly initialize ethtool statistics in driver open so the baseline will
be initialized when the interface is up, and any dropped packets
incremented before the interface is up won't be reported.
Fixes: 28dc1b86f8ea9 ("ice: ignore dropped packets during init")
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Do not allow to write timestamps on RX rings if PF is being configured.
When PF is being configured RX rings can be freed or rebuilt. If at the
same time timestamps are updated, the kernel will crash by dereferencing
null RX ring pointer.
PID: 1449 TASK: ff187d28ed658040 CPU: 34 COMMAND: "ice-ptp-0000:51"
#0 [ff1966a94a713bb0] machine_kexec at ffffffff9d05a0be
#1 [ff1966a94a713c08] __crash_kexec at ffffffff9d192e9d
#2 [ff1966a94a713cd0] crash_kexec at ffffffff9d1941bd
#3 [ff1966a94a713ce8] oops_end at ffffffff9d01bd54
#4 [ff1966a94a713d08] no_context at ffffffff9d06bda4
#5 [ff1966a94a713d60] __bad_area_nosemaphore at ffffffff9d06c10c
#6 [ff1966a94a713da8] do_page_fault at ffffffff9d06cae4
#7 [ff1966a94a713de0] page_fault at ffffffff9da0107e
[exception RIP: ice_ptp_update_cached_phctime+91]
RIP: ffffffffc076db8b RSP: ff1966a94a713e98 RFLAGS: 00010246
RAX: 16e3db9c6b7ccae4 RBX: ff187d269dd3c180 RCX: ff187d269cd4d018
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ff187d269cfcc644 R8: ff187d339b9641b0 R9: 0000000000000000
R10: 0000000000000002 R11: 0000000000000000 R12: ff187d269cfcc648
R13: ffffffff9f128784 R14: ffffffff9d101b70 R15: ff187d269cfcc640
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff1966a94a713ea0] ice_ptp_periodic_work at ffffffffc076dbef [ice]
#9 [ff1966a94a713ee0] kthread_worker_fn at ffffffff9d101c1b
#10 [ff1966a94a713f10] kthread at ffffffff9d101b4d
#11 [ff1966a94a713f50] ret_from_fork at ffffffff9da0023f
Fixes: 77a781155a65 ("ice: enable receive hardware timestamping")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Dave Cain <dcain@redhat.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.18, take #3
- Correctly expose GICv3 support even if no irqchip is created
so that userspace doesn't observe it changing pointlessly
(fixing a regression with QEMU)
- Don't issue a hypercall to set the id-mapped vectors when
protected mode is enabled (fix for pKVM in combination with
CPUs affected by Spectre-v3a)
|
|
Add a basic stat test.
Add two tests of grouping behavior for topdown events. Topdown events
are special as they must be grouped with the slots event first.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220517052724.283874-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
On Intel Icelake, topdown events must always be grouped with a slots
event as leader. When a metric is parsed a weak group is formed and
retried if perf_event_open fails. The retried events aren't grouped
breaking the slots leader requirement. This change modifies the weak
group "reset" behavior so that topdown events aren't broken from the
group for the retry.
$ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1
Performance counter stats for 'system wide':
47,867,188,483 slots (92.27%)
<not supported> topdown-bad-spec
<not supported> topdown-be-bound
<not supported> topdown-fe-bound
<not supported> topdown-retiring
2,173,346,937 branch-instructions (92.27%)
10,540,253 branch-misses # 0.48% of all branches (92.29%)
96,291,140 bus-cycles (92.29%)
6,214,202 cache-misses # 20.120 % of all cache refs (92.29%)
30,886,082 cache-references (76.91%)
11,773,726,641 cpu-cycles (84.62%)
11,807,585,307 instructions # 1.00 insn per cycle (92.31%)
0 mem-loads (92.32%)
2,212,928,573 mem-stores (84.69%)
10,024,403,118 ref-cycles (92.35%)
16,232,978 baclears.any (92.35%)
23,832,633 ARITH.DIVIDER_ACTIVE (84.59%)
0.981070734 seconds time elapsed
After:
$ perf stat -e '{slots,topdown-bad-spec,topdown-be-bound,topdown-fe-bound,topdown-retiring,branch-instructions,branch-misses,bus-cycles,cache-misses,cache-references,cpu-cycles,instructions,mem-loads,mem-stores,ref-cycles,baclears.any,ARITH.DIVIDER_ACTIVE}:W' -a sleep 1
Performance counter stats for 'system wide':
31040189283 slots (92.27%)
8997514811 topdown-bad-spec # 28.2% bad speculation (92.27%)
10997536028 topdown-be-bound # 34.5% backend bound (92.27%)
4778060526 topdown-fe-bound # 15.0% frontend bound (92.27%)
7086628768 topdown-retiring # 22.2% retiring (92.27%)
1417611942 branch-instructions (92.26%)
5285529 branch-misses # 0.37% of all branches (92.28%)
62922469 bus-cycles (92.29%)
1440708 cache-misses # 8.292 % of all cache refs (92.30%)
17374098 cache-references (76.94%)
8040889520 cpu-cycles (84.63%)
7709992319 instructions # 0.96 insn per cycle (92.32%)
0 mem-loads (92.32%)
1515669558 mem-stores (84.68%)
6542411177 ref-cycles (92.35%)
4154149 baclears.any (92.35%)
20556152 ARITH.DIVIDER_ACTIVE (84.59%)
1.010799593 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220517052724.283874-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
it is ASCII
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.
To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.
Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:
$ echo -n "Hello" | od -t x8
0000000 0000006f6c6c6548
0000005
$ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=ew intel-pt-events.py
Intel PT Branch Trace, Power Events, Event Trace and PTWRITE
Switch In 38524/38524 [001] 24166.044995916 0/0
eg_ptw 38524/38524 [001] 24166.045380004 ptwrite jmp IP: 0 payload: 0x6f6c6c6548 Hello 56532c7ce196 perf_emulate_ptwrite+0x16 (/home/ahunter/git/work/eg_ptw)
End
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
It can be convenient to put a string value into a ptwrite payload as
a quick and easy way to identify what is being printed.
To make that useful, if the Intel ptwrite payload value contains only
printable ASCII characters padded with NULLs, then print it also as a
string.
Using the example program from the "Emulated PTWRITE" section of
tools/perf/Documentation/perf-intel-pt.txt:
$ echo -n "Hello" | od -t x8
0000000 0000006f6c6c6548
0000005
$ perf record -e intel_pt//u ./eg_ptw 0x0000006f6c6c6548
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
$ perf script --itrace=ew
eg_ptw 35563 [005] 18256.087338: ptwrite: IP: 0 payload: 0x6f6c6c6548 Hello 55e764db5196 perf_emulate_ptwrite+0x16 (/home/user/eg_ptw)
$
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
ptwrite is an Intel x86 instruction that writes arbitrary values into an
Intel PT trace. It is not supported on all hardware, so provide an
alternative that makes use of TNT packets to convey the payload data.
TNT packets encode Taken/Not-taken conditional branch information, so
taking branches based on the payload value will encode the value into
the TNT packet. Refer to the changes to the documentation file
perf-intel-pt.txt in this patch for an example.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220509152400.376613-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
As an optimisation, only pages mapped with PROT_MTE in user space have
the MTE tags zeroed. This is done lazily at the set_pte_at() time via
mte_sync_tags(). However, this function is missing a barrier and another
CPU may see the PTE updated before the zeroed tags are visible. Add an
smp_wmb() barrier if the mapping is Normal Tagged.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 34bfeea4a9e9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Cc: <stable@vger.kernel.org> # 5.10.x
Reported-by: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Link: https://lore.kernel.org/r/20220517093532.127095-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In arm64_relocate_new_kernel() we load some fields out of the kimage
structure after relocation has occurred. As the kimage structure isn't
allocated to be relocation-safe, it may be clobbered during relocation,
and we may load junk values out of the structure.
Due to this, kexec may fail when the kimage allocation happens to fall
within a PA range that an object will be relocated to. This has been
observed to occur for regular kexec on a QEMU TCG 'virt' machine with
2GiB of RAM, where the PA range of the new kernel image overlaps the
kimage structure.
Avoid this by ensuring we load all values from the kimage structure
prior to relocation.
I've tested this atop v5.16 and v5.18-rc6.
Fixes: 878fdbd70486 ("arm64: kexec: pass kimage as the only argument to relocation function")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20220516160735.731404-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
During hotplug, the stolen time data structure is unmapped and memset.
There is a possibility of the timer IRQ being triggered before memset
and stolen time is getting updated as part of this timer IRQ handler. This
causes the below crash in timer handler -
[ 3457.473139][ C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
...
[ 3458.154398][ C5] Call trace:
[ 3458.157648][ C5] para_steal_clock+0x30/0x50
[ 3458.162319][ C5] irqtime_account_process_tick+0x30/0x194
[ 3458.168148][ C5] account_process_tick+0x3c/0x280
[ 3458.173274][ C5] update_process_times+0x5c/0xf4
[ 3458.178311][ C5] tick_sched_timer+0x180/0x384
[ 3458.183164][ C5] __run_hrtimer+0x160/0x57c
[ 3458.187744][ C5] hrtimer_interrupt+0x258/0x684
[ 3458.192698][ C5] arch_timer_handler_virt+0x5c/0xa0
[ 3458.198002][ C5] handle_percpu_devid_irq+0xdc/0x414
[ 3458.203385][ C5] handle_domain_irq+0xa8/0x168
[ 3458.208241][ C5] gic_handle_irq.34493+0x54/0x244
[ 3458.213359][ C5] call_on_irq_stack+0x40/0x70
[ 3458.218125][ C5] do_interrupt_handler+0x60/0x9c
[ 3458.223156][ C5] el1_interrupt+0x34/0x64
[ 3458.227560][ C5] el1h_64_irq_handler+0x1c/0x2c
[ 3458.232503][ C5] el1h_64_irq+0x7c/0x80
[ 3458.236736][ C5] free_vmap_area_noflush+0x108/0x39c
[ 3458.242126][ C5] remove_vm_area+0xbc/0x118
[ 3458.246714][ C5] vm_remove_mappings+0x48/0x2a4
[ 3458.251656][ C5] __vunmap+0x154/0x278
[ 3458.255796][ C5] stolen_time_cpu_down_prepare+0xc0/0xd8
[ 3458.261542][ C5] cpuhp_invoke_callback+0x248/0xc34
[ 3458.266842][ C5] cpuhp_thread_fun+0x1c4/0x248
[ 3458.271696][ C5] smpboot_thread_fn+0x1b0/0x400
[ 3458.276638][ C5] kthread+0x17c/0x1e0
[ 3458.280691][ C5] ret_from_fork+0x10/0x20
As a fix, introduce rcu lock to update stolen time structure.
Fixes: 75df529bec91 ("arm64: paravirt: Initialize steal time when cpu is online")
Cc: stable@vger.kernel.org
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
Reviewed-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Link: https://lore.kernel.org/r/20220513174654.362169-1-quic_eberman@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The typedefs u32 and u64 are not available in userspace. Thus user get
an error he try to use DMA_BUF_SET_NAME_A or DMA_BUF_SET_NAME_B:
$ gcc -Wall -c -MMD -c -o ioctls_list.o ioctls_list.c
In file included from /usr/include/x86_64-linux-gnu/asm/ioctl.h:1,
from /usr/include/linux/ioctl.h:5,
from /usr/include/asm-generic/ioctls.h:5,
from ioctls_list.c:11:
ioctls_list.c:463:29: error: ‘u32’ undeclared here (not in a function)
463 | { "DMA_BUF_SET_NAME_A", DMA_BUF_SET_NAME_A, -1, -1 }, // linux/dma-buf.h
| ^~~~~~~~~~~~~~~~~~
ioctls_list.c:464:29: error: ‘u64’ undeclared here (not in a function)
464 | { "DMA_BUF_SET_NAME_B", DMA_BUF_SET_NAME_B, -1, -1 }, // linux/dma-buf.h
| ^~~~~~~~~~~~~~~~~~
The issue was initially reported here[1].
[1]: https://github.com/jerome-pouiller/ioctl/pull/14
Signed-off-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Fixes: a5bff92eaac4 ("dma-buf: Fix SET_NAME ioctl uapi")
CC: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20220517072708.245265-1-Jerome.Pouiller@silabs.com
Signed-off-by: Christian König <christian.koenig@amd.com>
|
|
In vmxnet3_rq_create(), when dma_alloc_coherent() fails,
vmxnet3_rq_destroy() is called. It sets rq->rx_ring[i].base to NULL. Then
vmxnet3_rq_create() returns an error to its callers mxnet3_rq_create_all()
-> vmxnet3_change_mtu(). Then vmxnet3_change_mtu() calls
vmxnet3_force_close() -> dev_close() in error handling code. And the driver
calls vmxnet3_close() -> vmxnet3_quiesce_dev() -> vmxnet3_rq_cleanup_all()
-> vmxnet3_rq_cleanup(). In vmxnet3_rq_cleanup(),
rq->rx_ring[ring_idx].base is accessed, but this variable is NULL, causing
a NULL pointer dereference.
To fix this possible bug, an if statement is added to check whether
rq->rx_ring[0].base is NULL in vmxnet3_rq_cleanup() and exit early if so.
The error log in our fault-injection testing is shown as follows:
[ 65.220135] BUG: kernel NULL pointer dereference, address: 0000000000000008
...
[ 65.222633] RIP: 0010:vmxnet3_rq_cleanup_all+0x396/0x4e0 [vmxnet3]
...
[ 65.227977] Call Trace:
...
[ 65.228262] vmxnet3_quiesce_dev+0x80f/0x8a0 [vmxnet3]
[ 65.228580] vmxnet3_close+0x2c4/0x3f0 [vmxnet3]
[ 65.228866] __dev_close_many+0x288/0x350
[ 65.229607] dev_close_many+0xa4/0x480
[ 65.231124] dev_close+0x138/0x230
[ 65.231933] vmxnet3_force_close+0x1f0/0x240 [vmxnet3]
[ 65.232248] vmxnet3_change_mtu+0x75d/0x920 [vmxnet3]
...
Fixes: d1a890fa37f27 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050711.2636709-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
In vmxnet3_rq_alloc_rx_buf(), when dma_map_single() fails, rbi->skb is
freed immediately. Similarly, in another branch, when dma_map_page() fails,
rbi->page is also freed. In the two cases, vmxnet3_rq_alloc_rx_buf()
returns an error to its callers vmxnet3_rq_init() -> vmxnet3_rq_init_all()
-> vmxnet3_activate_dev(). Then vmxnet3_activate_dev() calls
vmxnet3_rq_cleanup_all() in error handling code, and rbi->skb or rbi->page
are freed again in vmxnet3_rq_cleanup_all(), causing use-after-free bugs.
To fix these possible bugs, rbi->skb and rbi->page should be cleared after
they are freed.
The error log in our fault-injection testing is shown as follows:
[ 14.319016] BUG: KASAN: use-after-free in consume_skb+0x2f/0x150
...
[ 14.321586] Call Trace:
...
[ 14.325357] consume_skb+0x2f/0x150
[ 14.325671] vmxnet3_rq_cleanup_all+0x33a/0x4e0 [vmxnet3]
[ 14.326150] vmxnet3_activate_dev+0xb9d/0x2ca0 [vmxnet3]
[ 14.326616] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.361675] Allocated by task 351:
...
[ 14.362688] __netdev_alloc_skb+0x1b3/0x6f0
[ 14.362960] vmxnet3_rq_alloc_rx_buf+0x1b0/0x8d0 [vmxnet3]
[ 14.363317] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.363661] vmxnet3_open+0x387/0x470 [vmxnet3]
...
[ 14.367309]
[ 14.367412] Freed by task 351:
...
[ 14.368932] __dev_kfree_skb_any+0xd2/0xe0
[ 14.369193] vmxnet3_rq_alloc_rx_buf+0x71e/0x8d0 [vmxnet3]
[ 14.369544] vmxnet3_activate_dev+0x3e3/0x2ca0 [vmxnet3]
[ 14.369883] vmxnet3_open+0x387/0x470 [vmxnet3]
[ 14.370174] __dev_open+0x28a/0x420
[ 14.370399] __dev_change_flags+0x192/0x590
[ 14.370667] dev_change_flags+0x7a/0x180
[ 14.370919] do_setlink+0xb28/0x3570
[ 14.371150] rtnl_newlink+0x1160/0x1740
[ 14.371399] rtnetlink_rcv_msg+0x5bf/0xa50
[ 14.371661] netlink_rcv_skb+0x1cd/0x3e0
[ 14.371913] netlink_unicast+0x5dc/0x840
[ 14.372169] netlink_sendmsg+0x856/0xc40
[ 14.372420] ____sys_sendmsg+0x8a7/0x8d0
[ 14.372673] __sys_sendmsg+0x1c2/0x270
[ 14.372914] do_syscall_64+0x41/0x90
[ 14.373145] entry_SYSCALL_64_after_hwframe+0x44/0xae
...
Fixes: 5738a09d58d5a ("vmxnet3: fix checks for dma mapping errors")
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Signed-off-by: Zixuan Fu <r33s3n6@gmail.com>
Link: https://lore.kernel.org/r/20220514050656.2636588-1-r33s3n6@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The global blackhole_netdev has replaced pernet loopback_dev to become the
one given to the object that holds an netdev when ifdown in many places of
ipv4 and ipv6 since commit 8d7017fd621d ("blackhole_netdev: use
blackhole_netdev to invalidate dst entries").
Especially after commit faab39f63c1f ("net: allow out-of-order netdev
unregistration"), it's no longer safe to use loopback_dev that may be
freed before other netdev.
This patch is to set dst dev to blackhole_netdev instead of loopback_dev
in ifdown.
v1->v2:
- add Fixes tag as Eric suggested.
Fixes: faab39f63c1f ("net: allow out-of-order netdev unregistration")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/e8c87482998ca6fcdab214f5a9d582899ec0c648.1652665047.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
if devm_clk_get_optional() fails, we still need to go through the error
handling path.
Add the missing goto.
Fixes: 6328a126896ea ("net: systemport: Manage Wake-on-LAN clock")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/99d70634a81c229885ae9e4ee69b2035749f7edc.1652634040.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The following two scenarios were failing for lan966x.
1. If the port had the address X and then trying to assign the same
address, then the HW was just removing this address because first it
tries to learn new address and then delete the old one. As they are
the same the HW remove it.
2. If the port eth0 was assigned the same address as one of the other
ports eth1 then when assigning back the address to eth0 then the HW
was deleting the address of eth1.
The case 1. is fixed by checking if the port has already the same
address while case 2. is fixed by checking if the address is used by any
other port.
Fixes: e18aba8941b40b ("net: lan966x: add mactable support")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20220513180030.3076793-1-horatiu.vultur@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This reverts commit 1738890a3165ccd0da98ebd3e2d5f9b230d5afa8.
Commit 1738890a3165 ("clk: sunxi-ng: sun6i-rtc: Add support for H6")
breaks HDMI output on Tanix TX6 mini board. Exact reason isn't known,
but because that commit doesn't actually improve anything, let's just
revert it.
Cc: stable@vger.kernel.org
Signed-off-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Link: https://lore.kernel.org/r/20220511200206.2458274-1-jernej.skrabec@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The commit 09e3b18ca5de ("clk: bcm2835: Remove unused variable")
accidentially breaks the behavior of bcm2835_clock_choose_div() and
booting of Raspberry Pi. The removed do_div macro call had side effects,
so we need to restore it.
Fixes: 09e3b18ca5de ("clk: bcm2835: Remove unused variable")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Link: https://lore.kernel.org/r/20220428183010.1635248-1-stefan.wahren@i2se.com
Tested-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Cast pointers to unsigned long instead of to uint64_t to avoid this
problem on 32-bit arches:
31 6.89 debian:experimental-x-mips : FAIL gcc version 11.2.0 (Debian 11.2.0-18)
bench/breakpoint.c: In function 'breakpoint_setup':
bench/breakpoint.c:56:24: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
56 | attr.bp_addr = (uint64_t)addr;
| ^
cc1: all warnings being treated as errors
make[3]: *** [/git/perf-5.18.0-rc7/tools/build/Makefile.build:139: bench] Error 2
Fixes: 68a6772f11dbb1ed ("perf bench: Add breakpoint benchmarks")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/YoLq1nHx1doi+VWl@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|