summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-05-23tracing/user_events: Split up mm alloc and attachLinus Torvalds
When a new mm is being created in a fork() path it currently is allocated and then attached in one go. This leaves the mm exposed out to the tracing register callbacks while any parent enabler locations are copied in. This should not happen. Split up mm alloc and attach as unique operations. When duplicating enablers, first alloc, then duplicate, and only upon success, attach. This prevents any timing window outside of the event_reg mutex for enablement walking. This allows for dropping RCU requirement for enablement walking in later patches. Link: https://lkml.kernel.org/r/20230519230741.669-2-beaub@linux.microsoft.com Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=whTBvXJuoi_kACo3qi5WZUmRrhyA-_=rRFsycTytmB6qw@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [ change log written by Beau Belgrave ] Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-24tpm: tpm_tis: Disable interrupts for AEON UPX-i11Peter Ujfalusi
Interrupts got recently enabled for tpm_tis. The interrupts initially works on the device but they will stop arriving after circa ~200 interrupts. On system reboot/shutdown this will cause a long wait (120000 jiffies). [jarkko@kernel.org: fix a merge conflict and adjust the commit message] Fixes: e644b2f498d2 ("tpm, tpm_tis: Enable interrupt test") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
2023-05-23Merge tag 'xtensa-20230523' of https://github.com/jcmvbkbc/linux-xtensaLinus Torvalds
Pull Xtensa fixes from Max Filippov: - fix signal delivery to FDPIC process - add __bswap{si,di}2 helpers * tag 'xtensa-20230523' of https://github.com/jcmvbkbc/linux-xtensa: xtensa: add __bswap{si,di}2 helpers xtensa: fix signal delivery to FDPIC process
2023-05-23ASoC: dwc: move DMA init to snd_soc_dai_driver probe()Maxim Kochetkov
When using DMA mode we are facing with Oops: [ 396.458157] Unable to handle kernel access to user memory without uaccess routines at virtual address 000000000000000c [ 396.469374] Oops [#1] [ 396.471839] Modules linked in: [ 396.475144] CPU: 0 PID: 114 Comm: arecord Not tainted 6.0.0-00164-g9a8eccdaf2be-dirty #68 [ 396.483619] Hardware name: YMP ELCT FPGA (DT) [ 396.488156] epc : dmaengine_pcm_open+0x1d2/0x342 [ 396.493227] ra : dmaengine_pcm_open+0x1d2/0x342 [ 396.498140] epc : ffffffff807fe346 ra : ffffffff807fe346 sp : ffffffc804e138f0 [ 396.505602] gp : ffffffff817bf730 tp : ffffffd8042c8ac0 t0 : 6500000000000000 [ 396.513045] t1 : 0000000000000064 t2 : 656e69676e65616d s0 : ffffffc804e13990 [ 396.520477] s1 : ffffffd801b86a18 a0 : 0000000000000026 a1 : ffffffff816920f8 [ 396.527897] a2 : 0000000000000010 a3 : fffffffffffffffe a4 : 0000000000000000 [ 396.535319] a5 : 0000000000000000 a6 : ffffffd801b87040 a7 : 0000000000000038 [ 396.542740] s2 : ffffffd801b94a00 s3 : 0000000000000000 s4 : ffffffd80427f5e8 [ 396.550153] s5 : ffffffd80427f5e8 s6 : ffffffd801b44410 s7 : fffffffffffffff5 [ 396.557569] s8 : 0000000000000800 s9 : 0000000000000001 s10: ffffffff8066d254 [ 396.564978] s11: ffffffd8059cf768 t3 : ffffffff817d5577 t4 : ffffffff817d5577 [ 396.572391] t5 : ffffffff817d5578 t6 : ffffffc804e136e8 [ 396.577876] status: 0000000200000120 badaddr: 000000000000000c cause: 000000000000000d [ 396.586007] [<ffffffff806839f4>] snd_soc_component_open+0x1a/0x68 [ 396.592439] [<ffffffff807fdd62>] __soc_pcm_open+0xf0/0x502 [ 396.598217] [<ffffffff80685d86>] soc_pcm_open+0x2e/0x4e [ 396.603741] [<ffffffff8066cea4>] snd_pcm_open_substream+0x442/0x68e [ 396.610313] [<ffffffff8066d1ea>] snd_pcm_open+0xfa/0x212 [ 396.615868] [<ffffffff8066d39c>] snd_pcm_capture_open+0x3a/0x60 [ 396.622048] [<ffffffff8065b35a>] snd_open+0xa8/0x17a [ 396.627421] [<ffffffff801ae036>] chrdev_open+0xa0/0x218 [ 396.632893] [<ffffffff801a5a28>] do_dentry_open+0x17c/0x2a6 [ 396.638713] [<ffffffff801a6d9a>] vfs_open+0x1e/0x26 [ 396.643850] [<ffffffff801b8544>] path_openat+0x96e/0xc96 [ 396.649518] [<ffffffff801b9390>] do_filp_open+0x7c/0xf6 [ 396.655034] [<ffffffff801a6ff2>] do_sys_openat2+0x8a/0x11e [ 396.660765] [<ffffffff801a735a>] sys_openat+0x50/0x7c [ 396.666068] [<ffffffff80003aca>] ret_from_syscall+0x0/0x2 [ 396.674964] ---[ end trace 0000000000000000 ]--- It happens because of play_dma_data/capture_dma_data pointers are NULL. Current implementation assigns these pointers at snd_soc_dai_driver startup() callback and reset them back to NULL at shutdown(). But soc_pcm_open() sequence uses DMA pointers in dmaengine_pcm_open() before snd_soc_dai_driver startup(). Most generic DMA capable I2S drivers use snd_soc_dai_driver probe() callback to init DMA pointers only once at probe. So move DMA init to dw_i2s_dai_probe and drop shutdown() and startup() callbacks. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Link: https://lore.kernel.org/r/20230512110343.66664-1-fido_max@inbox.ru Signed-off-by: Mark Brown <broonie@kernel.org>
2023-05-23cifs: Fix cifs_limit_bvec_subset() to correctly check the maxmimum sizeDavid Howells
Fix cifs_limit_bvec_subset() so that it limits the span to the maximum specified and won't return with a size greater than max_size. Fixes: d08089f649a0 ("cifs: Change the I/O paths to use an iterator rather than a page list") Cc: stable@vger.kernel.org # 6.3 Reported-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Steve French <smfrench@gmail.com> cc: Rohith Surabattula <rohiths.msft@gmail.com> cc: Paulo Alcantara <pc@manguebit.com> cc: Tom Talpey <tom@talpey.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2023-05-23vfio/type1: check pfn valid before converting to struct pageYan Zhao
Check physical PFN is valid before converting the PFN to a struct page pointer to be returned to caller of vfio_pin_pages(). vfio_pin_pages() pins user pages with contiguous IOVA. If the IOVA of a user page to be pinned belongs to vma of vm_flags VM_PFNMAP, pin_user_pages_remote() will return -EFAULT without returning struct page address for this PFN. This is because usually this kind of PFN (e.g. MMIO PFN) has no valid struct page address associated. Upon this error, vaddr_get_pfns() will obtain the physical PFN directly. While previously vfio_pin_pages() returns to caller PFN arrays directly, after commit 34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()"), PFNs will be converted to "struct page *" unconditionally and therefore the returned "struct page *" array may contain invalid struct page addresses. Given current in-tree users of vfio_pin_pages() only expect "struct page * returned, check PFN validity and return -EINVAL to let the caller be aware of IOVAs to be pinned containing PFN not able to be returned in "struct page *" array. So that, the caller will not consume the returned pointer (e.g. test PageReserved()) and avoid error like "supervisor read access in kernel mode". Fixes: 34a255e67615 ("vfio: Replace phys_pfn with pages for vfio_pin_pages()") Cc: Sean Christopherson <seanjc@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20230519065843.10653-1-yan.y.zhao@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-05-23ASoC: cs35l41: Fix default regmap values for some registersStefan Binding
Several values do not match the defaults of CS35L41, fix them. Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com> Acked-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20230414152552.574502-4-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown <broonie@kernel.org>
2023-05-23Merge tag 'erofs-for-6.4-rc4-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "One patch addresses a null-ptr-deref issue reported by syzbot weeks ago, which is caused by the new long xattr name prefix feature and needs to be fixed. The remaining two patches are minor cleanups to avoid unnecessary compilation and adjust per-cpu kworker configuration. Summary: - Fix null-ptr-deref related to long xattr name prefixes - Avoid pcpubuf compilation if CONFIG_EROFS_FS_ZIP is off - Use high priority kthreads by default if per-cpu kthread workers are enabled" * tag 'erofs-for-6.4-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: use HIPRI by default if per-cpu kthreads are enabled erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is off erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_init
2023-05-23block: fix bio-cache for passthru IOAnuj Gupta
commit <8af870aa5b847> ("block: enable bio caching use for passthru IO") introduced bio-cache for passthru IO. In case when nr_vecs are greater than BIO_INLINE_VECS, bio and bvecs are allocated from mempool (instead of percpu cache) and REQ_ALLOC_CACHE is cleared. This causes the side effect of not freeing bio/bvecs into mempool on completion. This patch lets the passthru IO fallback to allocation using bio_kmalloc when nr_vecs are greater than BIO_INLINE_VECS. The corresponding bio is freed during call to blk_mq_map_bio_put during completion. Cc: stable@vger.kernel.org # 6.1 fixes <8af870aa5b847> ("block: enable bio caching use for passthru IO") Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20230523111709.145676-1-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-23block, bfq: update Paolo's address in maintainer listPaolo Valente
Current email address of Paolo Valente is no longer valid, use a good one. Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20230523091724.26636-1-paolo.valente@unimore.it Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-23blk-mq: fix race condition in active queue accountingTian Lan
If multiple CPUs are sharing the same hardware queue, it can cause leak in the active queue counter tracking when __blk_mq_tag_busy() is executed simultaneously. Fixes: ee78ec1077d3 ("blk-mq: blk_mq_tag_busy is no need to return a value") Signed-off-by: Tian Lan <tian.lan@twosigma.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20230522210555.794134-1-tilan7663@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-23blk-wbt: fix that wbt can't be disabled by defaultYu Kuai
commit b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()") removes the checking of CONFIG_BLK_WBT_MQ by mistake, which is used to control enable or disable wbt by default. Fix the problem by adding back the checking. This patch also do a litter cleanup to make related code more readable. Fixes: b11d31ae01e6 ("blk-wbt: remove unnecessary check in wbt_enable_default()") Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/lkml/CAKXUXMzfKq_J9nKHGyr5P5rvUETY4B-fxoQD4sO+NYjFOfVtZA@mail.gmail.com/t/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230522121854.2928880-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-05-23parisc: Use num_present_cpus() in alternative patching codeHelge Deller
When patching the kernel code some alternatives depend on SMP vs. !SMP. Use the value of num_present_cpus() instead of num_online_cpus() to decide, otherwise we may run into issues if and additional CPU is enabled after having loaded a module while only one CPU was enabled. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.1+
2023-05-23tracing/timerlat: Always wakeup the timerlat threadDaniel Bristot de Oliveira
While testing rtla timerlat auto analysis, I reach a condition where the interface was not receiving tracing data. I was able to manually reproduce the problem with these steps: # echo 0 > tracing_on # disable trace # echo 1 > osnoise/stop_tracing_us # stop trace if timerlat irq > 1 us # echo timerlat > current_tracer # enable timerlat tracer # sleep 1 # wait... that is the time when rtla # apply configs like prio or cgroup # echo 1 > tracing_on # start tracing # cat trace # tracer: timerlat # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / _-=> migrate-disable # |||| / delay # ||||| ACTIVATION # TASK-PID CPU# ||||| TIMESTAMP ID CONTEXT LATENCY # | | | ||||| | | | | NOTHING! Then, trying to enable tracing again with echo 1 > tracing_on resulted in no change: the trace was still not tracing. This problem happens because the timerlat IRQ hits the stop tracing condition while tracing is off, and do not wake up the timerlat thread, so the timerlat threads are kept sleeping forever, resulting in no trace, even after re-enabling the tracer. Avoid this condition by always waking up the threads, even after stopping tracing, allowing the tracer to return to its normal operating after a new tracing on. Link: https://lore.kernel.org/linux-trace-kernel/1ed8f830638b20a39d535d27d908e319a9a3c4e2.1683822622.git.bristot@kernel.org Cc: Juri Lelli <juri.lelli@redhat.com> Cc: stable@vger.kernel.org Fixes: a955d7eac177 ("trace: Add timerlat tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-23accel/qaic: Fix NNC message corruptionJeffrey Hugo
If msg_xfer() is unable to queue part of a NNC message because the MHI ring is full, it will attempt to give the QSM some time to drain the queue. However, if QSM fails to make any room, msg_xfer() will fail and tell the caller to try again. This is problematic because part of the message may have been committed to the ring and there is no mechanism to revoke that content. This will cause QSM to receive a corrupt message. The better way to do this is to check if the ring has enough space for the entire message before committing any of the message. Since msg_xfer() is under the cntl_mutex no one else can come in and consume the space. Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-6-quic_jhugo@quicinc.com
2023-05-23accel/qaic: Grab ch_lock during QAIC_ATTACH_SLICE_BOPranjal Ramajor Asha Kanojiya
During QAIC_ATTACH_SLICE_BO, we associate a BO to its DBC. We need to grab the dbc->ch_lock to make sure that DBC does not goes away while QAIC_ATTACH_SLICE_BO is still running. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-5-quic_jhugo@quicinc.com
2023-05-23accel/qaic: Flush the transfer list againPranjal Ramajor Asha Kanojiya
Before calling synchronize_srcu() we clear the transfer list, this is to allow all the QAIC_WAIT_BO callers to exit otherwise the system could deadlock. There could be a corner case where more elements get added to transfer list after we have flushed it. Re-flush the transfer list once all the holders of dbc->ch_lock have completed execution i.e. synchronize_srcu() is complete. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-4-quic_jhugo@quicinc.com
2023-05-23accel/qaic: Validate if BO is sliced before slicingPranjal Ramajor Asha Kanojiya
QAIC_ATTACH_SLICE_BO attaches slicing configuration to a BO. Validate if given BO is already sliced. An already sliced BO cannot be sliced again. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-3-quic_jhugo@quicinc.com
2023-05-23accel/qaic: Validate user data before grabbing any lockPranjal Ramajor Asha Kanojiya
Validating user data does not need to be protected by any lock and it is safe to move it out of critical region. Fixes: ff13be830333 ("accel/qaic: Add datapath") Fixes: 129776ac2e38 ("accel/qaic: Add control path") Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517193540.14323-2-quic_jhugo@quicinc.com
2023-05-23accel/qaic: initialize ret variable to 0Tom Rix
clang static analysis reports drivers/accel/qaic/qaic_data.c:610:2: warning: Undefined or garbage value returned to caller [core.uninitialized.UndefReturn] return ret; ^~~~~~~~~~ From a code analysis of the function, the ret variable is only set some of the time but is always returned. This suggests ret can return uninitialized garbage. However BO allocation will ensure ret is always set in reality. Initialize ret to 0 to silence the warning. Fixes: ff13be830333 ("accel/qaic: Add datapath") Signed-off-by: Tom Rix <trix@redhat.com> [jhugo: Reword commit text] Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230517165605.16770-1-quic_jhugo@quicinc.com
2023-05-23tracing/user_events: Use long vs int for atomic bit opsBeau Belgrave
Each event stores a int to track which bit to set/clear when enablement changes. On big endian 64-bit configurations, it's possible this could cause memory corruption when it's used for atomic bit operations. Use unsigned long for enablement values to ensure any possible corruption cannot occur. Downcast to int after mask for the bit target. Link: https://lore.kernel.org/all/6f758683-4e5e-41c3-9b05-9efc703e827c@kili.mountain/ Link: https://lore.kernel.org/linux-trace-kernel/20230505205855.6407-1-beaub@linux.microsoft.com Fixes: dcb8177c1395 ("tracing/user_events: Add ioctl for disabling addresses") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-05-23bpf, sockmap: Test progs verifier error with latest clangJohn Fastabend
With a relatively recent clang (7090c10273119) and with this commit to fix warnings in selftests (c8ed668593972) that uses __sink(err) to resolve unused variables. We get the following verifier error. root@6e731a24b33a:/host/tools/testing/selftests/bpf# ./test_sockmap libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = *(u32 *)(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'bpf_sockmap': failed to load: -13 libbpf: failed to load object 'test_sockmap_kern.bpf.o' load_bpf_file: (-1) No such file or directory ERROR: (-1) load bpf failed libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = *(u32 *)(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- libbpf: prog 'bpf_sockmap': failed to load: -13 libbpf: failed to load object 'test_sockhash_kern.bpf.o' load_bpf_file: (-1) No such file or directory ERROR: (-1) load bpf failed libbpf: prog 'bpf_sockmap': BPF program load failed: Permission denied libbpf: prog 'bpf_sockmap': -- BEGIN PROG LOAD LOG -- 0: R1=ctx(off=0,imm=0) R10=fp0 ; op = (int) skops->op; 0: (61) r2 = *(u32 *)(r1 +0) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; switch (op) { 1: (16) if w2 == 0x4 goto pc+5 ; R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) 2: (56) if w2 != 0x5 goto pc+15 ; R2_w=5 ; lport = skops->local_port; 3: (61) r2 = *(u32 *)(r1 +68) ; R1=ctx(off=0,imm=0) R2_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) ; if (lport == 10000) { 4: (56) if w2 != 0x2710 goto pc+13 18: R1=ctx(off=0,imm=0) R2=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 ; __sink(err); 18: (bc) w1 = w0 R0 !read_ok processed 18 insns (limit 1000000) max_states_per_insn 0 total_states 2 peak_states 2 mark_read 1 -- END PROG LOAD LOG -- To fix simply remove the err value because its not actually used anywhere in the testing. We can investigate the root cause later. Future patch should probably actually test the err value as well. Although if the map updates fail they will get caught eventually by userspace. Fixes: c8ed668593972 ("selftests/bpf: fix lots of silly mistakes pointed out by compiler") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-15-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with dropsJohn Fastabend
When BPF program drops pkts the sockmap logic 'eats' the packet and updates copied_seq. In the PASS case where the sk_buff is accepted we update copied_seq from recvmsg path so we need a new test to handle the drop case. Original patch series broke this resulting in test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256 After updated patch with fix. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-14-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Test FIONREAD returns correct bytes in rx bufferJohn Fastabend
A bug was reported where ioctl(FIONREAD) returned zero even though the socket with a SK_SKB verdict program attached had bytes in the msg queue. The result is programs may hang or more likely try to recover, but use suboptimal buffer sizes. Add a test to check that ioctl(FIONREAD) returns the correct number of bytes. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-13-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0John Fastabend
When session gracefully shutdowns epoll needs to wake up and any recv() readers should return 0 not the -EAGAIN they previously returned. Note we use epoll instead of select to test the epoll wake on shutdown event as well. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-12-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Build helper to create connected socket pairJohn Fastabend
A common operation for testing is to spin up a pair of sockets that are connected. Then we can use these to run specific tests that need to send data, check BPF programs and so on. The sockmap_listen programs already have this logic lets move it into the new sockmap_helpers header file for general use. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-11-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Pull socket helpers out of listen test for general useJohn Fastabend
No functional change here we merely pull the helpers in sockmap_listen.c into a header file so we can use these in other programs. The tests we are about to add aren't really _listen tests so doesn't make sense to add them here. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-10-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Incorrectly handling copied_seqJohn Fastabend
The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the application. This results in application errors, if the application does an ioctl(FIONREAD) we return zero because this is calculated from the copied_seq value. To fix this we move tcp->copied_seq accounting into the recv handler so that we update these when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before we were calling the tcp_rcv_space_adjust() which would update 'number of bytes copied to user in last RTT' which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled. Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then call this from skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single reccv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layer problem of calling tcp_eat_skb even if its not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the seq_copied updates on this. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Wake up polling after data copyJohn Fastabend
When TCP stack has data ready to read sk_data_ready() is called. Sockmap overwrites this with its own handler to call into BPF verdict program. But, the original TCP socket had sock_def_readable that would additionally wake up any user space waiters with sk_wake_async(). Sockmap saved the callback when the socket was created so call the saved data ready callback and then we can wake up any epoll() logic waiting on the read. Note we call on 'copied >= 0' to account for returning 0 when a FIN is received because we need to wake up user for this as well so they can do the recvmsg() -> 0 and detect the shutdown. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-8-john.fastabend@gmail.com
2023-05-23bpf, sockmap: TCP data stall on recv before acceptJohn Fastabend
A common mechanism to put a TCP socket into the sockmap is to hook the BPF_SOCK_OPS_{ACTIVE_PASSIVE}_ESTABLISHED_CB event with a BPF program that can map the socket info to the correct BPF verdict parser. When the user adds the socket to the map the psock is created and the new ops are assigned to ensure the verdict program will 'see' the sk_buffs as they arrive. Part of this process hooks the sk_data_ready op with a BPF specific handler to wake up the BPF verdict program when data is ready to read. The logic is simple enough (posted here for easy reading) static void sk_psock_verdict_data_ready(struct sock *sk) { struct socket *sock = sk->sk_socket; if (unlikely(!sock || !sock->ops || !sock->ops->read_skb)) return; sock->ops->read_skb(sk, sk_psock_verdict_recv); } The oversight here is sk->sk_socket is not assigned until the application accepts() the new socket. However, its entirely ok for the peer application to do a connect() followed immediately by sends. The socket on the receiver is sitting on the backlog queue of the listening socket until its accepted and the data is queued up. If the peer never accepts the socket or is slow it will eventually hit data limits and rate limit the session. But, important for BPF sockmap hooks when this data is received TCP stack does the sk_data_ready() call but the read_skb() for this data is never called because sk_socket is missing. The data sits on the sk_receive_queue. Then once the socket is accepted if we never receive more data from the peer there will be no further sk_data_ready calls and all the data is still on the sk_receive_queue(). Then user calls recvmsg after accept() and for TCP sockets in sockmap we use the tcp_bpf_recvmsg_parser() handler. The handler checks for data in the sk_msg ingress queue expecting that the BPF program has already run from the sk_data_ready hook and enqueued the data as needed. So we are stuck. To fix do an unlikely check in recvmsg handler for data on the sk_receive_queue and if it exists wake up data_ready. We have the sock locked in both read_skb and recvmsg so should avoid having multiple runners. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-7-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Handle fin correctlyJohn Fastabend
The sockmap code is returning EAGAIN after a FIN packet is received and no more data is on the receive queue. Correct behavior is to return 0 to the user and the user can then close the socket. The EAGAIN causes many apps to retry which masks the problem. Eventually the socket is evicted from the sockmap because its released from sockmap sock free handling. The issue creates a delay and can cause some errors on application side. To fix this check on sk_msg_recvmsg side if length is zero and FIN flag is set then set return to zero. A selftest will be added to check this condition. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-6-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Improved check for empty queueJohn Fastabend
We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when its not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue its possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-5-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Reschedule is now done through backlogJohn Fastabend
Now that the backlog manages the reschedule() logic correctly we can drop the partial fix to reschedule from recvmsg hook. Rescheduling on recvmsg hook was added to address a corner case where we still had data in the backlog state but had nothing to kick it and reschedule the backlog worker to run and finish copying data out of the state. This had a couple limitations, first it required user space to kick it introducing an unnecessary EBUSY and retry. Second it only handled the ingress case and egress redirects would still be hung. With the correct fix, pushing the reschedule logic down to where the enomem error occurs we can drop this fix. Fixes: bec217197b412 ("skmsg: Schedule psock work if the cached skb exists on the psock") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-4-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Convert schedule_work into delayed_workJohn Fastabend
Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. The flow for Cilium's common use case with a stream parser is, tcp_read_sock() sk_psock_verdict_recv ret = bpf_prog_run_pin_on_cpu() sk_psock_verdict_apply(sock, skb, ret) // if system is under memory pressure or app is slow we may // need to queue skb. Do this queuing through ingress_skb and // then kick timer to wake up handler skb_queue_tail(ingress_skb, skb) schedule_work(work); The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, its possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because its possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next. Now for the trouble, we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when its the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side. To fix we could do simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continue to retry repeatedly. So instead to fix convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from backlog queue on errors. Its not obvious though what a good backoff is so use '1'. To test we observed some flakes whil running NGINX compliance test with sockmap we attributed these failed test to this bug and subsequent issue. >From on list discussion. This commit bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock") was intended to address similar race, but had a couple cases it missed. Most obvious it only accounted for receiving traffic on the local socket so if redirecting into another socket we could still get an sk_buff stuck here. Next it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also its sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actual handle the EAGAIN correctly. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-3-john.fastabend@gmail.com
2023-05-23bpf, sockmap: Pass skb ownership through read_skbJohn Fastabend
The read_skb hook calls consume_skb() now, but this means that if the recv_actor program wants to use the skb it needs to inc the ref cnt so that the consume_skb() doesn't kfree the sk_buff. This is problematic because in some error cases under memory pressure we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue(). Then we get this, skb_linearize() __pskb_pull_tail() pskb_expand_head() BUG_ON(skb_shared(skb)) Because we incremented users refcnt from sk_psock_verdict_recv() we hit the bug on with refcnt > 1 and trip it. To fix lets simply pass ownership of the sk_buff through the skb_read call. Then we can drop the consume from read_skb handlers and assume the verdict recv does any required kfree. Bug found while testing in our CI which runs in VMs that hit memory constraints rather regularly. William tested TCP read_skb handlers. [ 106.536188] ------------[ cut here ]------------ [ 106.536197] kernel BUG at net/core/skbuff.c:1693! [ 106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1 [ 106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014 [ 106.537467] RIP: 0010:pskb_expand_head+0x269/0x330 [ 106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202 [ 106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20 [ 106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8 [ 106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000 [ 106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8 [ 106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8 [ 106.540568] FS: 00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 106.540954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0 [ 106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 106.542255] Call Trace: [ 106.542383] <IRQ> [ 106.542487] __pskb_pull_tail+0x4b/0x3e0 [ 106.542681] skb_ensure_writable+0x85/0xa0 [ 106.542882] sk_skb_pull_data+0x18/0x20 [ 106.543084] bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9 [ 106.543536] ? migrate_disable+0x66/0x80 [ 106.543871] sk_psock_verdict_recv+0xe2/0x310 [ 106.544258] ? sk_psock_write_space+0x1f0/0x1f0 [ 106.544561] tcp_read_skb+0x7b/0x120 [ 106.544740] tcp_data_queue+0x904/0xee0 [ 106.544931] tcp_rcv_established+0x212/0x7c0 [ 106.545142] tcp_v4_do_rcv+0x174/0x2a0 [ 106.545326] tcp_v4_rcv+0xe70/0xf60 [ 106.545500] ip_protocol_deliver_rcu+0x48/0x290 [ 106.545744] ip_local_deliver_finish+0xa7/0x150 Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Reported-by: William Findlay <will@isovalent.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: William Findlay <will@isovalent.com> Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
2023-05-23ipv{4,6}/raw: fix output xfrm lookup wrt protocolNicolas Dichtel
With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the protocol field of the flow structure, build by raw_sendmsg() / rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector. For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket. For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23lan966x: Fix unloading/loading of the driverHoratiu Vultur
It was noticing that after a while when unloading/loading the driver and sending traffic through the switch, it would stop working. It would stop forwarding any traffic and the only way to get out of this was to do a power cycle of the board. The root cause seems to be that the switch core is initialized twice. Apparently initializing twice the switch core disturbs the pointers in the queue systems in the HW, so after a while it would stop sending the traffic. Unfortunetly, it is not possible to use a reset of the switch here, because the reset line is connected to multiple devices like MDIO, SGPIO, FAN, etc. So then all the devices will get reseted when the network driver will be loaded. So the fix is to check if the core is initialized already and if that is the case don't initialize it again. Fixes: db8bcaad5393 ("net: lan966x: add the basic lan966x driver") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230522120038.3749026-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-05-23platform/x86/intel/ifs: Annotate work queue on stack so object debug does ↵David Arcari
not complain Object Debug results in the following warning while attempting to load ifs firmware: [ 220.007422] ODEBUG: object 000000003bf952db is on stack 00000000e843994b, but NOT annotated. [ 220.007459] ------------[ cut here ]------------ [ 220.007461] WARNING: CPU: 0 PID: 11774 at lib/debugobjects.c:548 __debug_object_init.cold+0x22e/0x2d5 [ 220.137476] RIP: 0010:__debug_object_init.cold+0x22e/0x2d5 [ 220.254774] Call Trace: [ 220.257641] <TASK> [ 220.265606] scan_chunks_sanity_check+0x368/0x5f0 [intel_ifs] [ 220.288292] ifs_load_firmware+0x2a3/0x400 [intel_ifs] [ 220.332793] current_batch_store+0xea/0x160 [intel_ifs] [ 220.357947] kernfs_fop_write_iter+0x355/0x530 [ 220.363048] new_sync_write+0x28e/0x4a0 [ 220.381226] vfs_write+0x62a/0x920 [ 220.385160] ksys_write+0xf9/0x1d0 [ 220.399421] do_syscall_64+0x59/0x90 [ 220.440635] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 220.566845] ---[ end trace 3a01b299db142b41 ]--- Correct this by calling INIT_WORK_ONSTACK instead of INIT_WORK. Fixes: 684ec215706d ("platform/x86/intel/ifs: Authenticate and copy to secured memory") Signed-off-by: David Arcari <darcari@redhat.com> Cc: Jithu Joseph <jithu.joseph@intel.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mark Gross <markgross@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230523105400.674152-1-darcari@redhat.com Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-23platform/x86: ISST: Remove 8 socket limitSteve Wahl
Stop restricting the PCI search to a range of PCI domains fed to pci_get_domain_bus_and_slot(). Instead, use for_each_pci_dev() and look at all PCI domains in one pass. On systems with more than 8 sockets, this avoids error messages like "Information: Invalid level, Can't get TDP control information at specified levels on cpu 480" from the intel speed select utility. Fixes: aa2ddd242572 ("platform/x86: ISST: Use numa node id for cpu pci dev mapping") Signed-off-by: Steve Wahl <steve.wahl@hpe.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20230519160420.2588475-1-steve.wahl@hpe.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2023-05-23mips: Move initrd_start check after initrd address sanitisation.Liviu Dudau
PAGE_OFFSET is technically a virtual address so when checking the value of initrd_start against it we should make sure that it has been sanitised from the values passed by the bootloader. Without this change, even with a bootloader that passes correct addresses for an initrd, we are failing to load it on MT7621 boards, for example. Signed-off-by: Liviu Dudau <liviu@dudau.co.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-05-23MIPS: Alchemy: fix dbdma2Manuel Lauss
Various fixes for the Au1200/Au1550/Au1300 DBDMA2 code: - skip cache invalidation if chip has working coherency circuitry. - invalidate KSEG0-portion of the (physical) data address. - force the dma channel doorbell write out to bus immediately with a sync. Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-05-23MIPS: Restore Au1300 supportManuel Lauss
The Au1300, at least the one I have to test, uses the NetLogic vendor ID, but commit 95b8a5e0111a ("MIPS: Remove NETLOGIC support") also dropped Au1300 detection. Restore Au1300 detection. Tested on DB1300 with Au1380 chip. Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-05-23MIPS: unhide PATA_PLATFORMManuel Lauss
Alchemy DB1200/DB1300 boards can use the pata_platform driver. Unhide the config entry in all of MIPS. Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2023-05-23erofs: use HIPRI by default if per-cpu kthreads are enabledGao Xiang
As Sandeep shown [1], high priority RT per-cpu kthreads are typically helpful for Android scenarios to minimize the scheduling latencies. Switch EROFS_FS_PCPU_KTHREAD_HIPRI on by default if EROFS_FS_PCPU_KTHREAD is on since it's the typical use cases for EROFS_FS_PCPU_KTHREAD. Also clean up unneeded sched_set_normal(). [1] https://lore.kernel.org/r/CAB=BE-SBtO6vcoyLNA9F-9VaN5R0t3o_Zn+FW8GbO6wyUqFneQ@mail.gmail.com Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20230522092141.124290-1-hsiangkao@linux.alibaba.com
2023-05-23erofs: avoid pcpubuf.c inclusion if CONFIG_EROFS_FS_ZIP is offYue Hu
The function of pcpubuf.c is just for low-latency decompression algorithms (e.g. lz4). Signed-off-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515095758.10391-1-zbestahu@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-23erofs: fix null-ptr-deref caused by erofs_xattr_prefixes_initJingbo Xu
Fragments and dedupe share one feature bit, and thus packed inode may not exist when fragment feature bit (dedupe feature bit exactly) is set, e.g. when deduplication feature is in use while fragments feature is not. In this case, sbi->packed_inode could be NULL while fragments feature bit is set. Fix this by accessing packed inode only when it exists. Reported-by: syzbot+902d5a9373ae8f748a94@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=902d5a9373ae8f748a94 Reported-and-tested-by: syzbot+bbb353775d51424087f2@syzkaller.appspotmail.com Fixes: 9e382914617c ("erofs: add helpers to load long xattr name prefixes") Fixes: 6a318ccd7e08 ("erofs: enable long extended attribute name prefixes") Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yue Hu <huyue2@coolpad.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20230515103941.129784-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-05-23gpio-f7188x: fix chip name and pin count on Nuvoton chipHenning Schild
In fact the device with chip id 0xD283 is called NCT6126D, and that is the chip id the Nuvoton code was written for. Correct that name to avoid confusion, because a NCT6116D in fact exists as well but has another chip id, and is currently not supported. The look at the spec also revealed that GPIO group7 in fact has 8 pins, so correct the pin count in that group as well. Fixes: d0918a84aff0 ("gpio-f7188x: Add GPIO support for Nuvoton NCT6116") Reported-by: Xing Tong Wu <xingtong.wu@siemens.com> Signed-off-by: Henning Schild <henning.schild@siemens.com> Acked-by: Simon Guinot <simon.guinot@sequanux.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2023-05-23perf/x86/intel: Save/restore cpuc->active_pebs_data_cfg when using guest PEBSLike Xu
After commit b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG"), the cpuc->pebs_data_cfg may save some bits that are not supported by real hardware, such as PEBS_UPDATE_DS_SW. This would cause the VMX hardware MSR switching mechanism to save/restore invalid values for PEBS_DATA_CFG MSR, thus crashing the host when PEBS is used for guest. Fix it by using the active host value from cpuc->active_pebs_data_cfg. Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Signed-off-by: Like Xu <likexu@tencent.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20230517133808.67885-1-likexu@tencent.com
2023-05-22net/mlx5: Fix indexing of mlx5_irqShay Drory
After the cited patch, mlx5_irq xarray index can be different then mlx5_irq MSIX table index. Fix it by storing both mlx5_irq xarray index and MSIX table index. Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22net/mlx5: Fix irq affinity managementShay Drory
The cited patch deny the user of changing the affinity of mlx5 irqs, which break backward compatibility. Hence, allow the user to change the affinity of mlx5 irqs. Fixes: bbac70c74183 ("net/mlx5: Use newer affinity descriptor") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>