summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-11RISC-V: kexec: Fixup use of smp_processor_id() in preemptible contextXianting Tian
Use __smp_processor_id() to avoid check the preemption context when CONFIG_DEBUG_PREEMPT enabled, as we will enter crash kernel and no return. Without the patch, [ 103.781044] sysrq: Trigger a crash [ 103.784625] Kernel panic - not syncing: sysrq triggered crash [ 103.837634] CPU1: off [ 103.889668] CPU2: off [ 103.933479] CPU3: off [ 103.939424] Starting crashdump kernel... [ 103.943442] BUG: using smp_processor_id() in preemptible [00000000] code: sh/346 [ 103.950884] caller is debug_smp_processor_id+0x1c/0x26 [ 103.956051] CPU: 0 PID: 346 Comm: sh Kdump: loaded Not tainted 5.10.113-00002-gce03f03bf4ec-dirty #149 [ 103.965355] Call Trace: [ 103.967805] [<ffffffe00020372a>] walk_stackframe+0x0/0xa2 [ 103.973206] [<ffffffe000bcf1f4>] show_stack+0x32/0x3e [ 103.978258] [<ffffffe000bd382a>] dump_stack_lvl+0x72/0x8e [ 103.983655] [<ffffffe000bd385a>] dump_stack+0x14/0x1c [ 103.988705] [<ffffffe000bdc8fe>] check_preemption_disabled+0x9e/0xaa [ 103.995057] [<ffffffe000bdc926>] debug_smp_processor_id+0x1c/0x26 [ 104.001150] [<ffffffe000206c64>] machine_kexec+0x22/0xd0 [ 104.006463] [<ffffffe000291a7e>] __crash_kexec+0x6a/0xa4 [ 104.011774] [<ffffffe000bcf3fa>] panic+0xfc/0x2b0 [ 104.016480] [<ffffffe000656ca4>] sysrq_reset_seq_param_set+0x0/0x70 [ 104.022745] [<ffffffe000657310>] __handle_sysrq+0x8c/0x154 [ 104.028229] [<ffffffe0006577e8>] write_sysrq_trigger+0x5a/0x6a [ 104.034061] [<ffffffe0003d90e0>] proc_reg_write+0x58/0xd4 [ 104.039459] [<ffffffe00036cff4>] vfs_write+0x7e/0x254 [ 104.044509] [<ffffffe00036d2f6>] ksys_write+0x58/0xbe [ 104.049558] [<ffffffe00036d36a>] sys_write+0xe/0x16 [ 104.054434] [<ffffffe000201b9a>] ret_from_syscall+0x0/0x2 [ 104.067863] Will call new kernel at ecc00000 from hart id 0 [ 104.074939] FDT image at fc5ee000 [ 104.079523] Bye... With the patch we can got clear output, [ 67.740553] sysrq: Trigger a crash [ 67.744166] Kernel panic - not syncing: sysrq triggered crash [ 67.809123] CPU1: off [ 67.865210] CPU2: off [ 67.909075] CPU3: off [ 67.919123] Starting crashdump kernel... [ 67.924900] Will call new kernel at ecc00000 from hart id 0 [ 67.932045] FDT image at fc5ee000 [ 67.935560] Bye... Fixes: 0e105f1d0037 ("riscv: use hart id instead of cpu id on machine_kexec") Reviewed-by: Guo Ren <guoren@kernel.org> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com> Link: https://lore.kernel.org/r/20220811074150.3020189-2-xianting.tian@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-08-11bonding: fix reference count leak in balance-alb modeJay Vosburgh
Commit d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") introduced a reference count leak by not releasing the reference acquired by ip_dev_find(). Remedy this by insuring the reference is released. Fixes: d5410ac7b0ba ("net:bonding:support balance-alb interface with vlan to bridge") Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/26758.1660194413@famine Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11Revert "Makefile.extrawarn: re-enable -Wformat for clang"Linus Torvalds
This reverts commit 258fafcd0683d9ccfa524129d489948ab3ddc24c. The clang -Wformat warning is terminally broken, and the clang people can't seem to get their act together. This test program causes a warning with clang: #include <stdio.h> int main(int argc, char **argv) { printf("%hhu\n", 'a'); } resulting in t.c:5:19: warning: format specifies type 'unsigned char' but the argument has type 'int' [-Wformat] printf("%hhu\n", 'a'); ~~~~ ^~~ %d and apparently clang people consider that a feature, because they don't want to face the reality of how either C character constants, C arithmetic, and C varargs functions work. The rest of the world just shakes their head at that kind of incompetence, and turns off -Wformat for clang again. And no, the "you should use a pointless cast to shut this up" is not a valid answer. That warning should not exist in the first place, or at least be optinal with some "-Wformat-me-harder" kind of option. [ Admittedly, there's also very little reason to *ever* use '%hh[ud]' in C, but what little reason there is is entirely about 'I want to see only the low 8 bits of the argument'. So I would suggest nobody ever use that format in the first place, but if they do, the clang behavious is simply always wrong. Because '%hhu' takes an 'int'. It's that simple. ] Reported-by: Sudip Mukherjee (Codethink) <sudipm.mukherjee@gmail.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-08-11cifs: Move cached-dir functions into a separate fileRonnie Sahlberg
Also rename crfid to cfid to have consistent naming for this variable. This commit does not change any logic. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-08-11dm bufio: fix some cases where the code sleeps with spinlock heldMikulas Patocka
Commit b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag") added a "NO_SLEEP" mode, it replaces a mutex with a spinlock, and it is only usable when the device is in read-only mode (because the write path may be sleeping while holding the dm_bufio_client lock). However, there are still two points where the code could sleep even in read-only mode. One is in __get_unclaimed_buffer -> __make_buffer_clean. The other is in __try_evict_buffer -> __make_buffer_clean. These functions will call __make_buffer_clean which sleeps if the buffer is being read. Fix these cases so that if c->no_sleep is set __make_buffer_clean will not be called and the buffer will be skipped instead. Fixes: b32d45824aa7 ("dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2022-08-11arch/riscv: add Zihintpause supportDao Lu
Implement support for the ZiHintPause extension. The PAUSE instruction is a HINT that indicates the current hart’s rate of instruction retirement should be temporarily reduced or paused. Reviewed-by: Heiko Stuebner <heiko@sntech.de> Tested-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Dao Lu <daolu@rivosinc.com> [Palmer: Some minor merge conflicts.] Link: https://lore.kernel.org/all/20220620201530.3929352-1-daolu@rivosinc.com/ Link: https://lore.kernel.org/all/20220811053356.17375-1-palmer@rivosinc.com/ Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-08-11net: usb: qmi_wwan: Add support for Cinterion MV32Slark Xiao
There are 2 models for MV32 serials. MV32-W-A is designed based on Qualcomm SDX62 chip, and MV32-W-B is designed based on Qualcomm SDX65 chip. So we use 2 different PID to separate it. Test evidence as below: T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1e2d ProdID=00f3 Rev=05.04 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00F3 USB Mobile Broadband S: SerialNumber=d7b4be8d C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option T: Bus=03 Lev=01 Prnt=01 Port=02 Cnt=03 Dev#= 10 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1e2d ProdID=00f4 Rev=05.04 S: Manufacturer=Cinterion S: Product=Cinterion PID 0x00F4 USB Mobile Broadband S: SerialNumber=d095087d C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=50 Driver=qmi_wwan I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option Signed-off-by: Slark Xiao <slark_xiao@163.com> Acked-by: Bjørn Mork <bjorn@mork.no> Link: https://lore.kernel.org/r/20220810014521.9383-1-slark_xiao@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Pull in bpf to silence a false positive warning. * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Shut up kern_sys_bpf warning. Link: https://lore.kernel.org/r/CAADnVQK589CZN1Q9w8huJqkEyEed+ZMTWqcpA1Rm2CjN3a4XoQ@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11Merge tag 'nvme-6.0-2022-08-11' of git://git.infradead.org/nvme into block-6.0Jens Axboe
Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.0 - print nvme connect Linux error codes properly (Amit Engel) - fix the fc_appid_store return value (Christoph Hellwig) - fix a typo in an error message (Christophe JAILLET) - add another non-unique identifier quirk (Dennis P. Kliem) - check if the queue is allocated before stopping it in nvme-tcp (Maurizio Lombardi) - restart admin queue if the caller needs to restart queue in nvme-fc (Ming Lei) - use kmemdup instead of kmalloc + memcpy in nvme-auth (Zhang Xiaoxu)" * tag 'nvme-6.0-2022-08-11' of git://git.infradead.org/nvme: nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70 nvme-tcp: check if the queue is allocated before stopping it nvme-fabrics: Fix a typo in an error message nvme-fabrics: parse nvme connect Linux error codes nvmet-auth: use kmemdup instead of kmalloc + memcpy nvme-fc: fix the fc_appid_store return value nvme-fc: restart admin queue if the caller needs to restart queue
2022-08-11vdpa/mlx5: Fix possible uninitialized return valueEli Cohen
Initialize err local variable to return -EAGAIN if the asid cannot be found thus avoiding returning uninitialized value. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220811134010.952291-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11Merge 'irq/loongarch', 'pci/ctrl/loongson' and 'pci/header-cleanup-immutable'Huacai Chen
LoongArch architecture changes for 5.20 depend on the irqchip and pci changes to work, so merge them to create a base.
2022-08-11i2c: microchip-corei2c: fix erroneous late ack sendConor Dooley
A late ack is currently being sent at the end of a transfer due to incorrect logic in mchp_corei2c_empty_rx(). Currently the Assert Ack bit is being written to the controller's control reg after the last byte has been received, causing it to sent another byte with the ack. Instead, the AA flag should be written to the control register when the penultimate byte is read so it is sent out for the last byte. Reported-by: Andreas Buerkler <andreas.buerkler@enclustra.com> Fixes: 64a6f1c4987e ("i2c: add support for microchip fpga i2c controllers") Tested-by: Lewis Hanly <lewis.hanly@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com> [wsa: fixed typos in commit message] Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-08-11dt-bindings: i2c: qcom,i2c-cci: convert to dtschemaKrzysztof Kozlowski
Convert the Qualcomm Camera Control Interface (CCI) I2C controller to DT schema. The original bindings were not complete, so this includes changes: 1. Add address/size-cells. 2. Describe the clocks per variant. 3. Use more descriptive example based on sdm845. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-08-11i2c: qcom-geni: Fix GPI DMA buffer sync-backRobin Reckmann
Fix i2c transfers using GPI DMA mode for all message types that do not set the I2C_M_DMA_SAFE flag (e.g. SMBus "read byte"). In this case a bounce buffer is returned by i2c_get_dma_safe_msg_buf(), and it has to synced back to the message after the transfer is done. Add missing assignment of dma buffer in geni_i2c_gpi(). Set xferred in i2c_put_dma_safe_msg_buf() to true in case of no error to ensure the sync-back of this dma buffer to the message. Fixes: d8703554f4de ("i2c: qcom-geni: Add support for GPI DMA") Signed-off-by: Robin Reckmann <robin.reckmann@gmail.com> Tested-by: Luca Weiss <luca.weiss@fairphone.com> Tested-by: Caleb Connolly <caleb@connolly.tech> Reviewed-by: Konrad Dybcio <konrad.dybcio@somainline.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2022-08-11nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG GAMMIX S70Dennis P. Kliem
ADATA XPG GAMMIX S70 reports bogus eui64 values that appear to be the same across all drives. Quirk them out so they are not marked as "non globally unique" duplicates. Signed-off-by: Dennis P. Kliem <dpkliem@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-08-11vdpa_sim_blk: add support for discard and write-zeroesStefano Garzarella
Expose VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES features to the drivers and handle VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES requests checking ranges and flags. The simulator behaves like a ramdisk, so for VIRTIO_BLK_F_DISCARD does nothing, while for VIRTIO_BLK_T_WRITE_ZEROES sets to 0 the specified region. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220811083632.77525-5-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSHStefano Garzarella
The simulator behaves like a ramdisk, so we don't have to do anything when a VIRTIO_BLK_T_FLUSH request is received, but it could be useful to test driver behavior. Let's expose the VIRTIO_BLK_F_FLUSH feature to inform the driver that we support the flush command. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220811083632.77525-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa_sim_blk: make vdpasim_blk_check_range usable by other requestsStefano Garzarella
Next patches will add handling of other requests, where will be useful to reuse vdpasim_blk_check_range(). So let's make it more generic by adding the `max_sectors` parameter, since different requests allow different numbers of maximum sectors. Let's also print the messages directly in vdpasim_blk_check_range() to avoid duplicate prints. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220811083632.77525-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa_sim_blk: check if sector is 0 for commands other than read or writeStefano Garzarella
VIRTIO spec states: "The sector number indicates the offset (multiplied by 512) where the read or write is to occur. This field is unused and set to 0 for commands other than read or write." Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220811083632.77525-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa_sim: Implement suspend vdpa opEugenio Pérez
Implement suspend operation for vdpa_sim devices, so vhost-vdpa will offer that backend feature and userspace can effectively suspend the device. This is a must before get virtqueue indexes (base) for live migration, since the device could modify them after userland gets them. There are individual ways to perform that action for some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no way to perform it for any vhost device (and, in particular, vhost-vdpa). Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-5-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: uAPI to suspend the deviceEugenio Pérez
The ioctl adds support for suspending the device from userspace. This is a must before getting virtqueue indexes (base) for live migration, since the device could modify them after userland gets them. There are individual ways to perform that action for some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no way to perform it for any vhost device (and, in particular, vhost-vdpa). After a successful return of the ioctl call the device must not process more virtqueue descriptors. The device can answer to read or writes of config fields as if it were not suspended. In particular, writing to "queue_enable" with a value of 1 will not make the device start processing buffers of the virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: introduce SUSPEND backend feature bitEugenio Pérez
Userland knows if it can suspend the device or not by checking this feature bit. It's only offered if the vdpa driver backend implements the suspend() operation callback, and to offer it or userland to ack it if the backend does not offer that callback is an error. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa: Add suspend operationEugenio Pérez
This operation is optional: It it's not implemented, backend feature bit will not be exposed. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-2-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio-blk: Avoid use-after-free on suspend/resumeShigeru Yoshida
hctx->user_data is set to vq in virtblk_init_hctx(). However, vq is freed on suspend and reallocated on resume. So, hctx->user_data is invalid after resume, and it will cause use-after-free accessing which will result in the kernel crash something like below: [ 22.428391] Call Trace: [ 22.428899] <TASK> [ 22.429339] virtqueue_add_split+0x3eb/0x620 [ 22.430035] ? __blk_mq_alloc_requests+0x17f/0x2d0 [ 22.430789] ? kvm_clock_get_cycles+0x14/0x30 [ 22.431496] virtqueue_add_sgs+0xad/0xd0 [ 22.432108] virtblk_add_req+0xe8/0x150 [ 22.432692] virtio_queue_rqs+0xeb/0x210 [ 22.433330] blk_mq_flush_plug_list+0x1b8/0x280 [ 22.434059] __blk_flush_plug+0xe1/0x140 [ 22.434853] blk_finish_plug+0x20/0x40 [ 22.435512] read_pages+0x20a/0x2e0 [ 22.436063] ? folio_add_lru+0x62/0xa0 [ 22.436652] page_cache_ra_unbounded+0x112/0x160 [ 22.437365] filemap_get_pages+0xe1/0x5b0 [ 22.437964] ? context_to_sid+0x70/0x100 [ 22.438580] ? sidtab_context_to_sid+0x32/0x400 [ 22.439979] filemap_read+0xcd/0x3d0 [ 22.440917] xfs_file_buffered_read+0x4a/0xc0 [ 22.441984] xfs_file_read_iter+0x65/0xd0 [ 22.442970] __kernel_read+0x160/0x2e0 [ 22.443921] bprm_execve+0x21b/0x640 [ 22.444809] do_execveat_common.isra.0+0x1a8/0x220 [ 22.446008] __x64_sys_execve+0x2d/0x40 [ 22.446920] do_syscall_64+0x37/0x90 [ 22.447773] entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes this issue by getting vq from vblk, and removes virtblk_init_hctx(). Fixes: 4e0400525691 ("virtio-blk: support polling I/O") Cc: "Suwan Kim" <suwan.kim027@gmail.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Message-Id: <20220810160948.959781-1-syoshida@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio_vdpa: support the arg sizes of find_vqs()Bo Liu
Virtio vdpa support the new parameter sizes of find_vqs(). Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220810085151.7251-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: Call ida_simple_remove() when failedBo Liu
In function vhost_vdpa_probe(), when code execution fails, we should call ida_simple_remove() to free ida. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220805091254.20026-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vDPA: fix 'cast to restricted le16' warnings in vdpa.cZhu Lingshan
This commit fixes spars warnings: cast to restricted __le16 in function vdpa_dev_net_config_fill() and vdpa_fill_stats_rec() Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Message-Id: <20220722115309.82746-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vDPA: !FEATURES_OK should not block querying device config spaceZhu Lingshan
Users may want to query the config space of a vDPA device, to choose a appropriate one for a certain guest. This means the users need to read the config space before FEATURES_OK, and the existence of config space contents does not depend on FEATURES_OK. The spec says: The device MUST allow reading of any device-specific configuration field before FEATURES_OK is set by the driver. This includes fields which are conditional on feature bits, as long as those feature bits are offered by the device. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20220722115309.82746-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vDPA/ifcvf: support userspace to query features and MQ of a management deviceZhu Lingshan
Adapting to current netlink interfaces, this commit allows userspace to query feature bits and MQ capability of a management device. Currently both the vDPA device and the management device are the VF itself, thus this ifcvf should initialize the virtio capabilities in probe() before setting up the struct vdpa_mgmt_dev. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20220722115309.82746-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vDPA/ifcvf: get_config_size should return a value no greater than dev ↵Zhu Lingshan
implementation Drivers must not access a BAR outside the capability length, and for a virtio device, ifcvf driver should not report any non-standard capability contents to the upper layers. Function ifcvf_get_config_size() is introduced here to return a safe value of the device config capability size. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20220722115309.82746-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost scsi: Allow user to control num virtqueuesMike Christie
We are currently hard coded to always create 128 IO virtqueues, so this adds a modparam to control it. For large systems where we are ok with using memory for virtqueues it allows us to add up to 1024. This limit was just selected becuase that's qemu's limit. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20220708030525.5065-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vhost-scsi: Fix max number of virtqueuesMike Christie
Qemu takes it's num_queues limit then adds the fixed queues (control and event) to the total it will request from the kernel. So when a user requests 128 (or qemu does it's num_queues calculation based on vCPUS and other system limits), we hit errors due to userspace trying to setup 130 queues when vhost-scsi has a hard coded limit of 128. This has vhost-scsi adjust it's max so we can do a total of 130 virtqueues (128 IO and 2 fixed). For the case where the user has 128 vCPUs the guest OS can then nicely map each IO virtqueue to a vCPU and not have the odd case where 2 vCPUs share a virtqueue. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20220708030525.5065-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vdpa/mlx5: Support different address spaces for control and dataEli Cohen
Partition virtqueues to two different address spaces: one for control virtqueue which is implemented in software, and one for data virtqueues. Based-on: <20220526124338.36247-1-eperezma@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220714113927.85729-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa/mlx5: Implement susupend virtqueue callbackEli Cohen
Implement the suspend callback allowing to suspend the virtqueues so they stop processing descriptors. This is required to allow to query a consistent state of the virtqueue while live migration is taking place. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220714113927.85729-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vduse: Support querying information of IOVA regionsXie Yongji
This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to support querying some information of IOVA regions. Now it can be used to query whether the IOVA region supports userspace memory registration. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220803045523.23851-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vduse: Support registering userspace memory for IOVA regionsXie Yongji
Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support registering and de-registering userspace memory for IOVA regions. Now it only supports registering userspace memory for bounce buffer region in virtio-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220803045523.23851-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vduse: Support using userspace pages as bounce bufferXie Yongji
Introduce two APIs: vduse_domain_add_user_bounce_pages() and vduse_domain_remove_user_bounce_pages() to support adding and removing userspace pages for bounce buffers. During adding and removing, the DMA data would be copied from the kernel bounce pages to the userspace bounce pages and back. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220803045523.23851-4-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vduse: Use memcpy_{to,from}_page() in do_bounce()Xie Yongji
kmap_atomic() is being deprecated in favor of kmap_local_page(). The use of kmap_atomic() in do_bounce() is all thread local therefore kmap_local_page() is a sufficient replacement. Convert to kmap_local_page() but, instead of open coding it, use the helpers memcpy_to_page() and memcpy_from_page(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Message-Id: <20220803045523.23851-3-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vduse: Remove unnecessary spin lock protectionXie Yongji
Now we use domain->iotlb_lock to protect two different variables: domain->bounce_maps->bounce_page and domain->iotlb. But for domain->bounce_maps->bounce_page, we actually don't need any synchronization between vduse_domain_get_bounce_page() and vduse_domain_free_bounce_pages() since vduse_domain_get_bounce_page() will only be called in page fault handler and vduse_domain_free_bounce_pages() will be called during file release. So let's remove the unnecessary spin lock protection in vduse_domain_get_bounce_page(). Then the usage of domain->iotlb_lock could be more clear: the lock will be only used to protect the domain->iotlb. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220803045523.23851-2-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11net: virtio_net: notifications coalescing supportAlvaro Karsz
New VirtIO network feature: VIRTIO_NET_F_NOTF_COAL. Control a Virtio network device notifications coalescing parameters using the control virtqueue. A device that supports this fetature can receive VIRTIO_NET_CTRL_NOTF_COAL control commands. - VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: Ask the network device to change the following parameters: - tx_usecs: Maximum number of usecs to delay a TX notification. - tx_max_packets: Maximum number of packets to send before a TX notification. - VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: Ask the network device to change the following parameters: - rx_usecs: Maximum number of usecs to delay a RX notification. - rx_max_packets: Maximum number of packets to receive before a RX notification. VirtIO spec. patch: https://lists.oasis-open.org/archives/virtio-comment/202206/msg00100.html Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20220718091102.498774-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11virtio: Check dev_set_name() return valueBo Liu
It's possible that dev_set_name() returns -ENOMEM, catch and handle this. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20220707031751.4802-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11tools/virtio: fix buildStefano Garzarella
Fix the build caused by the following changes: - phys_addr_t is now defined in tools/include/linux/types.h - dev_warn_once() is used in drivers/virtio/virtio_ring.c - linux/uio.h included by vringh.h use INT_MAX defined in limits.h Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220705072249.7867-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vDPA/ifcvf: remove duplicated assignment to pointer cfgColin Ian King
The assignment to pointer cfg is duplicated, the second assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20220704190456.593464-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vdpa: ifcvf: Fix spelling mistake in commentsZhang Jiaming
There is a typo(does't) in comments. It maybe 'doesn't' instead of 'does't'. Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Message-Id: <20220704024104.15535-1-jiaming@nfschina.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vdpa/mlx5: Use eth_broadcast_addr() to assign broadcast addressXu Qiang
Using eth_broadcast_addr() to assign broadcast address instead of memset(). Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Xu Qiang <xuqiang36@huawei.com> Message-Id: <20220704021405.64545-1-xuqiang36@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vdpa_sim: use max_iotlb_entries as a limit in vhost_iotlb_initStefano Garzarella
Commit bda324fd037a ("vdpasim: control virtqueue support") changed the allocation of iotlbs calling vhost_iotlb_init() for each address space, instead of vhost_iotlb_alloc(). With this change we forgot to use the limit we had introduced with the `max_iotlb_entries` module parameter. Fixes: bda324fd037a ("vdpasim: control virtqueue support") Cc: gautam.dawar@xilinx.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220621151208.189959-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-08-11vdpa_sim_blk: set number of address spaces and virtqueue groupsStefano Garzarella
Commit bda324fd037a ("vdpasim: control virtqueue support") added two new fields (nas, ngroups) to vdpasim_dev_attr, but we forgot to initialize them for vdpa_sim_blk. When creating a new vdpa_sim_blk device this causes the kernel to panic in this way:    $ vdpa dev add mgmtdev vdpasim_blk name blk0    BUG: kernel NULL pointer dereference, address: 0000000000000030    ...    RIP: 0010:vhost_iotlb_add_range_ctx+0x41/0x220 [vhost_iotlb]    ...    Call Trace:     <TASK>     vhost_iotlb_add_range+0x11/0x800 [vhost_iotlb]     vdpasim_map_range+0x91/0xd0 [vdpa_sim]     vdpasim_alloc_coherent+0x56/0x90 [vdpa_sim]     ... This happens because vdpasim->iommu[0] is not initialized when dev_attr.nas is 0. Let's fix this issue by initializing both (nas, ngroups) to 1 for vdpa_sim_blk. Fixes: bda324fd037a ("vdpasim: control virtqueue support") Cc: gautam.dawar@xilinx.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220621151323.190431-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2022-08-11vdpa_sim_blk: call vringh_complete_iotlb() also in the error pathStefano Garzarella
Call vringh_complete_iotlb() even when we encounter a serious error that prevents us from writing the status in the "in" header (e.g. the header length is incorrect, etc.). The guest is misbehaving, so maybe the ring is in a bad state, but let's avoid making things worse. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220630153221.83371-4-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vdpa_sim_blk: limit the number of request handled per batchStefano Garzarella
Limit the number of requests (4 per queue as for vdpa_sim_net) handled in a batch to prevent the worker from using the CPU for too long. Suggested-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220630153221.83371-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vdpa_sim_blk: use dev_dbg() to print errorsStefano Garzarella
Use dev_dbg() instead of dev_err()/dev_warn() to avoid flooding the host with prints, when the guest driver is misbehaving. In this way, prints can be dynamically enabled when the vDPA block simulator is used to validate a driver. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220630153221.83371-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>