summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-12-18vdpa_sim: add work_fn in vdpasim_dev_attrStefano Garzarella
Rename vdpasim_work() in vdpasim_net_work() and add it to the vdpasim_dev_attr structure. Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-10-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: add supported_features field in vdpasim_dev_attrStefano Garzarella
Introduce a new VDPASIM_FEATURES macro with the generic features supported by the vDPA simulator, and VDPASIM_NET_FEATURES macro with vDPA-net features. Add 'supported_features' field in vdpasim_dev_attr, to allow devices to specify their features. Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-9-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: add device id field in vdpasim_dev_attrStefano Garzarella
Remove VDPASIM_DEVICE_ID macro and add 'id' field in vdpasim_dev_attr, that will be returned by vdpasim_get_device_id(). Use VIRTIO_ID_NET for vDPA-net simulator device id. Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-8-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: add struct vdpasim_dev_attr for device attributesStefano Garzarella
vdpasim_dev_attr will contain device specific attributes. We starting moving the number of virtqueues (i.e. nvqs) to vdpasim_dev_attr. vdpasim_create() creates a new vDPA simulator following the device attributes defined in the vdpasim_dev_attr parameter. Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-7-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: rename vdpasim_config_ops variablesStefano Garzarella
These variables store generic callbacks used by the vDPA simulator core, so we can remove the 'net' word in their names. Co-developed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-6-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: make IOTLB entries limit configurableStefano Garzarella
Some devices may require a higher limit for the number of IOTLB entries, so let's make it configurable through a module parameter. By default, it's initialized with the current limit (2048). Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-5-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: remove hard-coded virtq countMax Gurtovoy
Add a new attribute that will define the number of virt queues to be created for the vdpasim device. Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> [sgarzare: replace kmalloc_array() with kcalloc()] Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-4-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa_sim: remove unnecessary headers inclusionStefano Garzarella
Some headers are not necessary, so let's remove them to do some cleaning. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-3-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa: remove unnecessary 'default n' in Kconfig entriesStefano Garzarella
'default n' is not necessary since it is already the default when nothing is specified. Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201215144256.155342-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18vdpa: ifcvf: Use dma_set_mask_and_coherent to simplify codeChristophe JAILLET
'pci_set_dma_mask()' + 'pci_set_consistent_dma_mask()' can be replaced by an equivalent 'dma_set_mask_and_coherent()' which is much less verbose. While at it, fix a typo (s/confiugration/configuration) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/20201129125434.1462638-1-christophe.jaillet@wanadoo.fr Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2020-12-18vhost_vdpa: switch to vmemdup_user()Tian Tao
Replace opencoded alloc and copy with vmemdup_user() Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1605057288-60400-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2020-12-18virtio-mem: Big Block Mode (BBM) - safe memory hotunplugDavid Hildenbrand
Let's add a safe mechanism to unplug memory, avoiding long/endless loops when trying to offline memory - similar to in SBM. Fake-offline all memory (via alloc_contig_range()) before trying to offline+remove it. Use this mode as default, but allow to enable the other mode explicitly (which could give better memory hotunplug guarantees in some environments). The "unsafe" mode can be enabled e.g., via virtio_mem.bbm_safe_unplug=0 on the cmdline. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-30-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: Big Block Mode (BBM) - basic memory hotunplugDavid Hildenbrand
Let's try to unplug completely offline big blocks first. Then, (if enabled via unplug_offline) try to offline and remove whole big blocks. No locking necessary - we can deal with concurrent onlining/offlining just fine. Note1: This is sub-optimal and might be dangerous in some environments: we could end up in an infinite loop when offlining (e.g., long-term pinnings), similar as with DIMMs. We'll introduce safe memory hotunplug via fake-offlining next, and use this basic mode only when explicitly enabled. Note2: Without ZONE_MOVABLE, memory unplug will be extremely unreliable with bigger block sizes. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-29-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18mm/memory_hotplug: extend offline_and_remove_memory() to handle more than ↵David Hildenbrand
one memory block virtio-mem soon wants to use offline_and_remove_memory() memory that exceeds a single Linux memory block (memory_block_size_bytes()). Let's remove that restriction. Let's remember the old state and try to restore that if anything goes wrong. While re-onlining can, in general, fail, it's highly unlikely to happen (usually only when a notifier fails to allocate memory, and these are rather rare). This will be used by virtio-mem to offline+remove memory ranges that are bigger than a single memory block - for example, with a device block size of 1 GiB (e.g., gigantic pages in the hypervisor) and a Linux memory block size of 128MB. While we could compress the state into 2 bit, using 8 bit is much easier. This handling is similar, but different to acpi_scan_try_to_offline(): a) We don't try to offline twice. I am not sure if this CONFIG_MEMCG optimization is still relevant - it should only apply to ZONE_NORMAL (where we have no guarantees). If relevant, we can always add it. b) acpi_scan_try_to_offline() simply onlines all memory in case something goes wrong. It doesn't restore previous online type. Let's do that, so we won't overwrite what e.g., user space configured. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-28-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
2020-12-18virtio-mem: allow to force Big Block Mode (BBM) and set the big block sizeDavid Hildenbrand
Let's allow to force BBM, even if subblocks would be possible. Take care of properly calculating the first big block id, because the start address might no longer be aligned to the big block size. Also, allow to manually configure the size of Big Blocks. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-27-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: Big Block Mode (BBM) memory hotplugDavid Hildenbrand
Currently, we do not support device block sizes that exceed the Linux memory block size. For example, having a device block size of 1 GiB (e.g., gigantic pages in the hypervisor) won't work with 128 MiB Linux memory blocks. Let's implement Big Block Mode (BBM), whereby we add/remove at least one Linux memory block at a time. With a 1 GiB device block size, a Big Block (BB) will cover 8 Linux memory blocks. We'll keep registering the online_page_callback machinery, it will be used for safe memory hotunplug in BBM next. Note: BBM is properly prepared for variable-sized Linux memory blocks that we might see in the future. So we won't care how many Linux memory blocks a big block actually spans, and how the memory notifier is called. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-26-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: factor out adding/removing memory from LinuxDavid Hildenbrand
Let's use wrappers for the low-level functions that dev_dbg/dev_warn and work on addr + size, such that we can reuse them for adding/removing in other granularity. We only warn when adding memory failed, because that's something to pay attention to. We won't warn when removing failed, we'll reuse that in racy context soon (and we do have proper BUG_ON() statements in the current cases where it must never happen). Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-25-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: memory notifier callbacks are specific to Sub Block Mode (SBM)David Hildenbrand
Let's rename accordingly. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-24-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virito-mem: existing (un)plug functions are specific to Sub Block Mode (SBM)David Hildenbrand
Let's rename them accordingly. virtio_mem_plug_request() and virtio_mem_unplug_request() will be handled separately. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-23-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: memory block ids are specific to Sub Block Mode (SBM)David Hildenbrand
Let's move first_mb_id/next_mb_id/last_usable_mb_id accordingly. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-22-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: nb_sb_per_mb and subblock_size are specific to Sub Block Mode (SBM)David Hildenbrand
Let's rename to "sbs_per_mb" and "sb_size" and move accordingly. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-21-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virito-mem: subblock states are specific to Sub Block Mode (SBM)David Hildenbrand
Let's rename and move accordingly. While at it, rename sb_bitmap to "sb_states". Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-20-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: memory block states are specific to Sub Block Mode (SBM)David Hildenbrand
let's use a new "sbm" sub-struct to hold SBM-specific state and rename + move applicable definitions, functions, and variables (related to memory block states). While at it: - Drop the "_STATE" part from memory block states - Rename "nb_mb_state" to "mb_count" - "set_mb_state" / "get_mb_state" vs. "mb_set_state" / "mb_get_state" - Don't use lengthy "enum virtio_mem_smb_mb_state", simply use "uint8_t" Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-19-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virito-mem: document Sub Block Mode (SBM)David Hildenbrand
Let's add some documentation for the current mode - Sub Block Mode (SBM) - to prepare for a new mode - Big Block Mode (BBM). Follow-up patches will properly factor out the existing Sub Block Mode (SBM) and implement Big Block Mode (BBM). Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-18-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: generalize handling when memory is getting onlined deferredDavid Hildenbrand
We don't want to add too much memory when it's not getting onlined immediately, to avoid running OOM. Generalize the handling, to avoid making use of memory block states. Use a threshold of 1 GiB for now. Properly adjust the offline size when adding/removing memory. As we are not always protected by a lock when touching the offline size, use an atomic64_t. We don't care about races (e.g., someone offlining memory while we are adding more), only about consistent values. (1 GiB needs a memmap of ~16MiB - which sounds reasonable even for setups with little boot memory and (possibly) one virtio-mem device per node) We don't want to retrigger when onlining is caused immediately by our action (e.g., adding memory which immediately gets onlined), so use a flag to indicate if the workqueue is active and use that as an indicator whether to trigger a retry. This will also be especially relevant for Big Block Mode (BBM), whereby we might re-online memory in case offlining of another memory block failed. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-17-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: don't always trigger the workqueue when offlining memoryDavid Hildenbrand
Let's trigger from offlining code only when we're not allowed to unplug online memory. Handle the other case (memmap possibly freeing up another memory block) when actually removing memory. We now also properly handle the case when removing already offline memory blocks via virtio_mem_mb_remove(). When removing via virtio_mem_remove(), when unloading the driver, virtio_mem_retry() is a NOP and safe to use. While at it, move retry handling when offlining out of virtio_mem_notify_offline(), to share it with Big Block Mode (BBM) soon. This is a preparation for Big Block Mode (BBM), whereby we can see some temporary offlining of memory blocks without actually making progress. Imagine you have a Big Block that spans to Linux memory blocks. Assume the first Linux memory blocks has no unmovable data on it. When we would call offline_and_remove_memory() on the big block, we would 1. Try to offline the first block. Works, notifiers triggered. virtio_mem_retry() called. 2. Try to offline the second block. Does not work. 3. Re-online first block. 4. Exit to main loop, exit workqueue. 5. Retry immediately (due to virtio_mem_retry()), go to 1. The result are endless retries. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-16-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: drop last_mb_idDavid Hildenbrand
No longer used, let's drop it. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-15-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: generalize virtio_mem_overlaps_range()David Hildenbrand
Avoid using memory block ids. While at it, use uint64_t for address/size. This is a preparation for Big Block Mode (BBM). Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-14-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: generalize virtio_mem_owned_mb()David Hildenbrand
Avoid using memory block ids. Rename it to virtio_mem_contains_range(). This is a preparation for Big Block Mode (BBM). Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-13-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: generalize check for added memoryDavid Hildenbrand
Let's check by traversing busy system RAM resources instead, to avoid relying on memory block states. Don't use walk_system_ram_range(), as that works on pages and we want to use the bare addresses we have easily at hand. This is a preparation for Big Block Mode (BBM), which won't have memory block states. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-12-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: retry fake-offlining via alloc_contig_range() on ZONE_MOVABLEDavid Hildenbrand
ZONE_MOVABLE is supposed to give some guarantees, yet, alloc_contig_range() isn't prepared to properly deal with some racy cases properly (e.g., temporary page pinning when exiting processed, PCP). Retry 5 times for now. There is certainly room for improvement in the future. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-11-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: factor out handling of fake-offline pages in memory notifierDavid Hildenbrand
Let's factor out the core pieces and place the implementation next to virtio_mem_fake_offline(). We'll reuse this functionality soon. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-10-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: factor out fake-offlining into virtio_mem_fake_offline()David Hildenbrand
... which now matches virtio_mem_fake_online(). We'll reuse this functionality soon. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-9-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: print debug messages from virtio_mem_send_*_request()David Hildenbrand
Let's move the existing dev_dbg() into the functions, print if something went wrong, and also print for virtio_mem_send_unplug_all_request(). Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-8-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: factor out calculation of the bit number within the subblock bitmapDavid Hildenbrand
The calculation is already complicated enough, let's limit it to one location. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-7-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: use "unsigned long" for nr_pages when fake onlining/offliningDavid Hildenbrand
No harm done, but let's be consistent. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-6-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: drop rc2 in virtio_mem_mb_plug_and_add()David Hildenbrand
We can drop rc2, we don't actually need the value. Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-5-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: simplify MAX_ORDER - 1 / pageblock_order handlingDavid Hildenbrand
Let's use pageblock_nr_pages and MAX_ORDER_NR_PAGES instead where possible to simplify. Add a comment why we have that restriction for now. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-4-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: more precise calculation in virtio_mem_mb_state_prepare_next_mb()David Hildenbrand
We actually need one byte less (next_mb_id is exclusive, first_mb_id is inclusive). While at it, compact the code. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18virtio-mem: determine nid only once using memory_add_physaddr_to_nid()David Hildenbrand
Let's determine the target nid only once in case we have none specified - usually, we'll end up with node 0 either way. Reviewed-by: Wei Yang <richard.weiyang@linux.alibaba.com> Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20201112133815.13332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-12-18Merge tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "In this release we add the ability to set a 'needsrepair' flag indicating that we /know/ the filesystem requires xfs_repair, but other than that, it's the usual strengthening of metadata validation and miscellaneous cleanups. Summary: - Introduce a "needsrepair" "feature" to flag a filesystem as needing a pass through xfs_repair. This is key to enabling filesystem upgrades (in xfs_db) that require xfs_repair to make minor adjustments to metadata. - Refactor parameter checking of recovered log intent items so that we actually use the same validation code as them that generate the intent items. - Various fixes to online scrub not reacting correctly to directory entries pointing to inodes that cannot be igetted. - Refactor validation helpers for data and rt volume extents. - Refactor XFS_TRANS_DQ_DIRTY out of existence. - Fix a longstanding bug where mounting with "uqnoenforce" would start user quotas in non-enforcing mode but /proc/mounts would display "usrquota", implying that they are being enforced. - Don't flag dax+reflink inodes as corruption since that is a valid (but not fully functional) combination right now. - Clean up raid stripe validation functions. - Refactor the inode allocation code to be more straightforward. - Small prep cleanup for idmapping support. - Get rid of the xfs_buf_t typedef" * tag 'xfs-5.11-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (40 commits) xfs: remove xfs_buf_t typedef fs/xfs: convert comma to semicolon xfs: open code updating i_mode in xfs_set_acl xfs: remove xfs_vn_setattr_nonsize xfs: kill ialloced in xfs_dialloc() xfs: spilt xfs_dialloc() into 2 functions xfs: move xfs_dialloc_roll() into xfs_dialloc() xfs: move on-disk inode allocation out of xfs_ialloc() xfs: introduce xfs_dialloc_roll() xfs: convert noroom, okalloc in xfs_dialloc() to bool xfs: don't catch dax+reflink inodes as corruption in verifier xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks xfs: remove unneeded return value check for *init_cursor() xfs: introduce xfs_validate_stripe_geometry() xfs: show the proper user quota options xfs: remove the unused XFS_B_FSB_OFFSET macro xfs: remove unnecessary null check in xfs_generic_create xfs: directly return if the delta equal to zero xfs: check tp->t_dqinfo value instead of the XFS_TRANS_DQ_DIRTY flag xfs: delete duplicated tp->t_dqinfo null check and allocation ...
2020-12-18Merge tag 'ktest-v5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest Pull ktest updates from Steven Rostedt: "No new features. Just a couple of fixes that I had in my local repository that fixed issues with sending the result emails" * tag 'ktest-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest: ktest.pl: Fix the logic for truncating the size of the log file for email ktest.pl: If size of log is too big to email, email error message
2020-12-18Merge tag 'drm-next-2020-12-18' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull more drm updates from Daniel Vetter: "UAPI Changes: - Only enable char/agp uapi when CONFIG_DRM_LEGACY is set Cross-subsystem Changes: - vma_set_file helper to make vma->vm_file changing less brittle, acked by Andrew Core Changes: - dma-buf heaps improvements - pass full atomic modeset state to driver callbacks - shmem helpers: cached bo by default - cleanups for fbdev, fb-helpers - better docs for drm modes and SCALING_FITLER uapi - ttm: fix dma32 page pool regression Driver Changes: - multi-hop regression fixes for amdgpu, radeon, nouveau - lots of small amdgpu hw enabling fixes (display, pm, ...) - fixes for imx, mcde, meson, some panels, virtio, qxl, i915, all fairly minor - some cleanups for legacy drm/fbdev drivers" * tag 'drm-next-2020-12-18' of git://anongit.freedesktop.org/drm/drm: (117 commits) drm/qxl: don't allocate a dma_address array drm/nouveau: fix multihop when move doesn't work. drm/i915/tgl: Fix REVID macros for TGL to fetch correct stepping drm/i915: Fix mismatch between misplaced vma check and vma insert drm/i915/perf: also include Gen11 in OATAILPTR workaround Revert "drm/i915: re-order if/else ladder for hpd_irq_setup" drm/amdgpu/disply: fix documentation warnings in display manager drm/amdgpu: print mmhub client name for dimgrey_cavefish drm/amdgpu: set mode1 reset as default for dimgrey_cavefish drm/amd/display: Add get_dig_frontend implementation for DCEx drm/radeon: remove h from printk format specifier drm/amdgpu: remove h from printk format specifier drm/amdgpu: Fix spelling mistake "Heterogenous" -> "Heterogeneous" drm/amdgpu: fix regression in vbios reservation handling on headless drm/amdgpu/SRIOV: Extend VF reset request wait period drm/amdkfd: correct amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu log. drm/amd/display: Adding prototype for dccg21_update_dpp_dto() drm/amdgpu: print what method we are using for runtime pm drm/amdgpu: simplify logic in atpx resume handling drm/amdgpu: no need to call pci_ignore_hotplug for _PR3 ...
2020-12-18tools headers UAPI: Update asm-generic/unistd.hArnaldo Carvalho de Melo
Just a comment change, trivial. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18tools headers cpufeatures: Sync with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: e7b6385b01d8e9fb ("x86/cpufeatures: Add Intel SGX hardware bits") That causes only these 'perf bench' objects to rebuild: CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o And addresses these perf build warnings: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18tools headers UAPI: Sync linux/prctl.h with the kernel sourcesArnaldo Carvalho de Melo
To pick a new prctl introduced in: 1446e1df9eb183fd ("kernel: Implement selective syscall userspace redirection") That results in: $ tools/perf/trace/beauty/prctl_option.sh > before $ cp include/uapi/linux/prctl.h tools/include/uapi/linux/prctl.h $ tools/perf/trace/beauty/prctl_option.sh > after $ diff -u before after --- before 2020-12-17 15:00:42.012537367 -0300 +++ after 2020-12-17 15:00:49.832699463 -0300 @@ -53,6 +53,7 @@ [56] = "GET_TAGGED_ADDR_CTRL", [57] = "SET_IO_FLUSHER", [58] = "GET_IO_FLUSHER", + [59] = "SET_SYSCALL_USER_DISPATCH", }; static const char *prctl_set_mm_options[] = { [1] = "START_CODE", $ Now users can do: # perf trace -e syscalls:sys_enter_prctl --filter "option==SET_SYSCALL_USER_DISPATCH" ^C# # trace -v -e syscalls:sys_enter_prctl --filter "option==SET_SYSCALL_USER_DISPATCH" New filter for syscalls:sys_enter_prctl: (option==0x3b) && (common_pid != 5519 && common_pid != 3404) ^C# And also when prctl appears in a session, its options will be translated to the string. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Gabriel Krisman Bertazi <krisman@collabora.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18tools headers UAPI: Sync linux/fscrypt.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes from: 3ceb6543e9cf6ed8 ("fscrypt: remove kernel-internal constants from UAPI header") That don't result in any changes in tooling, just addressing this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18tools headers UAPI: Sync linux/const.h with the kernel headersArnaldo Carvalho de Melo
To pick up the changes in: a85cbe6159ffc973 ("uapi: move constants from <linux/kernel.h> to <linux/const.h>") That causes no changes in tooling, just addresses this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/const.h' differs from latest version at 'include/uapi/linux/const.h' diff -u tools/include/uapi/linux/const.h include/uapi/linux/const.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Petr Vorel <petr.vorel@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18tools arch x86: Sync the msr-index.h copy with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: d205e0f1426e0f99 ("x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits") e7b6385b01d8e9fb ("x86/cpufeatures: Add Intel SGX hardware bits") 43756a298928c9a4 ("powercap: Add AMD Fam17h RAPL support") 298ed2b31f552806 ("x86/msr-index: sort AMD RAPL MSRs by address") 68299a42f8428853 ("x86/mce: Enable additional error logging on certain Intel CPUs") That cause these changes in tooling: $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after $ diff -u before after --- before 2020-12-17 14:45:49.036994450 -0300 +++ after 2020-12-17 14:46:01.654256639 -0300 @@ -22,6 +22,10 @@ [0x00000060] = "LBR_CORE_TO", [0x00000079] = "IA32_UCODE_WRITE", [0x0000008b] = "IA32_UCODE_REV", + [0x0000008C] = "IA32_SGXLEPUBKEYHASH0", + [0x0000008D] = "IA32_SGXLEPUBKEYHASH1", + [0x0000008E] = "IA32_SGXLEPUBKEYHASH2", + [0x0000008F] = "IA32_SGXLEPUBKEYHASH3", [0x0000009b] = "IA32_SMM_MONITOR_CTL", [0x0000009e] = "IA32_SMBASE", [0x000000c1] = "IA32_PERFCTR0", @@ -59,6 +63,7 @@ [0x00000179] = "IA32_MCG_CAP", [0x0000017a] = "IA32_MCG_STATUS", [0x0000017b] = "IA32_MCG_CTL", + [0x0000017f] = "ERROR_CONTROL", [0x00000180] = "IA32_MCG_EAX", [0x00000181] = "IA32_MCG_EBX", [0x00000182] = "IA32_MCG_ECX", @@ -294,6 +299,7 @@ [0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR", [0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC", [0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT", + [0xc001029a - x86_AMD_V_KVM_MSRs_offset] = "AMD_CORE_ENERGY_STATUS", [0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS", [0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL", [0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN", $ Which causes these parts of tools/perf/ to be rebuilt: CC /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o LD /tmp/build/perf/trace/beauty/tracepoints/perf-in.o LD /tmp/build/perf/trace/beauty/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf At some point these should just be tables read by perf on demand. This allows 'perf trace' users to use those strings to translate from the msr ids provided by the msr: tracepoints. This addresses this perf tools build warning: diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h' Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Victor Ding <victording@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-18perf trace beauty: Update copy of linux/socket.h with the kernel sourcesArnaldo Carvalho de Melo
This just triggers the rebuilding of the syscall beautifiers that extract patterns from this file due to this cset: b713c195d5933227 ("net: provide __sys_shutdown_sock() that takes a socket") After updating it: CC /tmp/build/perf/trace/beauty/sockaddr.o Addressing this perf build warning: Warning: Kernel ABI header at 'tools/perf/trace/beauty/include/linux/socket.h' differs from latest version at 'include/linux/socket.h' diff -u tools/perf/trace/beauty/include/linux/socket.h include/linux/socket.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>