summaryrefslogtreecommitdiff
path: root/arch/um
AgeCommit message (Collapse)Author
2024-07-04um: line: always fill *error_out in setup_one_line()Johannes Berg
The pointer isn't initialized by callers, but I have encountered cases where it's still printed; initialize it in all possible cases in setup_one_line(). Link: https://patch.msgid.link/20240703172235.ad863568b55f.Iaa1eba4db8265d7715ba71d5f6bb8c7ff63d27e9@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Enable preemption in UMLAnton Ivanov
Since userspace state is saved in the MM process, kernel using FPU still doesn't really need to do anything, so this really is as simple as enabling preemption. The irq critical section in sigio_handler() needs preempt_disable()/preempt_enable(). Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240702102549.d2fcea450854.I12f5a53d80ec1e425e66ef272b1e95cb523b608e@changeid [rebase, remove FPU save/restore, fix x86/um Makefile, rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: refactor TLB update handlingBenjamin Berg
Conceptually, we want the memory mappings to always be up to date and represent whatever is in the TLB. To ensure that, we need to sync them over in the userspace case and for the kernel we need to process the mappings. The kernel will call flush_tlb_* if page table entries that were valid before become invalid. Unfortunately, this is not the case if entries are added. As such, change both flush_tlb_* and set_ptes to track the memory range that has to be synchronized. For the kernel, we need to execute a flush_tlb_kern_* immediately but we can wait for the first page fault in case of set_ptes. For userspace in contrast we only store that a range of memory needs to be synced and do so whenever we switch to that process. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: simplify and consolidate TLB updatesBenjamin Berg
The HVC update was mostly used to compress consecutive calls into one. This is mostly relevant for userspace where it is already handled by the syscall stub code. Simplify the whole logic and consolidate it for both kernel and userspace. This does remove the sequential syscall compression for the kernel, however that shouldn't be the main factor in most runs. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-12-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove force_flush_all from fork_handlerBenjamin Berg
There should be no need for this. It may be that this used to work around another issue where after a clone the MM was in a bad state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Do not flush MM in flush_threadBenjamin Berg
There should be no need to flush the memory in flush_thread. Doing this likely worked around some issue where memory was still incorrectly mapped when creating or cloning an MM. With the removal of the special clone path, that isn't relevant anymore. However, add the flush into MM initialization so that any new userspace MM is guaranteed to be clean. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-10-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Delay flushing syscalls until the thread is restartedBenjamin Berg
As running the syscalls is expensive due to context switches, we should do so as late as possible in case more syscalls need to be queued later on. This will also benefit a later move to a SECCOMP enabled userspace as in that case the need for extra context switches is removed entirely. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove copy_context_skas0Benjamin Berg
The kernel flushes the memory ranges anyway for CoW and does not assume that the userspace process has anything set up already. So, start with a fresh process for the new mm context. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: remove LDT supportBenjamin Berg
The current LDT code has a few issues that mean it should be redone in a different way once we always start with a fresh MM even when cloning. In a new and better world, the kernel would just ensure its own LDT is clear at startup. At that point, all that is needed is a simple function to populate the LDT from another MM in arch_dup_mmap combined with some tracking of the installed LDT entries for each MM. Note that the old implementation was even incorrect with regard to reading, as it copied out the LDT entries in the internal format rather than converting them to the userspace structure. Removal should be fine as the LDT is not used for thread-local storage anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-7-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: compress memory related stub syscalls while adding themBenjamin Berg
To keep the number of syscalls that the stub has to do lower, compress two consecutive syscalls of the same type if the second is just a continuation of the first. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Rework syscall handlingBenjamin Berg
Rework syscall handling to be platform independent. Also create a clean split between queueing of syscalls and flushing them out, removing the need to keep state in the code that triggers the syscalls. The code adds syscall_data_len to the global mm_id structure. This will be used later to allow surrounding code to track whether syscalls still need to run and if errors occurred. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Create signal stack memory assignment in stub_dataBenjamin Berg
When we switch to use seccomp, we need both the signal stack and other data (i.e. syscall information) to co-exist in the stub data. To facilitate this, start by defining separate memory areas for the stack and syscall data. This moves the signal stack onto a new page as the memory area is not sufficient to hold both signal stack and syscall information. Only change the signal stack setup for now, as the syscall code will be reworked later. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Remove stub-data.h include from common-offsets.hBenjamin Berg
Further commits will require values from common-offsets.h inside stub-data.h. Resolve the possible circular dependency and simply use offsetof() inside stub_32.h and stub_64.h. Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net> Link: https://patch.msgid.link/20240703134536.1161108-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: time-travel: fix signal blocking race/hangJohannes Berg
When signals are hard-blocked in order to do time-travel socket processing, we set signals_blocked and then handle SIGIO signals by setting the SIGIO bit in signals_pending. When unblocking, we first set signals_blocked to 0, and then handle all pending signals. We have to set it first, so that we can again properly block/unblock inside the unblock, if the time-travel handlers need to be processed. Unfortunately, this is racy. We can get into this situation: // signals_pending = SIGIO_MASK unblock_signals_hard() signals_blocked = 0; if (signals_pending && signals_enabled) { block_signals(); unblock_signals() ... sig_handler_common(SIGIO, NULL, NULL); sigio_handler() ... sigio_reg_handler() irq_do_timetravel_handler() reg->timetravel_handler() == vu_req_interrupt_comm_handler() vu_req_read_message() vhost_user_recv_req() vhost_user_recv() vhost_user_recv_header() // reads 12 bytes header of // 20 bytes message <-- receive SIGIO here <-- sig_handler() int enabled = signals_enabled; // 1 if ((signals_blocked || !enabled) && (sig == SIGIO)) { if (!signals_blocked && time_travel_mode == TT_MODE_EXTERNAL) sigio_run_timetravel_handlers() _sigio_handler() sigio_reg_handler() ... as above ... vhost_user_recv_header() // reads 8 bytes that were message payload // as if it were header - but aborts since // it then gets -EAGAIN ... --> end signal handler --> // continue in vhost_user_recv() // full_read() for 8 bytes payload busy loops // entire process hangs here Conceptually, to fix this, we need to ensure that the signal handler cannot run while we hard-unblock signals. The thing that makes this more complex is that we can be doing hard-block/unblock while unblocking. Introduce a new signals_blocked_pending variable that we can keep at non-zero as long as pending signals are being processed, then we only need to ensure it's decremented safely and the signal handler will only increment it if it's already non-zero (or signals_blocked is set, of course.) Note also that only the outermost call to hard-unblock is allowed to decrement signals_blocked_pending, since it could otherwise reach zero in an inner call, and leave the same race happening if the timetravel_handler loops, but that's basically required of it. Fixes: d6b399a0e02a ("um: time-travel/signals: fix ndelay() in interrupt") Link: https://patch.msgid.link/20240703110144.28034-2-johannes@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: time-travel: remove time_exit()Johannes Berg
This function is unused and unneeded, remove it. Link: https://patch.msgid.link/20240703130105.02b3a974acb7.I7264821f7cfa17ea713b7a3e4787aa41a3107d01@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: harddog: add missing MODULE_DESCRIPTION() macroJeff Johnson
With ARCH=um, make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/um/drivers/harddog.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240702-md-um-arch-um-drivers-v1-1-79e4f50b5bab@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: add shared memory optimisation for time-travel=extJohannes Berg
With external time travel, a LOT of message can end up being exchanged on the socket, taking a significant amount of time just to do that. Add a new shared memory optimisation to that, where a number of changes are made: - the controller sends a client ID and a shared memory FD (and a logging FD we don't use) in the ACK message to the initial START - the shared memory holds the current time and the free_until value, so that there's no need to exchange messages for that - if the client that's running has shared memory support, any client (the running one included) can request the next time it wants to run inside the shared memory, rather than sending a message, by also updating the free_until value - when shared memory is enabled, RUN/WAIT messages no longer have an ACK, further cutting down on messages Together, this can reduce the number of messages very significantly, and reduce overall test/simulation run time. Co-developed-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Link: https://patch.msgid.link/20240702192118.6ad0a083f574.Ie41206c8ce4507fe26b991937f47e86c24ca7a31@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: add mmap/mremap OS callsJohannes Berg
For the upcoming shared-memory time-travel external optimisations, we need to be able to mmap/mremap. Add the necessary OS calls. Link: https://patch.msgid.link/20240702192118.ca4472963638.Ic2da1d3a983fe57340c1b693badfa9c5bd2d8c61@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: generalize os_rcv_fdJohannes Berg
Change os_rcv_fd() to os_rcv_fd_msg() that can more generally receive any number of FDs in any kind of message. Link: https://patch.msgid.link/20240702192118.40b78b2bfe4e.Ic6ec12d72630e5bcae1e597d6bd5c6f29f441563@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: time-travel: support time-travel protocol broadcast messagesMordechay Goodstein
Add a message type to the time-travel protocol to broadcast a small (64-bit) value to all participants in a simulation. The main use case is to have an identical message come to all participants in a simulation, e.g. to separate out logs for different tests running in a single simulation. Down in the guts of time_travel_handle_message() we can't use printk() and not even printk_deferred(), so just store the message and print it at the start of the userspace() function. Unfortunately this means that other prints in the kernel can actually bypass the message, but in most cases where this is used, for example to separate test logs, userspace will be involved. Also, even if we could use printk_deferred(), we'd still need to flush it out in the userspace() function since otherwise userspace messages might cross it. As a result, this is a reasonable compromise, there's no need to have any core changes and it solves the main use case we have for it. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Link: https://patch.msgid.link/20240702192118.c4093bc5b15e.I2ca8d006b67feeb866ac2017af7b741c9e06445a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: enable UBSANJohannes Berg
We can select ARCH_HAS_UBSAN, it works just fine. It had been enabled and we even used it, but then commit 890a64810d59 ("ubsan: Restore dependency on ARCH_HAS_UBSAN") (correctly) disabled it again, enable ARCH_HAS_UBSAN to get it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20240701220034.995eb04d656d.Ia29fe091b207fe66b5e26298c1e427ebcf131642@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um/mm: remove redundant assignment of max_low_pfnWei Yang
Current calculation of max_low_pfn is introduced in commit af84eab20891 ("[PATCH] uml: fix LVM crash"). It is intended to set max_low_pfn to the same value as max_pfn. But I am not sure why the max_pfn is set to totalram_pages, which represents the number of usable pages in system instead of an absolute page frame number. (The change history stops there.) While we have already calculate it in setup_physmem(), so not necessary to do it again. Also this would help changing totalram_pages accounting, since we plan to move the accounting into __free_pages_core(). With this change, totalram_pages may not represent the total usable pages at this point, since some pages would be deferred initialized. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> CC: Jeff Dike <jdike@linux.intel.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Alasdair G Kergon <agk@redhat.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Mike Rapoport (IBM) <rppt@kernel.org> CC: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://patch.msgid.link/20240615034150.2958-1-richard.weiyang@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03arch: um: rust: Add i386 support for RustDavid Gow
At present, Rust in the kernel only supports 64-bit x86, so UML has followed suit. However, it's significantly easier to support 32-bit i386 on UML than on bare metal, as UML does not use the -mregparm option (which alters the ABI), which is not yet supported by rustc[1]. Add support for CONFIG_RUST on um/i386, by adding a new target config to generate_rust_target, and replacing various checks on CONFIG_X86_64 to also support CONFIG_X86_32. We still use generate_rust_target, rather than a built-in rustc target, in order to match x86_64, provide a future place for -mregparm, and more easily disable floating point instructions. With these changes, the KUnit tests pass with: kunit.py run --make_options LLVM=1 --kconfig_add CONFIG_RUST=y --kconfig_add CONFIG_64BIT=n --kconfig_add CONFIG_FORTIFY_SOURCE=n An earlier version of these changes was proposed on the Rust-for-Linux github[2]. [1]: https://github.com/rust-lang/rust/issues/116972 [2]: https://github.com/Rust-for-Linux/linux/pull/966 Signed-off-by: David Gow <davidgow@google.com> Link: https://patch.msgid.link/20240604224052.3138504-1-davidgow@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Remove /proc/sysemu support codeTiwei Bie
Currently /proc/sysemu will never be registered, as sysemu_supported is initialized to zero implicitly and no code updates it. And there is also nothing to configure via sysemu in UML anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240527134024.1539848-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Remove unused ncpus variableTiwei Bie
It's no longer used. And uml_ncpus_setup doesn't exist anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240527134024.1539848-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03ubd: Remove unused mutex 'ubd_mutex'Dr. David Alan Gilbert
Commit fb5d1d389c9e ("ubd: open the backing files in ubd_add") removed the last use of ubd_mutex. Remove it. Build and kernel startup test only. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20240505001508.255096-1-linux@treblig.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: time-travel: fix time-travel-start optionJohannes Berg
We need to have the = as part of the option so that the value can be parsed properly. Also document that it must be given in nanoseconds, not seconds. Fixes: 065038706f77 ("um: Support time travel mode") Link: https://patch.msgid.link/20240417102744.14b9a9d4eba0.Ib22e9136513126b2099d932650f55f193120cd97@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Select HAS_IOREMAP for UML_IOMEM_EMULATIONNiklas Schnelle
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at compile time. UML supports these via its UML_IOMEM_EMULATION so let that select HAS_IOPORT and also reflect this in NO_IOPORT_MAP. Co-developed-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Arnd Bergmann <arnd@kernel.org> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Link: https://patch.msgid.link/20240403124300.65379-2-schnelle@linux.ibm.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Remove obsolete pcap driverAnton Ivanov
Remove the pcap driver in UML. It is obsolete. It does not build on recent systems due to changes in libpcap and its dependencies. The vector driver's raw transport in UML provides identical functionality. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240328132424.376456-1-anton.ivanov@cambridgegreys.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: chan: use blocking IO for console output for time-travelBenjamin Berg
When in time-travel mode (infinite-cpu or external) time should not pass for writing to the console. As such, it makes sense to put the FD for the output side into blocking mode and simply let any write to it hang. If we did not do this, then time could pass waiting for the console to become writable again. This is not desirable as it has random effects on the clock between runs. Implement this by duplicating the FD if output is active in a relevant mode and setting the duplicate to be blocking. This avoids changing the input channel to be blocking should it exists. After this, use the blocking FD for all write operations and do not allocate an IRQ it is set. Without time-travel mode fd_out will always match fd_in and IRQs are registered. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: chan_user: retry partial writesBenjamin Berg
In the next commit, we are going to set the output FD to be blocking. Once that is done, the write() may be short if an interrupt happens while it is writing out data. As such, to properly catch an EINTR error, we need to retry the write. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: chan_user: catch EINTR when reading and writingBenjamin Berg
If the read/write function returns an error then we expect to see an event/IRQ later on. However, this will only happen after an EAGAIN as we are using edge based event triggering. As such, EINTR needs to be caught should it happen. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-2-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: irqs: process outstanding IRQs when unblocking signalsBenjamin Berg
When in time-travel mode, the eventfd events are read even when signals are blocked as SIGIO still needs to be processed. In this case, the event is cleared on the eventfd but the IRQ still needs to be fired later. We did already ensure that the SIGIO handler is run again. However, the FDs are configured to be level triggered, so that eventfd will not notify again. As such, add some logic to mark the IRQ as pending and process it at the next opportunity. To avoid duplication, reuse the logic used for the suspend/resume case. This does not really change anything except for delaying running the IRQs with timetravel_handler at a slightly later point in time (and possibly running non-timetravel IRQs that shouldn't happen earlier). While at it, move marking as pending into irq_event_handler as that is the more logical place for it to happen. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20231018123643.1255813-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-06-19block: move the nonrot flag to queue_limitsChristoph Hellwig
Move the nonrot flag into the queue_limits feature field so that it can be set atomically with the queue frozen. Use the chance to switch to defaulting to non-rotational and require the driver to opt into rotational, which matches the polarity of the sysfs interface. For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new rotational flag is not set as they clearly are not rotational despite this being a behavior change. There are some other drivers that unconditionally set the rotational flag to keep the existing behavior as they arguably can be used on rotational devices even if that is probably not their main use today (e.g. virtio_blk and drbd). The flag is automatically inherited in blk_stack_limits matching the existing behavior in dm and md. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-19block: move cache control settings out of queue->flagsChristoph Hellwig
Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-17_PATCH_19_23_um_virt_pci_Use_irq_domain_instantiate_Herve Codina
um_pci_init() uses __irq_domain_add(). With the introduction of irq_domain_instantiate(), __irq_domain_add() becomes obsolete. In order to fully remove __irq_domain_add(), use directly irq_domain_instantiate(). [ tglx: Fixup struct initializer ] Signed-off-by: Herve Codina <herve.codina@bootlin.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240614173232.1184015-20-herve.codina@bootlin.com
2024-06-14block: add special APIs for run-time disabling of discard and friendsChristoph Hellwig
A few drivers optimistically try to support discard, write zeroes and secure erase and disable the features from the I/O completion handler if the hardware can't support them. This disable can't be done using the atomic queue limits API because the I/O completion handlers can't take sleeping locks or freeze the queue. Keep the existing clearing of the relevant field to zero, but replace the old blk_queue_max_* APIs with new disable APIs that force the value to 0. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14ubd: untagle discard vs write zeroes not support handlingChristoph Hellwig
Discard and Write Zeroes are different operation and implemented by different fallocate opcodes for ubd. If one fails the other one can work and vice versa. Split the code to disable the operations in ubd_handler to only disable the operation that actually failed. Fixes: 50109b5a03b4 ("um: Add support for DISCARD in the UBD Driver") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-14ubd: refactor the interrupt handlerChristoph Hellwig
Instead of a separate handler function that leaves no work in the interrupt hanler itself, split out a per-request end I/O helper and clean up the coding style and variable naming while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20240531074837.1648501-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-05-25Merge tag 'uml-for-linus-6.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Fixes for -Wmissing-prototypes warnings and further cleanup - Remove callback returning void from rtc and virtio drivers - Fix bash location * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits) um: virtio_uml: Convert to platform remove callback returning void um: rtc: Convert to platform remove callback returning void um: Remove unused do_get_thread_area function um: Fix -Wmissing-prototypes warnings for __vdso_* um: Add an internal header shared among the user code um: Fix the declaration of kasan_map_memory um: Fix the -Wmissing-prototypes warning for get_thread_reg um: Fix the -Wmissing-prototypes warning for __switch_mm um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn um: Stop tracking host PID in cpu_tasks um: process: remove unused 'n' variable um: vector: remove unused len variable/calculation um: vector: fix bpfflash parameter evaluation um: slirp: remove set but unused variable 'pid' um: signal: move pid variable where needed um: Makefile: use bash from the environment um: Add winch to winch_handlers before registering winch IRQ um: Fix -Wmissing-prototypes warnings for __warp_* and foo um: Fix -Wmissing-prototypes warnings for text_poke* um: Move declarations to proper headers ...
2024-05-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-net is finally supported in vduse - virtio (balloon and mem) interaction with suspend is improved - vhost-scsi now handles signals better/faster And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (48 commits) virtio-pci: Check if is_avq is NULL virtio: delete vq in vp_find_vqs_msix() when request_irq() fails MAINTAINERS: add Eugenio Pérez as reviewer vhost-vdpa: Remove usage of the deprecated ida_simple_xx() API vp_vdpa: don't allocate unused msix vectors sound: virtio: drop owner assignment fuse: virtio: drop owner assignment scsi: virtio: drop owner assignment rpmsg: virtio: drop owner assignment nvdimm: virtio_pmem: drop owner assignment wifi: mac80211_hwsim: drop owner assignment vsock/virtio: drop owner assignment net: 9p: virtio: drop owner assignment net: virtio: drop owner assignment net: caif: virtio: drop owner assignment misc: nsm: drop owner assignment iommu: virtio: drop owner assignment drm/virtio: drop owner assignment gpio: virtio: drop owner assignment firmware: arm_scmi: virtio: drop owner assignment ...
2024-05-22um: virt-pci: drop owner assignmentKrzysztof Kozlowski
virtio core already sets the .owner, so driver does not need to. Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Message-Id: <20240331-module-owner-virtio-v2-5-98f04bfaf46a@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-20Merge tag 'asm-generic-6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "These are a few cross-architecture cleanup patches: - separate out fbdev support from the asm/video.h contents that may be used by either the old fbdev drivers or the newer drm display code (Thomas Zimmermann) - cleanups for the generic bitops code and asm-generic/bug.h (Thorsten Blum) - remove the orphaned include/asm-generic/page.h header that used to be included by long-removed mmu-less architectures (me)" * tag 'asm-generic-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: arch: Fix name collision with ACPI's video.o bug: Improve comment asm-generic: remove unused asm-generic/page.h arch: Rename fbdev header and source files arch: Remove struct fb_info from video helpers arch: Select fbdev helpers with CONFIG_VIDEO bitops: Change function return types from long to int
2024-05-19Merge tag 'mm-stable-2024-05-17-19-19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: "The usual shower of singleton fixes and minor series all over MM, documented (hopefully adequately) in the respective changelogs. Notable series include: - Lucas Stach has provided some page-mapping cleanup/consolidation/ maintainability work in the series "mm/treewide: Remove pXd_huge() API". - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test. - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory. - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites. - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency. - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability. - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit". - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test. - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()". - Baoquan has also redone some MM initializatio code in the series "mm/init: minor clean up and improvement". - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn". - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups". - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring". - More folio conversions and cleanups in Matthew Wilcox's series: "Convert huge_zero_page to huge_zero_folio" "khugepaged folio conversions" "Remove page_idle and page_young wrappers" "Use folio APIs in procfs" "Clean up __folio_put()" "Some cleanups for memory-failure" "Remove page_mapping()" "More folio compat code removal" - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis". - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2". - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case". - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl". - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing". - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address". - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes". - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting". - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series "arch/mm/fault: accelerate pagefault when badaccess" "mm: remove arch's private VM_FAULT_BADMAP/BADACCESS" - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast". - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault". - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"". - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended. - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes". - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups". - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM". - Barry Song has added some sysfs stats for monitoring multi-size THP's in the series "mm: add per-order mTHP alloc and swpout counters". - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups". - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation". - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things. - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback". - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback". - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test. - SeongJae Park adds some DAMON feature work in the series "mm/damon: add a DAMOS filter type for page granularity access recheck" "selftests/damon: add DAMOS quota goal test" - Also some maintenance work in the series "mm/damon/paddr: simplify page level access re-check for pageout" "mm/damon: misc fixes and improvements" - David Hildenbrand has disabled some known-to-fail selftests ni the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL". - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats". - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking"" * tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits) memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault selftests: cgroup: add tests to verify the zswap writeback path mm: memcg: make alloc_mem_cgroup_per_node_info() return bool mm/damon/core: fix return value from damos_wmark_metric_value mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED selftests: cgroup: remove redundant enabling of memory controller Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT Docs/mm/damon/design: use a list for supported filters Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file selftests/damon: classify tests for functionalities and regressions selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None' selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts selftests/damon/_damon_sysfs: check errors from nr_schemes file reads mm/damon/core: initialize ->esz_bp from damos_quota_init_priv() selftests/damon: add a test for DAMOS quota goal ...
2024-05-18Merge tag 'kbuild-v6.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Avoid 'constexpr', which is a keyword in C23 - Allow 'dtbs_check' and 'dt_compatible_check' run independently of 'dt_binding_check' - Fix weak references to avoid GOT entries in position-independent code generation - Convert the last use of 'optional' property in arch/sh/Kconfig - Remove support for the 'optional' property in Kconfig - Remove support for Clang's ThinLTO caching, which does not work with the .incbin directive - Change the semantics of $(src) so it always points to the source directory, which fixes Makefile inconsistencies between upstream and downstream - Fix 'make tar-pkg' for RISC-V to produce a consistent package - Provide reasonable default coverage for objtool, sanitizers, and profilers - Remove redundant OBJECT_FILES_NON_STANDARD, KASAN_SANITIZE, etc. - Remove the last use of tristate choice in drivers/rapidio/Kconfig - Various cleanups and fixes in Kconfig * tag 'kbuild-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (46 commits) kconfig: use sym_get_choice_menu() in sym_check_prop() rapidio: remove choice for enumeration kconfig: lxdialog: remove initialization with A_NORMAL kconfig: m/nconf: merge two item_add_str() calls kconfig: m/nconf: remove dead code to display value of bool choice kconfig: m/nconf: remove dead code to display children of choice members kconfig: gconf: show checkbox for choice correctly kbuild: use GCOV_PROFILE and KCSAN_SANITIZE in scripts/Makefile.modfinal Makefile: remove redundant tool coverage variables kbuild: provide reasonable defaults for tool coverage modules: Drop the .export_symbol section from the final modules kconfig: use menu_list_for_each_sym() in sym_check_choice_deps() kconfig: use sym_get_choice_menu() in conf_write_defconfig() kconfig: add sym_get_choice_menu() helper kconfig: turn defaults and additional prompt for choice members into error kconfig: turn missing prompt for choice members into error kconfig: turn conf_choice() into void function kconfig: use linked list in sym_set_changed() kconfig: gconf: use MENU_CHANGED instead of SYMBOL_CHANGED kconfig: gconf: remove debug code ...
2024-05-10kbuild: use $(src) instead of $(srctree)/$(src) for source directoryMasahiro Yamada
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-05-03arch: Rename fbdev header and source filesThomas Zimmermann
The per-architecture fbdev code has no dependencies on fbdev and can be used for any video-related subsystem. Rename the files to 'video'. Use video-sti.c on parisc as the source file depends on CONFIG_STI_CORE. On arc, arm, arm64, sh, and um the asm header file is an empty wrapper around the file in asm-generic. Let Kbuild generate the file. The build system does this automatically. Only um needs to generate video.h explicitly, so that it overrides the host architecture's header. The latter would otherwise interfere with the build. Further update all includes statements, include guards, and Makefiles. Also update a few strings and comments to refer to video instead of fbdev. v3: - arc, arm, arm64, sh: generate asm header via build system (Sam, Helge, Arnd) - um: rename fb.h to video.h - fix typo in commit message (Sam) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-04-30um: virtio_uml: Convert to platform remove callback returning voidUwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-04-30um: rtc: Convert to platform remove callback returning voidUwe Kleine-König
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new(), which already returns void. Eventually after all drivers are converted, .remove_new() will be renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-04-30um: Add an internal header shared among the user codeTiwei Bie
Move relevant declarations to this header. This will address below -Wmissing-prototypes warnings: arch/um/os-Linux/elf_aux.c:26:13: warning: no previous prototype for ‘scan_elf_aux’ [-Wmissing-prototypes] arch/um/os-Linux/mem.c:213:13: warning: no previous prototype for ‘check_tmpexec’ [-Wmissing-prototypes] arch/um/os-Linux/skas/process.c:107:6: warning: no previous prototype for ‘wait_stub_done’ [-Wmissing-prototypes] Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>