path: root/arch/microblaze
AgeCommit message (Collapse)Author
2021-05-17quota: Disable quotactl_path syscallJan Kara
In commit fa8b90070a80 ("quota: wire up quotactl_path") we have wired up new quotactl_path syscall. However some people in LWN discussion have objected that the path based syscall is missing dirfd and flags argument which is mostly standard for contemporary path based syscalls. Indeed they have a point and after a discussion with Christian Brauner and Sascha Hauer I've decided to disable the syscall for now and update its API. Since there is no userspace currently using that syscall and it hasn't been released in any major release, we should be fine. CC: Christian Brauner <> CC: Sascha Hauer <> Link: Signed-off-by: Jan Kara <>
2021-05-03Merge tag 'trace-v5.13' of ↵Linus Torvalds
git:// Pull tracing updates from Steven Rostedt: "New feature: - A new "func-no-repeats" option in tracefs/options directory. When set the function tracer will detect if the current function being traced is the same as the previous one, and instead of recording it, it will keep track of the number of times that the function is repeated in a row. And when another function is recorded, it will write a new event that shows the function that repeated, the number of times it repeated and the time stamp of when the last repeated function occurred. Enhancements: - In order to implement the above "func-no-repeats" option, the ring buffer timestamp can now give the accurate timestamp of the event as it is being recorded, instead of having to record an absolute timestamp for all events. This helps the histogram code which no longer needs to waste ring buffer space. - New validation logic to make sure all trace events that access dereferenced pointers do so in a safe way, and will warn otherwise. Fixes: - No longer limit the PIDs of tasks that are recorded for "saved_cmdlines" to PID_MAX_DEFAULT (32768), as systemd now allows for a much larger range. This caused the mapping of PIDs to the task names to be dropped for all tasks with a PID greater than 32768. - Change trace_clock_global() to never block. This caused a deadlock. Clean ups: - Typos, prototype fixes, and removing of duplicate or unused code. - Better management of ftrace_page allocations" * tag 'trace-v5.13' of git:// (32 commits) tracing: Restructure trace_clock_global() to never block tracing: Map all PIDs to command lines ftrace: Reuse the output of the function tracer for func_repeats tracing: Add "func_no_repeats" option for function tracing tracing: Unify the logic for function tracing options tracing: Add method for recording "func_repeats" events tracing: Add "last_func_repeats" to struct trace_array tracing: Define new ftrace event "func_repeats" tracing: Define static void trace_print_time() ftrace: Simplify the calculation of page number for ftrace_page->records some more ftrace: Store the order of pages allocated in ftrace_page tracing: Remove unused argument from "ring_buffer_time_stamp() tracing: Remove duplicate struct declaration in trace_events.h tracing: Update create_system_filter() kernel-doc comment tracing: A minor cleanup for create_system_filter() kernel: trace: Mundane typo fixes in the file trace_events_filter.c tracing: Fix various typos in comments scripts/ Make vim and emacs indent the same scripts/ Make indent spacing consistent tracing: Add a verifier to check string pointers for trace events ...
2021-05-01Merge tag 'landlock_v34' of ↵Linus Torvalds
git:// Pull Landlock LSM from James Morris: "Add Landlock, a new LSM from Mickaël Salaün. Briefly, Landlock provides for unprivileged application sandboxing. From Mickaël's cover letter: "The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]" The cover letter and v34 posting is here: See also: This code has had extensive design discussion and review over several years" Link: [1] Link: [2] * tag 'landlock_v34' of git:// landlock: Enable user space to infer supported features landlock: Add user and kernel documentation samples/landlock: Add a sandbox manager example selftests/landlock: Add user space tests landlock: Add syscall implementations arch: Wire up Landlock syscalls fs,security: Add sb_delete hook landlock: Support filesystem access-control LSM: Infrastructure management of the superblock landlock: Add ptrace restrictions landlock: Set up the security framework and manage credentials landlock: Add ruleset and domain management landlock: Add object management
2021-04-30mm: move mem_init_print_info() into mm_init()Kefeng Wang
mem_init_print_info() is called in mem_init() on each architecture, and pass NULL argument, so using void argument and move it into mm_init(). Link: Signed-off-by: Kefeng Wang <> Acked-by: Dave Hansen <> [x86] Reviewed-by: Christophe Leroy <> [powerpc] Acked-by: David Hildenbrand <> Tested-by: Anatoly Pugachev <> [sparc64] Acked-by: Russell King <> [arm] Acked-by: Mike Rapoport <> Cc: Catalin Marinas <> Cc: Richard Henderson <> Cc: Guo Ren <> Cc: Yoshinori Sato <> Cc: Huacai Chen <> Cc: Jonas Bonn <> Cc: Palmer Dabbelt <> Cc: Heiko Carstens <> Cc: "David S. Miller" <> Cc: "Peter Zijlstra" <> Cc: Ingo Molnar <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2021-04-29Merge tag 'microblaze-v5.13' of git:// Torvalds
Pull Microblaze updates from Michal Simek: "No new features, just about cleaning up some code and moving to generic syscall solution used by other architectures: - Switch to generic syscall scripts - Some small fixes" * tag 'microblaze-v5.13' of git:// microblaze: add 'fallthrough' to memcpy/memset/memmove microblaze: Fix a typo microblaze: tag highmem_setup() with __meminit microblaze: syscalls: switch to generic microblaze: syscalls: switch to generic
2021-04-29Merge tag 'for_v5.13-rc1' of ↵Linus Torvalds
git:// Pull quota, ext2, reiserfs updates from Jan Kara: - support for path (instead of device) based quotactl syscall (quotactl_path(2)) - ext2 conversion to kmap_local() - other minor cleanups & fixes * tag 'for_v5.13-rc1' of git:// fs/reiserfs/journal.c: delete useless variables fs/ext2: Replace kmap() with kmap_local_page() ext2: Match up ext2_put_page() with ext2_dotdot() and ext2_find_entry() fs/ext2/: fix misspellings using codespell tool quota: report warning limits for realtime space quotas quota: wire up quotactl_path quota: Add mountpath based quota support
2021-04-22arch: Wire up Landlock syscallsMickaël Salaün
Wire up the following system calls for all architectures: * landlock_create_ruleset(2) * landlock_add_rule(2) * landlock_restrict_self(2) Cc: Arnd Bergmann <> Cc: James Morris <> Cc: Jann Horn <> Cc: Kees Cook <> Cc: Serge E. Hallyn <> Signed-off-by: Mickaël Salaün <> Link: Signed-off-by: James Morris <>
2021-04-22microblaze: add 'fallthrough' to memcpy/memset/memmoveRandy Dunlap
Fix "fallthrough" warnings in microblaze memcpy/memset/memmove library functions. CC arch/microblaze/lib/memcpy.o ../arch/microblaze/lib/memcpy.c: In function 'memcpy': ../arch/microblaze/lib/memcpy.c:70:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 70 | --c; ../arch/microblaze/lib/memcpy.c:71:3: note: here 71 | case 2: ../arch/microblaze/lib/memcpy.c:73:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 73 | --c; ../arch/microblaze/lib/memcpy.c:74:3: note: here 74 | case 3: ../arch/microblaze/lib/memcpy.c:178:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 178 | *dst++ = *src++; ../arch/microblaze/lib/memcpy.c:179:2: note: here 179 | case 2: ../arch/microblaze/lib/memcpy.c:180:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 180 | *dst++ = *src++; ../arch/microblaze/lib/memcpy.c:181:2: note: here 181 | case 1: CC arch/microblaze/lib/memset.o ../arch/microblaze/lib/memset.c: In function 'memset': ../arch/microblaze/lib/memset.c:71:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 71 | --n; ../arch/microblaze/lib/memset.c:72:3: note: here 72 | case 2: ../arch/microblaze/lib/memset.c:74:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 74 | --n; ../arch/microblaze/lib/memset.c:75:3: note: here 75 | case 3: CC arch/microblaze/lib/memmove.o ../arch/microblaze/lib/memmove.c: In function 'memmove': ../arch/microblaze/lib/memmove.c:92:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 92 | --c; ../arch/microblaze/lib/memmove.c:93:3: note: here 93 | case 2: ../arch/microblaze/lib/memmove.c:95:4: warning: this statement may fall through [-Wimplicit-fallthrough=] 95 | --c; ../arch/microblaze/lib/memmove.c:96:3: note: here 96 | case 1: ../arch/microblaze/lib/memmove.c:203:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 203 | *--dst = *--src; ../arch/microblaze/lib/memmove.c:204:2: note: here 204 | case 3: ../arch/microblaze/lib/memmove.c:205:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 205 | *--dst = *--src; ../arch/microblaze/lib/memmove.c:206:2: note: here 206 | case 2: ../arch/microblaze/lib/memmove.c:207:10: warning: this statement may fall through [-Wimplicit-fallthrough=] 207 | *--dst = *--src; ../arch/microblaze/lib/memmove.c:208:2: note: here 208 | case 1: Signed-off-by: Randy Dunlap <> Cc: Michal Simek <> Link: Signed-off-by: Michal Simek <>
2021-03-23tracing: Fix various typos in commentsIngo Molnar
Fix ~59 single-word typos in the tracing code comments, and fix the grammar in a handful of places. Link: Link: Reviewed-by: Randy Dunlap <> Signed-off-by: Ingo Molnar <> Signed-off-by: Steven Rostedt (VMware) <>
2021-03-23xsysace: Remove SYSACE driverMichal Simek
Sysace IP is no longer used on Xilinx PowerPC 405/440 and Microblaze systems. The driver is not regularly tested and very likely not working for quite a long time that's why remove it. Signed-off-by: Michal Simek <> Signed-off-by: Jens Axboe <>
2021-03-22microblaze: Fix a typoBhaskar Chowdhury
s/storign/storing/ Signed-off-by: Bhaskar Chowdhury <> Link: Acked-by: Randy Dunlap <> Signed-off-by: Michal Simek <>
2021-03-17quota: wire up quotactl_pathSascha Hauer
Wire up the quotactl_path syscall added in the previous patch. Link: Signed-off-by: Sascha Hauer <> Reviewed-by: Christoph Hellwig <> Signed-off-by: Jan Kara <>
2021-03-02microblaze: tag highmem_setup() with __meminitDavid Hildenbrand
With commit a0cd7a7c4bc0 ("mm: simplify free_highmem_page() and free_reserved_page()") the kernel test robot complains about a warning: WARNING: modpost: vmlinux.o(.text.unlikely+0x23ac): Section mismatch in reference from the function highmem_setup() to the function .meminit.text:memblock_is_reserved() This has been broken ever since microblaze added highmem support, because memblock_is_reserved() was already tagged with "__init" back then - most probably the function always got inlined, so we never stumbled over it. We need __meminit because __init_memblock defaults to that without CONFIG_ARCH_KEEP_MEMBLOCK" and __init_memblock is not used outside memblock code. Reported-by: kernel test robot <> Fixes: 2f2f371f8907 ("microblaze: Highmem support") Cc: Andrew Morton <> Cc: Michal Simek <> Cc: Mike Rapoport <> Cc: Andrew Morton <> Cc: Thomas Gleixner <> Cc: Arvind Sankar <> Cc: Ira Weiny <> Cc: Randy Dunlap <> Cc: Oscar Salvador <> Cc: Anshuman Khandual <> Signed-off-by: David Hildenbrand <> Reviewed-by: Oscar Salvador <> Link: Signed-off-by: Michal Simek <>
2021-03-02microblaze: syscalls: switch to generic syscallhdr.shMasahiro Yamada
Many architectures duplicate similar shell scripts. This commit converts microblaze to use scripts/ Signed-off-by: Masahiro Yamada <> Link: Signed-off-by: Michal Simek <>
2021-03-02microblaze: syscalls: switch to generic syscalltbl.shMasahiro Yamada
Many architectures duplicate similar shell scripts. This commit converts microblaze to use scripts/ Signed-off-by: Masahiro Yamada <> Link: Signed-off-by: Michal Simek <>
2021-02-27Merge tag 'io_uring-worker.v3-2021-02-25' of git:// Torvalds
Pull io_uring thread rewrite from Jens Axboe: "This converts the io-wq workers to be forked off the tasks in question instead of being kernel threads that assume various bits of the original task identity. This kills > 400 lines of code from io_uring/io-wq, and it's the worst part of the code. We've had several bugs in this area, and the worry is always that we could be missing some pieces for file types doing unusual things (recent /dev/tty example comes to mind, userfaultfd reads installing file descriptors is another fun one... - both of which need special handling, and I bet it's not the last weird oddity we'll find). With these identical workers, we can have full confidence that we're never missing anything. That, in itself, is a huge win. Outside of that, it's also more efficient since we're not wasting space and code on tracking state, or switching between different states. I'm sure we're going to find little things to patch up after this series, but testing has been pretty thorough, from the usual regression suite to production. Any issue that may crop up should be manageable. There's also a nice series of further reductions we can do on top of this, but I wanted to get the meat of it out sooner rather than later. The general worry here isn't that it's fundamentally broken. Most of the little issues we've found over the last week have been related to just changes in how thread startup/exit is done, since that's the main difference between using kthreads and these kinds of threads. In fact, if all goes according to plan, I want to get this into the 5.10 and 5.11 stable branches as well. That said, the changes outside of io_uring/io-wq are: - arch setup, simple one-liner to each arch copy_thread() implementation. - Removal of net and proc restrictions for io_uring, they are no longer needed or useful" * tag 'io_uring-worker.v3-2021-02-25' of git:// (30 commits) io-wq: remove now unused IO_WQ_BIT_ERROR io_uring: fix SQPOLL thread handling over exec io-wq: improve manager/worker handling over exec io_uring: ensure SQPOLL startup is triggered before error shutdown io-wq: make buffered file write hashed work map per-ctx io-wq: fix race around io_worker grabbing io-wq: fix races around manager/worker creation and task exit io_uring: ensure io-wq context is always destroyed for tasks arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() io_uring: cleanup ->user usage io-wq: remove nr_process accounting io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS net: remove cmsg restriction from io_uring based send/recvmsg calls Revert "proc: don't allow async path resolution of /proc/self components" Revert "proc: don't allow async path resolution of /proc/thread-self components" io_uring: move SQPOLL thread io-wq forked worker io-wq: make io_wq_fork_thread() available to other users io-wq: only remove worker from free_list, if it was there io_uring: remove io_identity io_uring: remove any grabbing of context ...
2021-02-25Merge tag 'kbuild-v5.12' of ↵Linus Torvalds
git:// Pull Kbuild updates from Masahiro Yamada: - Fix false-positive build warnings for ARCH=ia64 builds - Optimize dictionary size for module compression with xz - Check the compiler and linker versions in Kconfig - Fix misuse of extra-y - Support DWARF v5 debug info - Clamp SUBLEVEL to 255 because stable releases 4.4.x and 4.9.x exceeded the limit - Add generic syscall{tbl,hdr}.sh for cleanups across arches - Minor cleanups of genksyms - Minor cleanups of Kconfig * tag 'kbuild-v5.12' of git:// (38 commits) initramfs: Remove redundant dependency of RD_ZSTD on BLK_DEV_INITRD kbuild: remove deprecated 'always' and 'hostprogs-y/m' kbuild: parse C= and M= before changing the working directory kbuild: reuse this-makefile to define abs_srctree kconfig: unify rule of config, menuconfig, nconfig, gconfig, xconfig kconfig: omit --oldaskconfig option for 'make config' kconfig: fix 'invalid option' for help option kconfig: remove dead code in conf_askvalue() kconfig: clean up nested if-conditionals in check_conf() kconfig: Remove duplicate call to sym_get_string_value() Makefile: Remove # characters from compiler string Makefile: reuse CC_VERSION_TEXT kbuild: check the minimum linker version in Kconfig kbuild: remove ld-version macro scripts: add generic scripts: add generic arch: syscalls: remove $(srctree)/ prefix from syscall tables arch: syscalls: add missing FORCE and fix 'targets' to make if_changed work gen_compile_commands: prune some directories kbuild: simplify access to the kernel's version ...
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git:// Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: This comes with an extensive xfstests suite covering both ext4 and xfs: It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git:// (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-23Merge tag 'microblaze-v5.12' of git:// Torvalds
Pull microblaze updates from Michal Simek: - Fix DTB alignment - Remove code for very old GCC versions - Remove TRACING_SUPPORT selection * tag 'microblaze-v5.12' of git:// microblaze: Fix built-in DTB alignment to be 8-byte aligned microblaze: Remove support for gcc < 4 microblaze: do not select TRACING_SUPPORT directly
2021-02-21arch: setup PF_IO_WORKER threads like PF_KTHREADJens Axboe
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the sense that we don't assign ->set_child_tid with our own structure. Just ensure that every arch sets up the PF_IO_WORKER threads like kthreads in the arch implementation of copy_thread(). Signed-off-by: Jens Axboe <>
2021-02-22arch: syscalls: remove $(srctree)/ prefix from syscall tablesMasahiro Yamada
The 'syscall' variables are not directly used in the commands. Remove the $(srctree)/ prefix because we can rely on VPATH. Signed-off-by: Masahiro Yamada <>
2021-02-22arch: syscalls: add missing FORCE and fix 'targets' to make if_changed workMasahiro Yamada
The rules in these Makefiles cannot detect the command line change because the prerequisite 'FORCE' is missing. Adding 'FORCE' will result in the headers being rebuilt every time because the 'targets' additions are also wrong; the file paths in 'targets' must be relative to the current Makefile. Fix all of them so the if_changed rules work correctly. Signed-off-by: Masahiro Yamada <> Acked-by: Geert Uytterhoeven <>
2021-02-16microblaze: Fix built-in DTB alignment to be 8-byte alignedRob Herring
Commit 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") broke booting on Microblaze systems depending on the build. The problem is libfdt gained an 8-byte starting alignment check, but the Microblaze built-in DTB area is only 4-byte aligned. This affected not just built-in DTBs as bootloader passed DTBs are copied into the built-in DTB region. Other arches using built-in DTBs use a common linker macro which has sufficient alignment. Fixes: 79edff12060f ("scripts/dtc: Update to upstream version v1.6.0-51-g183df9e9c2b9") Reported-by: Guenter Roeck <> Tested-by: Guenter Roeck <> Cc: Michal Simek <> Signed-off-by: Rob Herring <> Link: Signed-off-by: Michal Simek <>
2021-02-11microblaze: Remove support for gcc < 4Geert Uytterhoeven
Since commit cafa0010cd51fb71 ("Raise the minimum required gcc version to 4.6") , the kernel can no longer be compiled using gcc-3. Hence drop support code for gcc-3. Signed-off-by: Geert Uytterhoeven <> Link: Signed-off-by: Michal Simek <>
2021-01-24fs: add mount_setattr()Christian Brauner
This implements the missing mount_setattr() syscall. While the new mount api allows to change the properties of a superblock there is currently no way to change the properties of a mount or a mount tree using file descriptors which the new mount api is based on. In addition the old mount api has the restriction that mount options cannot be applied recursively. This hasn't changed since changing mount options on a per-mount basis was implemented in [1] and has been a frequent request not just for convenience but also for security reasons. The legacy mount syscall is unable to accommodate this behavior without introducing a whole new set of flags because MS_REC | MS_REMOUNT | MS_BIND | MS_RDONLY | MS_NOEXEC | [...] only apply the mount option to the topmost mount. Changing MS_REC to apply to the whole mount tree would mean introducing a significant uapi change and would likely cause significant regressions. The new mount_setattr() syscall allows to recursively clear and set mount options in one shot. Multiple calls to change mount options requesting the same changes are idempotent: int mount_setattr(int dfd, const char *path, unsigned flags, struct mount_attr *uattr, size_t usize); Flags to modify path resolution behavior are specified in the @flags argument. Currently, AT_EMPTY_PATH, AT_RECURSIVE, AT_SYMLINK_NOFOLLOW, and AT_NO_AUTOMOUNT are supported. If useful, additional lookup flags to restrict path resolution as introduced with openat2() might be supported in the future. The mount_setattr() syscall can be expected to grow over time and is designed with extensibility in mind. It follows the extensible syscall pattern we have used with other syscalls such as openat2(), clone3(), sched_{set,get}attr(), and others. The set of mount options is passed in the uapi struct mount_attr which currently has the following layout: struct mount_attr { __u64 attr_set; __u64 attr_clr; __u64 propagation; __u64 userns_fd; }; The @attr_set and @attr_clr members are used to clear and set mount options. This way a user can e.g. request that a set of flags is to be raised such as turning mounts readonly by raising MOUNT_ATTR_RDONLY in @attr_set while at the same time requesting that another set of flags is to be lowered such as removing noexec from a mount tree by specifying MOUNT_ATTR_NOEXEC in @attr_clr. Note, since the MOUNT_ATTR_<atime> values are an enum starting from 0, not a bitmap, users wanting to transition to a different atime setting cannot simply specify the atime setting in @attr_set, but must also specify MOUNT_ATTR__ATIME in the @attr_clr field. So we ensure that MOUNT_ATTR__ATIME can't be partially set in @attr_clr and that @attr_set can't have any atime bits set if MOUNT_ATTR__ATIME isn't set in @attr_clr. The @propagation field lets callers specify the propagation type of a mount tree. Propagation is a single property that has four different settings and as such is not really a flag argument but an enum. Specifically, it would be unclear what setting and clearing propagation settings in combination would amount to. The legacy mount() syscall thus forbids the combination of multiple propagation settings too. The goal is to keep the semantics of mount propagation somewhat simple as they are overly complex as it is. The @userns_fd field lets user specify a user namespace whose idmapping becomes the idmapping of the mount. This is implemented and explained in detail in the next patch. [1]: commit 2e4b7fcd9260 ("[PATCH] r/o bind mounts: honor mount writer counts at remount") Link: Cc: David Howells <> Cc: Aleksa Sarai <> Cc: Al Viro <> Cc: Cc: Reviewed-by: Christoph Hellwig <> Signed-off-by: Christian Brauner <>
2021-01-22arch: microblaze: Remove CONFIG_OPROFILE supportViresh Kumar
The "oprofile" user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. Remove the old oprofile's architecture specific support. Suggested-by: Christoph Hellwig <> Suggested-by: Linus Torvalds <> Signed-off-by: Viresh Kumar <> Acked-by: Robert Richter <> Acked-by: Michal Simek <> Acked-by: William Cohen <> Acked-by: Al Viro <> Acked-by: Thomas Gleixner <>
2021-01-04microblaze: do not select TRACING_SUPPORT directlyMasahiro Yamada
Microblaze is the only architecture that selects TRACING_SUPPORT. In my understanding, it is computed by kernel/trace/Kconfig: config TRACING_SUPPORT bool depends on TRACE_IRQFLAGS_SUPPORT depends on STACKTRACE_SUPPORT default y Microblaze enables both TRACE_IRQFLAGS_SUPPORT and STACKTRACE_SUPPORT, so there is no change in the resulted configuration. Signed-off-by: Masahiro Yamada <> Link: Signed-off-by: Michal Simek <>
2020-12-29local64.h: make <asm/local64.h> mandatoryRandy Dunlap
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and remove all arch/*/include/asm/local64.h arch-specific files since they only #include <asm-generic/local64.h>. This fixes build errors on arch/c6x/ and arch/nios2/ for block/blk-iocost.c. Build-tested on 21 of 25 arch-es. (tools problems on the others) Yes, we could even rename <asm-generic/local64.h> to <linux/local64.h> and change all #includes to use <linux/local64.h> instead. Link: Signed-off-by: Randy Dunlap <> Suggested-by: Christoph Hellwig <> Reviewed-by: Masahiro Yamada <> Cc: Jens Axboe <> Cc: Ley Foon Tan <> Cc: Mark Salter <> Cc: Aurelien Jacquiot <> Cc: Peter Zijlstra <> Cc: Arnd Bergmann <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2020-12-19epoll: wire up syscall epoll_pwait2Willem de Bruijn
Split off from prev patch in the series that implements the syscall. Link: Signed-off-by: Willem de Bruijn <> Cc: Al Viro <> Cc: Arnd Bergmann <> Cc: Matthew Wilcox (Oracle) <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2020-12-16Merge tag 'microblaze-v5.11' of git:// Torvalds
Pull microblaze updates from Michal Simek: "The biggest change is to remove support for noMMU configuration. FPGAs are bigger so people use Microblaze with MMU for a lot of years and there is likely no user of this code anymore. No one is updating libraries for this configuration either. - Remove noMMU support - Add support for TIF_NOTIFY_SIGNAL - Small header fix" * tag 'microblaze-v5.11' of git:// microblaze: Remove noMMU code microblaze: add support for TIF_NOTIFY_SIGNAL microblaze: Replace <linux/clk-provider.h> by <linux/of_clk.h>
2020-12-16Merge tag 'asm-generic-timers-5.11' of ↵Linus Torvalds
git:// Pull asm-generic cross-architecture timer cleanup from Arnd Bergmann: "This cleans up two ancient timer features that were never completed in the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET. There was only one user left for the ARCH_USES_GETTIMEOFFSET variant of clocksource implementations, the ARM EBSA110 platform. Rather than changing to use modern timekeeping, we remove the platform entirely as Russell no longer uses his machine and nobody else seems to have one any more. The conditional code for using arch_gettimeoffset() is removed as a result. For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms not using clockevent drivers: parisc, ia64, most of m68k, and one Arm platform. These all do timer ticks slighly differently, and this gets cleaned up to the point they at least all call the same helper function. Instead of most platforms using 'select GENERIC_CLOCKEVENTS' in Kconfig, the polarity is now reversed, with the few remaining ones selecting LEGACY_TIMER_TICK instead" * tag 'asm-generic-timers-5.11' of git:// timekeeping: default GENERIC_CLOCKEVENTS to enabled timekeeping: remove xtime_update m68k: remove timer_interrupt() function m68k: change remaining timers to legacy_timer_tick m68k: m68328: use legacy_timer_tick() m68k: sun3/sun3c: use legacy_timer_tick m68k: split heartbeat out of timer function m68k: coldfire: use legacy_timer_tick() parisc: use legacy_timer_tick ARM: rpc: use legacy_timer_tick ia64: convert to legacy_timer_tick timekeeping: add CONFIG_LEGACY_TIMER_TICK timekeeping: remove arch_gettimeoffset net: remove am79c961a driver ARM: remove ebsa110 platform
2020-12-15Merge tag 'asm-generic-mmu-context-5.11' of ↵Linus Torvalds
git:// Pull asm-generic mmu-context cleanup from Arnd Bergmann: "This is a cleanup series from Nicholas Piggin, preparing for later changes. The asm/mmu_context.h header are generalized and common code moved to asm-gneneric/mmu_context.h. This saves a bit of code and makes it easier to change in the future" * tag 'asm-generic-mmu-context-5.11' of git:// (25 commits) h8300: Fix generic mmu_context build m68k: mmu_context: Fix Sun-3 build xtensa: use asm-generic/mmu_context.h for no-op implementations x86: use asm-generic/mmu_context.h for no-op implementations um: use asm-generic/mmu_context.h for no-op implementations sparc: use asm-generic/mmu_context.h for no-op implementations sh: use asm-generic/mmu_context.h for no-op implementations s390: use asm-generic/mmu_context.h for no-op implementations riscv: use asm-generic/mmu_context.h for no-op implementations powerpc: use asm-generic/mmu_context.h for no-op implementations parisc: use asm-generic/mmu_context.h for no-op implementations openrisc: use asm-generic/mmu_context.h for no-op implementations nios2: use asm-generic/mmu_context.h for no-op implementations nds32: use asm-generic/mmu_context.h for no-op implementations mips: use asm-generic/mmu_context.h for no-op implementations microblaze: use asm-generic/mmu_context.h for no-op implementations m68k: use asm-generic/mmu_context.h for no-op implementations ia64: use asm-generic/mmu_context.h for no-op implementations hexagon: use asm-generic/mmu_context.h for no-op implementations csky: use asm-generic/mmu_context.h for no-op implementations ...
2020-12-14Merge tag 'core-mm-2020-12-14' of ↵Linus Torvalds
git:// Pull kmap updates from Thomas Gleixner: "The new preemtible kmap_local() implementation: - Consolidate all kmap_atomic() internals into a generic implementation which builds the base for the kmap_local() API and make the kmap_atomic() interface wrappers which handle the disabling/enabling of preemption and pagefaults. - Switch the storage from per-CPU to per task and provide scheduler support for clearing mapping when scheduling out and restoring them when scheduling back in. - Merge the migrate_disable/enable() code, which is also part of the scheduler pull request. This was required to make the kmap_local() interface available which does not disable preemption when a mapping is established. It has to disable migration instead to guarantee that the virtual address of the mapped slot is the same across preemption. - Provide better debug facilities: guard pages and enforced utilization of the mapping mechanics on 64bit systems when the architecture allows it. - Provide the new kmap_local() API which can now be used to cleanup the kmap_atomic() usage sites all over the place. Most of the usage sites do not require the implicit disabling of preemption and pagefaults so the penalty on 64bit and 32bit non-highmem systems is removed and quite some of the code can be simplified. A wholesale conversion is not possible because some usage depends on the implicit side effects and some need to be cleaned up because they work around these side effects. The migrate disable side effect is only effective on highmem systems and when enforced debugging is enabled. On 64bit and 32bit non-highmem systems the overhead is completely avoided" * tag 'core-mm-2020-12-14' of git:// (33 commits) ARM: highmem: Fix cache_is_vivt() reference x86/crashdump/32: Simplify copy_oldmem_page() io-mapping: Provide iomap_local variant mm/highmem: Provide kmap_local* sched: highmem: Store local kmaps in task struct x86: Support kmap_local() forced debugging mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm/highmem: Provide and use CONFIG_DEBUG_KMAP_LOCAL microblaze/mm/highmem: Add dropped #ifdef back xtensa/mm/highmem: Make generic kmap_atomic() work correctly mm/highmem: Take kmap_high_get() properly into account highmem: High implementation details and document API Documentation/io-mapping: Remove outdated blurb io-mapping: Cleanup atomic iomap mm/highmem: Remove the old kmap_atomic cruft highmem: Get rid of kmap_types.h xtensa/mm/highmem: Switch to generic kmap atomic sparc/mm/highmem: Switch to generic kmap atomic powerpc/mm/highmem: Switch to generic kmap atomic nds32/mm/highmem: Switch to generic kmap atomic ...
2020-11-26microblaze: Remove noMMU codeMichal Simek
This configuration is obsolete and likely none is really using it. That's why remove it to simplify code. Note about CONFIG_MMU in hw_exception_handler.S is left intentionally for better comment understanding. Cc: Mike Rapoport <> Cc: Arnd Bergmann <> Signed-off-by: Michal Simek <> Acked-by: Mike Rapoport <> Acked-by: Arnd Bergmann <> Link:
2020-11-24sched/idle: Fix arch_cpu_idle() vs tracingPeter Zijlstra
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: Sven Schnelle <> Signed-off-by: Peter Zijlstra (Intel) <> Reviewed-by: Mark Rutland <> Tested-by: Mark Rutland <> Link:
2020-11-19microblaze/mm/highmem: Add dropped #ifdef backThomas Gleixner
The conversion to generic kmap atomic broke microblaze by removing the build fail. Add it back. Fixes: 7ac1b26b0a72 ("microblaze/mm/highmem: Switch to generic kmap atomic") Reported-by: Randy Dunlap <> Signed-off-by: Thomas Gleixner <> Cc: Michal Simek <>
2020-11-10microblaze: add support for TIF_NOTIFY_SIGNALJens Axboe
Wire up TIF_NOTIFY_SIGNAL handling for microblaze. Cc: Michal Simek <> Signed-off-by: Jens Axboe <> Link: Signed-off-by: Michal Simek <>
2020-11-10microblaze: Replace <linux/clk-provider.h> by <linux/of_clk.h>Geert Uytterhoeven
The MicroBlaze platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <> Reviewed-by: Stephen Boyd <> Link: Signed-off-by: Michal Simek <>
2020-11-06microblaze/mm/highmem: Switch to generic kmap atomicThomas Gleixner
No reason having the same code in every architecture. Signed-off-by: Thomas Gleixner <> Cc: Michal Simek <> Cc: Arnd Bergmann <> Link:
2020-10-30timekeeping: default GENERIC_CLOCKEVENTS to enabledArnd Bergmann
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to require each one to select that symbol manually. Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as a simplification. It should be possible to select both GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now and decide at runtime between the two. For the clockevents arch-support.txt file, this means that additional architectures are marked as TODO when they have at least one machine that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when at least one machine has been converted. This means that both m68k and arm (for riscpc) revert to TODO. At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS rather than leaving it off when not needed. I built an m68k defconfig kernel (using gcc-10.1.0) and found that this would add around 5.5KB in kernel image size: text data bss dec hex filename 3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent 3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent On Arm (MACH_RPC), that difference appears to be twice as large, around 11KB on top of an 6MB vmlinux. Reviewed-by: Geert Uytterhoeven <> Acked-by: Geert Uytterhoeven <> Tested-by: Geert Uytterhoeven <> Reviewed-by: Linus Walleij <> Signed-off-by: Arnd Bergmann <>
2020-10-27microblaze: use asm-generic/mmu_context.h for no-op implementationsNicholas Piggin
Wire asm-generic/mmu_context.h to provide generic empty hooks for arch code simplification. Signed-off-by: Nicholas Piggin <> Acked-by: Michal Simek <> Signed-off-by: Arnd Bergmann <>
2020-10-26asm-generic: add generic MMU versions of mmu context functionsNicholas Piggin
Many of these are no-ops on many architectures, so extend mmu_context.h to cover MMU and NOMMU, and split the NOMMU bits out to nommu_context.h Signed-off-by: Nicholas Piggin <> Acked-by: Mike Rapoport <> Acked-by: Arnd Bergmann <> Cc: Signed-off-by: Arnd Bergmann <>
2020-10-25treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: Signed-off-by: Joe Perches <> Reviewed-by: Nick Desaulniers <> Reviewed-by: Miguel Ojeda <> Signed-off-by: Linus Torvalds <>
2020-10-23Merge tag 'arch-cleanup-2020-10-22' of git:// Torvalds
Pull arch task_work cleanups from Jens Axboe: "Two cleanups that don't fit other categories: - Finally get the task_work_add() cleanup done properly, so we don't have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates all callers, and also fixes up the documentation for task_work_add(). - While working on some TIF related changes for 5.11, this TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch duplication for how that is handled" * tag 'arch-cleanup-2020-10-22' of git:// task_work: cleanup notification modes tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
2020-10-22Merge branch 'work.set_fs' of ↵Linus Torvalds
git:// Pull initial set_fs() removal from Al Viro: "Christoph's set_fs base series + fixups" * 'work.set_fs' of git:// fs: Allow a NULL pos pointer to __kernel_read fs: Allow a NULL pos pointer to __kernel_write powerpc: remove address space overrides using set_fs() powerpc: use non-set_fs based maccess routines x86: remove address space overrides using set_fs() x86: make TASK_SIZE_MAX usable from assembly code x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h lkdtm: remove set_fs-based tests test_bitmap: remove user bitmap tests uaccess: add infrastructure for kernel builds with set_fs() fs: don't allow splice read/write without explicit ops fs: don't allow kernel reads and writes without iter ops sysctl: Convert to iter interfaces proc: add a read_iter method to proc proc_ops proc: cleanup the compat vs no compat file ops proc: remove a level of indentation in proc_get_inode
2020-10-18mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim
There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1]" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - [ fix process_madvise build break for arm64] Link: [ fix build error for mips of process_madvise] Link: [ fix patch ordering issue] [ fix arm64 whoops] [ make process_madvise() vlen arg have type size_t, per Florian] [ fix i386 build] [ fix syscall numbering] Link: [ madvise.c needs compat.h] Link: [ fix mips build] Link: [ remove duplicate header which is included twice] Link: [ do not use helper functions for process_madvise] Link: [ pidfd_get_pid() gained an argument] [ fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: Signed-off-by: Minchan Kim <> Signed-off-by: YueHaibing <> Signed-off-by: Stephen Rothwell <> Signed-off-by: Andrew Morton <> Reviewed-by: Suren Baghdasaryan <> Reviewed-by: Vlastimil Babka <> Acked-by: David Rientjes <> Cc: Alexander Duyck <> Cc: Brian Geffon <> Cc: Christian Brauner <> Cc: Daniel Colascione <> Cc: Jann Horn <> Cc: Jens Axboe <> Cc: Joel Fernandes <> Cc: Johannes Weiner <> Cc: John Dias <> Cc: Kirill Tkhai <> Cc: Michal Hocko <> Cc: Oleksandr Natalenko <> Cc: Sandeep Patil <> Cc: SeongJae Park <> Cc: SeongJae Park <> Cc: Shakeel Butt <> Cc: Sonny Rao <> Cc: Tim Murray <> Cc: Christian Brauner <> Cc: Florian Weimer <> Cc: <> Link: Link: Link: Link: Signed-off-by: Linus Torvalds <>
2020-10-17tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()Jens Axboe
All the callers currently do this, clean it up and move the clearing into tracehook_notify_resume() instead. Reviewed-by: Oleg Nesterov <> Reviewed-by: Thomas Gleixner <> Signed-off-by: Jens Axboe <>
2020-10-15Merge tag 'dma-mapping-5.10' of git:// Torvalds
Pull dma-mapping updates from Christoph Hellwig: - rework the non-coherent DMA allocator - move private definitions out of <linux/dma-mapping.h> - lower CMA_ALIGNMENT (Paul Cercueil) - remove the omap1 dma address translation in favor of the common code - make dma-direct aware of multiple dma offset ranges (Jim Quinlan) - support per-node DMA CMA areas (Barry Song) - increase the default seg boundary limit (Nicolin Chen) - misc fixes (Robin Murphy, Thomas Tai, Xu Wang) - various cleanups * tag 'dma-mapping-5.10' of git:// (63 commits) ARM/ixp4xx: add a missing include of dma-map-ops.h dma-direct: simplify the DMA_ATTR_NO_KERNEL_MAPPING handling dma-direct: factor out a dma_direct_alloc_from_pool helper dma-direct check for highmem pages in dma_direct_alloc_pages dma-mapping: merge <linux/dma-noncoherent.h> into <linux/dma-map-ops.h> dma-mapping: move large parts of <linux/dma-direct.h> to kernel/dma dma-mapping: move dma-debug.h to kernel/dma/ dma-mapping: remove <asm/dma-contiguous.h> dma-mapping: merge <linux/dma-contiguous.h> into <linux/dma-map-ops.h> dma-contiguous: remove dma_contiguous_set_default dma-contiguous: remove dev_set_cma_area dma-contiguous: remove dma_declare_contiguous dma-mapping: split <linux/dma-mapping.h> cma: decrease CMA_ALIGNMENT lower limit to 2 firewire-ohci: use dma_alloc_pages dma-iommu: implement ->alloc_noncoherent dma-mapping: add new {alloc,free}_noncoherent dma_map_ops methods dma-mapping: add a new dma_alloc_pages API dma-mapping: remove dma_cache_sync 53c700: convert to dma_alloc_noncoherent ...
2020-10-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "181 patches. Subsystems affected by this patch series: kbuild, scripts, ntfs, ocfs2, vfs, mm (slab, slub, kmemleak, dax, debug, pagecache, fadvise, gup, swap, memremap, memcg, selftests, pagemap, mincore, hmm, dma, memory-failure, vmallo and migration)" * emailed patches from Andrew Morton <>: (181 commits) mm/migrate: remove obsolete comment about device public mm/migrate: remove cpages-- in migrate_vma_finalize() mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary memblock: use separate iterators for memory and reserved regions memblock: implement for_each_reserved_mem_region() using __next_mem_region() memblock: remove unused memblock_mem_size() x86/setup: simplify reserve_crashkernel() x86/setup: simplify initrd relocation and reservation arch, drivers: replace for_each_membock() with for_each_mem_range() arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() memblock: reduce number of parameters in for_each_mem_range() memblock: make memblock_debug and related functionality private memblock: make for_each_memblock_type() iterator private mircoblaze: drop unneeded NUMA and sparsemem initializations riscv: drop unneeded node initialization h8300, nds32, openrisc: simplify detection of memory extents arm64: numa: simplify dummy_numa_init() arm, xtensa: simplify initialization of high memory pages dma-contiguous: simplify cma_early_percent_memory() KVM: PPC: Book3S HV: simplify kvm_cma_reserve() ...
2020-10-13arch, drivers: replace for_each_membock() with for_each_mem_range()Mike Rapoport
There are several occurrences of the following pattern: for_each_memblock(memory, reg) { start = __pfn_to_phys(memblock_region_memory_base_pfn(reg); end = __pfn_to_phys(memblock_region_memory_end_pfn(reg)); /* do something with start and end */ } Using for_each_mem_range() iterator is more appropriate in such cases and allows simpler and cleaner code. [ fix arch/arm/mm/pmsa-v7.c build] [ mips: fix cavium-octeon build caused by memblock refactoring] Link: Signed-off-by: Mike Rapoport <> Signed-off-by: Andrew Morton <> Cc: Andy Lutomirski <> Cc: Baoquan He <> Cc: Benjamin Herrenschmidt <> Cc: Borislav Petkov <> Cc: Catalin Marinas <> Cc: Christoph Hellwig <> Cc: Daniel Axtens <> Cc: Dave Hansen <> Cc: Emil Renner Berthing <> Cc: Hari Bathini <> Cc: Ingo Molnar <> Cc: Ingo Molnar <> Cc: Jonathan Cameron <> Cc: Marek Szyprowski <> Cc: Max Filippov <> Cc: Michael Ellerman <> Cc: Michal Simek <> Cc: Miguel Ojeda <> Cc: Palmer Dabbelt <> Cc: Paul Mackerras <> Cc: Paul Walmsley <> Cc: Peter Zijlstra <> Cc: Russell King <> Cc: Stafford Horne <> Cc: Thomas Bogendoerfer <> Cc: Thomas Gleixner <> Cc: Will Deacon <> Cc: Yoshinori Sato <> Link: Signed-off-by: Linus Torvalds <>