git.armlinux.org.uk/linux.git - Linus' kernel tree

author	Christian Brauner <brauner@kernel.org>	2025-09-02 11:37:34 +0200
committer	Christian Brauner <brauner@kernel.org>	2025-09-02 11:40:35 +0200
commit	46582a15c1742ff0dd9d2faffd62672a79603a42 (patch)
tree	d4f36a7757be29fcc2537cb1b4b8d7c862aaeee0 /scripts/gdb/linux/mm.py
parent	998541db0ed257ab0682e4a392d8ced5f2d5ff6b (diff)
parent	5554d820f71c72fbe64e12c3d171908c5ef7257d (diff)

Merge patch series "procfs: make reference pidns more user-visible"

Aleksa Sarai <cyphar@cyphar.com> says:

Ever since the introduction of pid namespaces, procfs has had very
implicit behaviour surrounding them (the pidns used by a procfs mount is
auto-selected based on the mounting process's active pidns, and the
pidns itself is basically hidden once the mount has been constructed).

/* pidns mount option for procfs */

This implicit behaviour has historically meant that userspace was
required to do some special dances in order to configure the pidns of a
procfs mount as desired. Examples include:

 * In order to bypass the mnt_too_revealing() check, Kubernetes creates
   a procfs mount from an empty pidns so that user namespaced containers
   can be nested (without this, the nested containers would fail to
   mount procfs). But this requires forking off a helper process because
   you cannot just one-shot this using mount(2).

 * Container runtimes in general need to fork into a container before
   configuring its mounts, which can lead to security issues in the case
   of shared-pidns containers (a privileged process in the pidns can
   interact with your container runtime process). While
   SUID_DUMP_DISABLE and user namespaces make this less of an issue, the
   strict need for this due to a minor uAPI wart is kind of unfortunate.

Things would be much easier if there was a way for userspace to just
specify the pidns they want. Patch 1 implements a new "pidns" argument
which can be set using fsconfig(2):

    fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
    fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

or classic mount(2) / mount(8):

    // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
    mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

The initial security model I have in this RFC is to be as conservative
as possible and just mirror the security model for setns(2) -- which
means that you can only set pidns=... to pid namespaces that your
current pid namespace is a direct ancestor of and you have CAP_SYS_ADMIN
privileges over the pid namespace. This fulfils the requirements of
container runtimes, but I suspect that this may be too strict for some
usecases.

The pidns argument is not displayed in mountinfo -- it's not clear to me
what value it would make sense to show (maybe we could just use ns_dname
to provide an identifier for the namespace, but this number would be
fairly useless to userspace). I'm open to suggestions. Note that
PROCFS_GET_PID_NAMESPACE (see below) does at least let userspace get
information about this outside of mountinfo.

Note that you cannot change the pidns of an already-created procfs
instance. The primary reason is that allowing this to be changed would
require RCU-protecting proc_pid_ns(sb) and thus auditing all of
fs/proc/* and some of the users in fs/* to make sure they wouldn't UAF
the pid namespace. Since creating procfs instances is very cheap, it
seems unnecessary to overcomplicate this upfront. Trying to reconfigure
procfs this way errors out with -EBUSY.

* patches from https://lore.kernel.org/20250805-procfs-pidns-api-v4-0-705f984940e7@cyphar.com:
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper

Link: https://lore.kernel.org/20250805-procfs-pidns-api-v4-0-705f984940e7@cyphar.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
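
For illustration, the fsconfig(2) flow described in the cover letter could be driven roughly as follows. This is a minimal sketch, not code from the series: mount_proc_with_pidns() is a hypothetical helper name, the new mount API is invoked through raw syscall(2) calls, and it assumes a kernel that carries the "pidns" option plus CAP_SYS_ADMIN over the target pid namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <linux/mount.h>  /* FSOPEN_CLOEXEC, FSCONFIG_*, FSMOUNT_CLOEXEC, MOVE_MOUNT_* */
#include <sys/syscall.h>  /* SYS_fsopen, SYS_fsconfig, SYS_fsmount, SYS_move_mount */
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch series): mount a fresh procfs
 * instance at `target`, bound to the pid namespace referred to by `nsfd`.
 * Returns 0 on success or a negative errno value on failure.
 */
static int mount_proc_with_pidns(const char *target, int nsfd)
{
	int err = 0;
	int mntfd = -1;
	int procfd = syscall(SYS_fsopen, "proc", FSOPEN_CLOEXEC);

	if (procfd < 0)
		return -errno;

	/* The pidns has to be selected before the instance is created. */
	if (syscall(SYS_fsconfig, procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd) < 0 ||
	    syscall(SYS_fsconfig, procfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0) {
		err = -errno;
		goto out;
	}

	/* Turn the configured context into a detached mount object. */
	mntfd = syscall(SYS_fsmount, procfd, FSMOUNT_CLOEXEC, 0);
	if (mntfd < 0) {
		err = -errno;
		goto out;
	}

	/* Attach the detached mount at the requested path. */
	if (syscall(SYS_move_mount, mntfd, "", AT_FDCWD, target,
		    MOVE_MOUNT_F_EMPTY_PATH) < 0)
		err = -errno;
out:
	if (mntfd >= 0)
		close(mntfd);
	close(procfd);
	return err;
}
```

A caller would typically obtain nsfd with open("/proc/<pid>/ns/pid", O_RDONLY | O_CLOEXEC) and then call mount_proc_with_pidns("/tmp/proc", nsfd). On kernels without the option, or without sufficient privileges, the helper is expected to return a negative errno value.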

Diffstat (limited to 'scripts/gdb/linux/mm.py')

0 files changed, 0 insertions, 0 deletions

