path: root/security/smack
AgeCommit message (Collapse)Author
2021-05-01Merge tag 'landlock_v34' of ↵Linus Torvalds
git:// Pull Landlock LSM from James Morris: "Add Landlock, a new LSM from Mickaël Salaün. Briefly, Landlock provides for unprivileged application sandboxing. From Mickaël's cover letter: "The goal of Landlock is to enable to restrict ambient rights (e.g. global filesystem access) for a set of processes. Because Landlock is a stackable LSM [1], it makes possible to create safe security sandboxes as new security layers in addition to the existing system-wide access-controls. This kind of sandbox is expected to help mitigate the security impact of bugs or unexpected/malicious behaviors in user-space applications. Landlock empowers any process, including unprivileged ones, to securely restrict themselves. Landlock is inspired by seccomp-bpf but instead of filtering syscalls and their raw arguments, a Landlock rule can restrict the use of kernel objects like file hierarchies, according to the kernel semantic. Landlock also takes inspiration from other OS sandbox mechanisms: XNU Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil. In this current form, Landlock misses some access-control features. This enables to minimize this patch series and ease review. This series still addresses multiple use cases, especially with the combined use of seccomp-bpf: applications with built-in sandboxing, init systems, security sandbox tools and security-oriented APIs [2]" The cover letter and v34 posting is here: See also: This code has had extensive design discussion and review over several years" Link: [1] Link: [2] * tag 'landlock_v34' of git:// landlock: Enable user space to infer supported features landlock: Add user and kernel documentation samples/landlock: Add a sandbox manager example selftests/landlock: Add user space tests landlock: Add syscall implementations arch: Wire up Landlock syscalls fs,security: Add sb_delete hook landlock: Support filesystem access-control LSM: Infrastructure management of the superblock landlock: Add ptrace restrictions landlock: Set up the security framework and manage credentials landlock: Add ruleset and domain management landlock: Add object management
2021-04-22LSM: Infrastructure management of the superblockCasey Schaufler
Move management of the superblock->sb_security blob out of the individual security modules and into the security infrastructure. Instead of allocating the blobs from within the modules, the modules tell the infrastructure how much space is required, and the space is allocated there. Cc: John Johansen <> Signed-off-by: Casey Schaufler <> Signed-off-by: Mickaël Salaün <> Reviewed-by: Stephen Smalley <> Acked-by: Serge Hallyn <> Reviewed-by: Kees Cook <> Link: Signed-off-by: James Morris <>
2021-03-22smack: differentiate between subjective and objective task credentialsPaul Moore
With the split of the security_task_getsecid() into subjective and objective variants it's time to update Smack to ensure it is using the correct task creds. Acked-by: Casey Schaufler <> Reviewed-by: Richard Guy Briggs <> Reviewed-by: John Johansen <> Signed-off-by: Paul Moore <>
2021-03-22lsm: separate security_task_getsecid() into subjective and objective variantsPaul Moore
Of the three LSMs that implement the security_task_getsecid() LSM hook, all three LSMs provide the task's objective security credentials. This turns out to be unfortunate as most of the hook's callers seem to expect the task's subjective credentials, although a small handful of callers do correctly expect the objective credentials. This patch is the first step towards fixing the problem: it splits the existing security_task_getsecid() hook into two variants, one for the subjective creds, one for the objective creds. void security_task_getsecid_subj(struct task_struct *p, u32 *secid); void security_task_getsecid_obj(struct task_struct *p, u32 *secid); While this patch does fix all of the callers to use the correct variant, in order to keep this patch focused on the callers and to ease review, the LSMs continue to use the same implementation for both hooks. The net effect is that this patch should not change the behavior of the kernel in any way, it will be up to the latter LSM specific patches in this series to change the hook implementations and return the correct credentials. Acked-by: Mimi Zohar <> (IMA) Acked-by: Casey Schaufler <> Reviewed-by: Richard Guy Briggs <> Signed-off-by: Paul Moore <>
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git:// Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: This comes with an extensive xfstests suite covering both ext4 and xfs: It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git:// (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-02smackfs: restrict bytes count in smackfs write functionsSabyrzhan Tasbolatov
syzbot found WARNINGs in several smackfs write operations where bytes count is passed to memdup_user_nul which exceeds GFP MAX_ORDER. Check count size if bigger than PAGE_SIZE. Per smackfs doc, smk_write_net4addr accepts any label or -CIPSO, smk_write_net6addr accepts any label or -DELETE. I couldn't find any general rule for other label lengths except SMK_LABELLEN, SMK_LONGLABEL, SMK_CIPSOMAX which are documented. Let's constrain, in general, smackfs label lengths for PAGE_SIZE. Although fuzzer crashes write to smackfs/netlabel on 0x400000 length. Here is a quick way to reproduce the WARNING: python -c "print('A' * 0x400000)" > /sys/fs/smackfs/netlabel Reported-by: Signed-off-by: Sabyrzhan Tasbolatov <> Signed-off-by: Casey Schaufler <>
2021-01-24commoncap: handle idmapped mountsChristian Brauner
When interacting with user namespace and non-user namespace aware filesystem capabilities the vfs will perform various security checks to determine whether or not the filesystem capabilities can be used by the caller, whether they need to be removed and so on. The main infrastructure for this resides in the capability codepaths but they are called through the LSM security infrastructure even though they are not technically an LSM or optional. This extends the existing security hooks security_inode_removexattr(), security_inode_killpriv(), security_inode_getsecurity() to pass down the mount's user namespace and makes them aware of idmapped mounts. In order to actually get filesystem capabilities from disk the capability infrastructure exposes the get_vfs_caps_from_disk() helper. For user namespace aware filesystem capabilities a root uid is stored alongside the capabilities. In order to determine whether the caller can make use of the filesystem capability or whether it needs to be ignored it is translated according to the superblock's user namespace. If it can be translated to uid 0 according to that id mapping the caller can use the filesystem capabilities stored on disk. If we are accessing the inode that holds the filesystem capabilities through an idmapped mount we map the root uid according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts: reading filesystem caps from disk enforces that the root uid associated with the filesystem capability must have a mapping in the superblock's user namespace and that the caller is either in the same user namespace or is a descendant of the superblock's user namespace. For filesystems that are mountable inside user namespace the caller can just mount the filesystem and won't usually need to idmap it. If they do want to idmap it they can create an idmapped mount and mark it with a user namespace they created and which is thus a descendant of s_user_ns. For filesystems that are not mountable inside user namespaces the descendant rule is trivially true because the s_user_ns will be the initial user namespace. If the initial user namespace is passed nothing changes so non-idmapped mounts will see identical behavior as before. Link: Cc: Christoph Hellwig <> Cc: David Howells <> Cc: Al Viro <> Cc: Reviewed-by: Christoph Hellwig <> Acked-by: James Morris <> Signed-off-by: Christian Brauner <>
2021-01-24xattr: handle idmapped mountsTycho Andersen
When interacting with extended attributes the vfs verifies that the caller is privileged over the inode with which the extended attribute is associated. For posix access and posix default extended attributes a uid or gid can be stored on-disk. Let the functions handle posix extended attributes on idmapped mounts. If the inode is accessed through an idmapped mount we need to map it according to the mount's user namespace. Afterwards the checks are identical to non-idmapped mounts. This has no effect for e.g. security xattrs since they don't store uids or gids and don't perform permission checks on them like posix acls do. Link: Cc: Christoph Hellwig <> Cc: David Howells <> Cc: Al Viro <> Cc: Reviewed-by: Christoph Hellwig <> Reviewed-by: James Morris <> Signed-off-by: Tycho Andersen <> Signed-off-by: Christian Brauner <>
2020-12-24Merge tag 'Smack-for-5.11-io_uring-fix' of ↵Linus Torvalds
git:// Pull smack fix from Casey Schaufler: "Provide a fix for the incorrect handling of privilege in the face of io_uring's use of kernel threads. That invalidated an long standing assumption regarding the privilege of kernel threads. The fix is simple and safe. It was provided by Jens Axboe and has been tested" * tag 'Smack-for-5.11-io_uring-fix' of git:// Smack: Handle io_uring kernel thread privileges
2020-12-22Smack: Handle io_uring kernel thread privilegesCasey Schaufler
Smack assumes that kernel threads are privileged for smackfs operations. This was necessary because the credential of the kernel thread was not related to a user operation. With io_uring the credential does reflect a user's rights and can be used. Suggested-by: Jens Axboe <> Acked-by: Jens Axboe <> Acked-by: Eric W. Biederman <> Signed-off-by: Casey Schaufler <>
2020-12-16Merge tag 'Smack-for-5.11' of git:// Torvalds
Pull smack updates from Casey Schaufler: "There are no functional changes. Just one minor code clean-up and a set of corrections in function header comments" * tag 'Smack-for-5.11' of git:// security/smack: remove unused varible 'rc' Smack: fix kernel-doc interface on functions
2020-12-03security: add const qualifier to struct sock in various placesFlorian Westphal
A followup change to tcp_request_sock_op would have to drop the 'const' qualifier from the 'route_req' function as the 'security_inet_conn_request' call is moved there - and that function expects a 'struct sock *'. However, it turns out its also possible to add a const qualifier to security_inet_conn_request instead. Signed-off-by: Florian Westphal <> Acked-by: James Morris <> Signed-off-by: Jakub Kicinski <>
2020-11-16security/smack: remove unused varible 'rc'Alex Shi
This varible isn't used and can be removed to avoid a gcc warning: security/smack/smack_lsm.c:3873:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] Signed-off-by: Alex Shi <> Cc: Casey Schaufler <> Cc: James Morris <> Cc: "Serge E. Hallyn" <> Cc: Cc: Signed-off-by: Casey Schaufler <>
2020-11-13Smack: fix kernel-doc interface on functionsAlex Shi
The are some kernel-doc interface issues: security/smack/smackfs.c:1950: warning: Function parameter or member 'list' not described in 'smk_parse_label_list' security/smack/smackfs.c:1950: warning: Excess function parameter 'private' description in 'smk_parse_label_list' security/smack/smackfs.c:1979: warning: Function parameter or member 'list' not described in 'smk_destroy_label_list' security/smack/smackfs.c:1979: warning: Excess function parameter 'head' description in 'smk_destroy_label_list' security/smack/smackfs.c:2141: warning: Function parameter or member 'count' not described in 'smk_read_logging' security/smack/smackfs.c:2141: warning: Excess function parameter 'cn' description in 'smk_read_logging' security/smack/smackfs.c:2278: warning: Function parameter or member 'format' not described in 'smk_user_access' Correct them in this patch. Signed-off-by: Alex Shi <> Cc: Casey Schaufler <> Cc: James Morris <> Cc: "Serge E. Hallyn" <> Cc: Cc: Signed-off-by: Casey Schaufler <>
2020-10-13Merge tag 'Smack-for-5.10' of git:// Torvalds
Pull smack updates from Casey Schaufler: "Two minor fixes and one performance enhancement to Smack. The performance improvement is significant and the new code is more like its counterpart in SELinux. - Two kernel test robot suggested clean-ups. - Teach Smack to use the IPv4 netlabel cache. This results in a 12-14% improvement on TCP benchmarks" * tag 'Smack-for-5.10' of git:// Smack: Remove unnecessary variable initialization Smack: Fix build when NETWORK_SECMARK is not set Smack: Use the netlabel cache Smack: Set socket labels only once Smack: Consolidate uses of secmark into a function
2020-10-05Smack: Remove unnecessary variable initializationCasey Schaufler
The initialization of rc in smack_from_netlbl() is pointless. Reported-by: kernel test robot <> Signed-off-by: Casey Schaufler <>
2020-09-22Smack: Fix build when NETWORK_SECMARK is not setCasey Schaufler
Use proper conditional compilation for the secmark field in the network skb. Reported-by: kernel test robot <> Signed-off-by: Casey Schaufler <>
2020-09-11Smack: Use the netlabel cacheCasey Schaufler
Utilize the Netlabel cache mechanism for incoming packet matching. Refactor the initialization of secattr structures, as it was being done in two places. Signed-off-by: Casey Schaufler <>
2020-09-11Smack: Set socket labels only onceCasey Schaufler
Refactor the IP send checks so that the netlabel value is set only when necessary, not on every send. Some functions get renamed as the changes made the old name misleading. Signed-off-by: Casey Schaufler <>
2020-09-11Smack: Consolidate uses of secmark into a functionCasey Schaufler
Add a function smack_from_skb() that returns the Smack label identified by a network secmark. Replace the explicit uses of the secmark with this function. Signed-off-by: Casey Schaufler <>
2020-08-23treewide: Use fallthrough pseudo-keywordGustavo A. R. Silva
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] Signed-off-by: Gustavo A. R. Silva <>
2020-07-27Smack: prevent underflow in smk_set_cipso()Dan Carpenter
We have an upper bound on "maplevel" but forgot to check for negative values. Fixes: e114e473771c ("Smack: Simplified Mandatory Access Control Kernel") Signed-off-by: Dan Carpenter <> Signed-off-by: Casey Schaufler <>
2020-07-27Smack: fix another vsscanf out of boundsDan Carpenter
This is similar to commit 84e99e58e8d1 ("Smack: slab-out-of-bounds in vsscanf") where we added a bounds check on "rule". Reported-by: Fixes: f7112e6c9abf ("Smack: allow for significantly longer Smack labels v4") Signed-off-by: Dan Carpenter <> Signed-off-by: Casey Schaufler <>
2020-07-14Smack: fix use-after-free in smk_write_relabel_self()Eric Biggers
smk_write_relabel_self() frees memory from the task's credentials with no locking, which can easily cause a use-after-free because multiple tasks can share the same credentials structure. Fix this by using prepare_creds() and commit_creds() to correctly modify the task's credentials. Reproducer for "BUG: KASAN: use-after-free in smk_write_relabel_self": #include <fcntl.h> #include <pthread.h> #include <unistd.h> static void *thrproc(void *arg) { int fd = open("/sys/fs/smackfs/relabel-self", O_WRONLY); for (;;) write(fd, "foo", 3); } int main() { pthread_t t; pthread_create(&t, NULL, thrproc, NULL); thrproc(NULL); } Reported-by: Fixes: 38416e53936e ("Smack: limited capability for changing process label") Cc: <> # v4.4+ Signed-off-by: Eric Biggers <> Signed-off-by: Casey Schaufler <>
2020-06-13Merge tag 'notifications-20200601' of ↵Linus Torvalds
git:// Pull notification queue from David Howells: "This adds a general notification queue concept and adds an event source for keys/keyrings, such as linking and unlinking keys and changing their attributes. Thanks to Debarshi Ray, we do have a pull request to use this to fix a problem with gnome-online-accounts - as mentioned last time: Without this, g-o-a has to constantly poll a keyring-based kerberos cache to find out if kinit has changed anything. [ There are other notification pending: mount/sb fsinfo notifications for libmount that Karel Zak and Ian Kent have been working on, and Christian Brauner would like to use them in lxc, but let's see how this one works first ] LSM hooks are included: - A set of hooks are provided that allow an LSM to rule on whether or not a watch may be set. Each of these hooks takes a different "watched object" parameter, so they're not really shareable. The LSM should use current's credentials. [Wanted by SELinux & Smack] - A hook is provided to allow an LSM to rule on whether or not a particular message may be posted to a particular queue. This is given the credentials from the event generator (which may be the system) and the watch setter. [Wanted by Smack] I've provided SELinux and Smack with implementations of some of these hooks. WHY === Key/keyring notifications are desirable because if you have your kerberos tickets in a file/directory, your Gnome desktop will monitor that using something like fanotify and tell you if your credentials cache changes. However, we also have the ability to cache your kerberos tickets in the session, user or persistent keyring so that it isn't left around on disk across a reboot or logout. Keyrings, however, cannot currently be monitored asynchronously, so the desktop has to poll for it - not so good on a laptop. This facility will allow the desktop to avoid the need to poll. DESIGN DECISIONS ================ - The notification queue is built on top of a standard pipe. Messages are effectively spliced in. The pipe is opened with a special flag: pipe2(fds, O_NOTIFICATION_PIPE); The special flag has the same value as O_EXCL (which doesn't seem like it will ever be applicable in this context)[?]. It is given up front to make it a lot easier to prohibit splice&co from accessing the pipe. [?] Should this be done some other way? I'd rather not use up a new O_* flag if I can avoid it - should I add a pipe3() system call instead? The pipe is then configured:: ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth); ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter); Messages are then read out of the pipe using read(). - It should be possible to allow write() to insert data into the notification pipes too, but this is currently disabled as the kernel has to be able to insert messages into the pipe *without* holding pipe->mutex and the code to make this work needs careful auditing. - sendfile(), splice() and vmsplice() are disabled on notification pipes because of the pipe->mutex issue and also because they sometimes want to revert what they just did - but one or more notification messages might've been interleaved in the ring. - The kernel inserts messages with the wait queue spinlock held. This means that pipe_read() and pipe_write() have to take the spinlock to update the queue pointers. - Records in the buffer are binary, typed and have a length so that they can be of varying size. This allows multiple heterogeneous sources to share a common buffer; there are 16 million types available, of which I've used just a few, so there is scope for others to be used. Tags may be specified when a watchpoint is created to help distinguish the sources. - Records are filterable as types have up to 256 subtypes that can be individually filtered. Other filtration is also available. - Notification pipes don't interfere with each other; each may be bound to a different set of watches. Any particular notification will be copied to all the queues that are currently watching for it - and only those that are watching for it. - When recording a notification, the kernel will not sleep, but will rather mark a queue as having lost a message if there's insufficient space. read() will fabricate a loss notification message at an appropriate point later. - The notification pipe is created and then watchpoints are attached to it, using one of: keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01); watch_mount(AT_FDCWD, "/", 0, fd, 0x02); watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03); where in both cases, fd indicates the queue and the number after is a tag between 0 and 255. - Watches are removed if either the notification pipe is destroyed or the watched object is destroyed. In the latter case, a message will be generated indicating the enforced watch removal. Things I want to avoid: - Introducing features that make the core VFS dependent on the network stack or networking namespaces (ie. usage of netlink). - Dumping all this stuff into dmesg and having a daemon that sits there parsing the output and distributing it as this then puts the responsibility for security into userspace and makes handling namespaces tricky. Further, dmesg might not exist or might be inaccessible inside a container. - Letting users see events they shouldn't be able to see. TESTING AND MANPAGES ==================== - The keyutils tree has a pipe-watch branch that has keyctl commands for making use of notifications. Proposed manual pages can also be found on this branch, though a couple of them really need to go to the main manpages repository instead. If the kernel supports the watching of keys, then running "make test" on that branch will cause the testing infrastructure to spawn a monitoring process on the side that monitors a notifications pipe for all the key/keyring changes induced by the tests and they'll all be checked off to make sure they happened. - A test program is provided (samples/watch_queue/watch_test) that can be used to monitor for keyrings, mount and superblock events. Information on the notifications is simply logged to stdout" * tag 'notifications-20200601' of git:// smack: Implement the watch_key and post_notification hooks selinux: Implement the watch_key security hook keys: Make the KEY_NEED_* perms an enum rather than a mask pipe: Add notification lossage handling pipe: Allow buffers to be marked read-whole-or-error for notifications Add sample notification program watch_queue: Add a key/keyring notification facility security: Add hooks to rule on setting a watch pipe: Add general notification queue support pipe: Add O_NOTIFICATION_PIPE security: Add a hook for the point of notification insertion uapi: General notification queue definitions
2020-06-04Merge branch 'exec-linus' of ↵Linus Torvalds
git:// Pull execve updates from Eric Biederman: "Last cycle for the Nth time I ran into bugs and quality of implementation issues related to exec that could not be easily be fixed because of the way exec is implemented. So I have been digging into exec and cleanup up what I can. I don't think I have exec sorted out enough to fix the issues I started with but I have made some headway this cycle with 4 sets of changes. - promised cleanups after introducing exec_update_mutex - trivial cleanups for exec - control flow simplifications - remove the recomputation of bprm->cred The net result is code that is a bit easier to understand and work with and a decrease in the number of lines of code (if you don't count the added tests)" * 'exec-linus' of git:// (24 commits) exec: Compute file based creds only once exec: Add a per bprm->file version of per_clear binfmt_elf_fdpic: fix execfd build regression selftests/exec: Add binfmt_script regression test exec: Remove recursion from search_binary_handler exec: Generic execfd support exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC exec: Move the call of prepare_binprm into search_binary_handler exec: Allow load_misc_binary to call prepare_binprm unconditionally exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds exec: Teach prepare_exec_creds how exec treats uids & gids exec: Set the point of no return sooner exec: Move handling of the point of no return to the top level exec: Run sync_mm_rss before taking exec_update_mutex exec: Fix spelling of search_binary_handler in a comment exec: Move the comment from above de_thread to above unshare_sighand exec: Rename flush_old_exec begin_new_exec exec: Move most of setup_new_exec into flush_old_exec exec: In setup_new_exec cache current in the local variable me ...
2020-05-20exec: Factor security_bprm_creds_for_exec out of security_bprm_set_credsEric W. Biederman
Today security_bprm_set_creds has several implementations: apparmor_bprm_set_creds, cap_bprm_set_creds, selinux_bprm_set_creds, smack_bprm_set_creds, and tomoyo_bprm_set_creds. Except for cap_bprm_set_creds they all test bprm->called_set_creds and return immediately if it is true. The function cap_bprm_set_creds ignores bprm->calld_sed_creds entirely. Create a new LSM hook security_bprm_creds_for_exec that is called just before prepare_binprm in __do_execve_file, resulting in a LSM hook that is called exactly once for the entire of exec. Modify the bits of security_bprm_set_creds that only want to be called once per exec into security_bprm_creds_for_exec, leaving only cap_bprm_set_creds behind. Remove bprm->called_set_creds all of it's former users have been moved to security_bprm_creds_for_exec. Add or upate comments a appropriate to bring them up to date and to reflect this change. Link: Acked-by: Linus Torvalds <> Acked-by: Casey Schaufler <> # For the LSM and Smack bits Reviewed-by: Kees Cook <> Signed-off-by: "Eric W. Biederman" <>
2020-05-19smack: Implement the watch_key and post_notification hooksDavid Howells
Implement the watch_key security hook in Smack to make sure that a key grants the caller Read permission in order to set a watch on a key. Also implement the post_notification security hook to make sure that the notification source is granted Write permission by the watch queue. For the moment, the watch_devices security hook is left unimplemented as it's not obvious what the object should be since the queue is global and didn't previously exist. Signed-off-by: David Howells <> Acked-by: Casey Schaufler <>
2020-05-19keys: Make the KEY_NEED_* perms an enum rather than a maskDavid Howells
Since the meaning of combining the KEY_NEED_* constants is undefined, make it so that you can't do that by turning them into an enum. The enum is also given some extra values to represent special circumstances, such as: (1) The '0' value is reserved and causes a warning to trap the parameter being unset. (2) The key is to be unlinked and we require no permissions on it, only the keyring, (this replaces the KEY_LOOKUP_FOR_UNLINK flag). (3) An override due to CAP_SYS_ADMIN. (4) An override due to an instantiation token being present. (5) The permissions check is being deferred to later key_permission() calls. The extra values give the opportunity for LSMs to audit these situations. [Note: This really needs overhauling so that lookup_user_key() tells key_task_permission() and the LSM what operation is being done and leaves it to those functions to decide how to map that onto the available permits. However, I don't really want to make these change in the middle of the notifications patchset.] Signed-off-by: David Howells <> cc: Jarkko Sakkinen <> cc: Paul Moore <> cc: Stephen Smalley <> cc: Casey Schaufler <> cc: cc:
2020-05-11Smack: Remove unused inline function smk_ad_setfield_u_fs_path_mntYueHaibing
commit a269434d2fb4 ("LSM: separate LSM_AUDIT_DATA_DENTRY from LSM_AUDIT_DATA_PATH") left behind this, remove it. Signed-off-by: YueHaibing <> Signed-off-by: Casey Schaufler <>
2020-05-06Smack:- Remove redundant inode_smack cacheCasey Schaufler
The inode_smack cache is no longer used. Remove it. Signed-off-by: Vishal Goel <> Signed-off-by: Casey Schaufler <>
2020-05-06Smack:- Remove mutex lock "smk_lock" from inode_smackCasey Schaufler
"smk_lock" mutex is used during inode instantiation in smack_d_instantiate()function. It has been used to avoid simultaneous access on same inode security structure. Since smack related initialization is done only once i.e during inode creation. If the inode has already been instantiated then smack_d_instantiate() function just returns without doing anything. So it means mutex lock is required only during inode creation. But since 2 processes can't create same inodes or files simultaneously. Also linking or some other file operation can't be done simultaneously when the file is getting created since file lookup will fail before dentry inode linkup which is done after smack initialization. So no mutex lock is required in inode_smack structure. It will save memory as well as improve some performance. If 40000 inodes are created in system, it will save 1.5 MB on 32-bit systems & 2.8 MB on 64-bit systems. Signed-off-by: Vishal Goel <> Signed-off-by: Amit Sahrawat <> Signed-off-by: Casey Schaufler <>
2020-05-06Smack: slab-out-of-bounds in vsscanfCasey Schaufler
Add barrier to soob. Return -EOVERFLOW if the buffer is exceeded. Suggested-by: Hillf Danton <> Reported-by: Signed-off-by: Casey Schaufler <>
2020-05-06smack: remove redundant structure variable from header.Maninder Singh
commit afb1cbe37440 ("LSM: Infrastructure management of the inode security") removed usage of smk_rcu, thus removing it from structure. Signed-off-by: Maninder Singh <> Signed-off-by: Vaneet Narang <> Signed-off-by: Casey Schaufler <>
2020-05-06smack: avoid unused 'sip' variable warningArnd Bergmann
The mix of IS_ENABLED() and #ifdef checks has left a combination that causes a warning about an unused variable: security/smack/smack_lsm.c: In function 'smack_socket_connect': security/smack/smack_lsm.c:2838:24: error: unused variable 'sip' [-Werror=unused-variable] 2838 | struct sockaddr_in6 *sip = (struct sockaddr_in6 *)sap; Change the code to use C-style checks consistently so the compiler can handle it correctly. Fixes: 87fbfffcc89b ("broken ping to ipv6 linklocal addresses on debian buster") Signed-off-by: Arnd Bergmann <> Signed-off-by: Casey Schaufler <>
2020-02-08Merge branch 'merge.nfs-fs_parse.1' of ↵Linus Torvalds
git:// Pull vfs file system parameter updates from Al Viro: "Saner fs_parser.c guts and data structures. The system-wide registry of syntax types (string/enum/int32/oct32/.../etc.) is gone and so is the horror switch() in fs_parse() that would have to grow another case every time something got added to that system-wide registry. New syntax types can be added by filesystems easily now, and their namespace is that of functions - not of system-wide enum members. IOW, they can be shared or kept private and if some turn out to be widely useful, we can make them common library helpers, etc., without having to do anything whatsoever to fs_parse() itself. And we already get that kind of requests - the thing that finally pushed me into doing that was "oh, and let's add one for timeouts - things like 15s or 2h". If some filesystem really wants that, let them do it. Without somebody having to play gatekeeper for the variants blessed by direct support in fs_parse(), TYVM. Quite a bit of boilerplate is gone. And IMO the data structures make a lot more sense now. -200LoC, while we are at it" * 'merge.nfs-fs_parse.1' of git:// (25 commits) tmpfs: switch to use of invalfc() cgroup1: switch to use of errorfc() procfs: switch to use of invalfc() hugetlbfs: switch to use of invalfc() cramfs: switch to use of errofc() gfs2: switch to use of errorfc() fuse: switch to use errorfc() ceph: use errorfc() and friends instead of spelling the prefix out prefix-handling analogues of errorf() and friends turn fs_param_is_... into functions fs_parse: handle optional arguments sanely fs_parse: fold fs_parameter_desc/fs_parameter_spec fs_parser: remove fs_parameter_description name field add prefix to fs_context->log ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log new primitive: __fs_parse() switch rbd and libceph to p_log-based primitives struct p_log, variants of warnf() taking that one instead teach logfc() to handle prefices, give it saner calling conventions get rid of cg_invalf() ...
2020-02-07fs_parse: fold fs_parameter_desc/fs_parameter_specAl Viro
The former contains nothing but a pointer to an array of the latter... Signed-off-by: Al Viro <>
2020-02-07fs_parser: remove fs_parameter_description name fieldEric Sandeen
Unused now. Signed-off-by: Eric Sandeen <> Acked-by: David Howells <> Signed-off-by: Al Viro <>
2020-02-05broken ping to ipv6 linklocal addresses on debian busterCasey Schaufler
I am seeing ping failures to IPv6 linklocal addresses with Debian buster. Easiest example to reproduce is: $ ping -c1 -w1 ff02::1%eth1 connect: Invalid argument $ ping -c1 -w1 ff02::1%eth1 PING ff02::01%eth1(ff02::1%eth1) 56 data bytes 64 bytes from fe80::e0:f9ff:fe0c:37%eth1: icmp_seq=1 ttl=64 time=0.059 ms git bisect traced the failure to commit b9ef5513c99b ("smack: Check address length before reading address family") Arguably ping is being stupid since the buster version is not setting the address family properly (ping on stretch for example does): $ strace -e connect ping6 -c1 -w1 ff02::1%eth1 connect(5, {sa_family=AF_UNSPEC, sa_data="\4\1\0\0\0\0\377\2\0\0\0\0\0\0\0\0\0\0\0\0\0\1\3\0\0\0"}, 28) = -1 EINVAL (Invalid argument) but the command works fine on kernels prior to this commit, so this is breakage which goes against the Linux paradigm of "don't break userspace" Cc: Reported-by: David Ahern <> Suggested-by: Tetsuo Handa <> Signed-off-by: Casey Schaufler <>  security/smack/smack_lsm.c | 41 +++++++++++++++++++---------------------- 1 file changed, 19 insertions(+), 22 deletions(-)
2019-10-23pipe: Reduce #inclusion of pipe_fs_i.hDavid Howells
Remove some #inclusions of linux/pipe_fs_i.h that don't seem to be necessary any more. Signed-off-by: David Howells <>
2019-09-23Merge tag 'smack-for-5.4-rc1' of git:// Torvalds
Pull smack updates from Casey Schaufler: "Four patches for v5.4. Nothing is major. All but one are in response to mechanically detected potential issues. The remaining patch cleans up kernel-doc notations" * tag 'smack-for-5.4-rc1' of git:// smack: use GFP_NOFS while holding inode_smack::smk_lock security: smack: Fix possible null-pointer dereferences in smack_socket_sock_rcv_skb() smack: fix some kernel-doc notations Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is set
2019-09-04smack: use GFP_NOFS while holding inode_smack::smk_lockEric Biggers
inode_smack::smk_lock is taken during smack_d_instantiate(), which is called during a filesystem transaction when creating a file on ext4. Therefore to avoid a deadlock, all code that takes this lock must use GFP_NOFS, to prevent memory reclaim from waiting for the filesystem transaction to complete. Reported-by: Cc: Signed-off-by: Eric Biggers <> Signed-off-by: Casey Schaufler <>
2019-09-04security: smack: Fix possible null-pointer dereferences in ↵Jia-Ju Bai
smack_socket_sock_rcv_skb() In smack_socket_sock_rcv_skb(), there is an if statement on line 3920 to check whether skb is NULL: if (skb && skb->secmark != 0) This check indicates skb can be NULL in some cases. But on lines 3931 and 3932, skb is used:>netif = skb->skb_iif; ipv6_skb_to_auditdata(skb, &ad.a, NULL); Thus, possible null-pointer dereferences may occur when skb is NULL. To fix these possible bugs, an if statement is added to check skb. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <> Signed-off-by: Casey Schaufler <>
2019-09-04smack: fix some kernel-doc notationsluanshi
Fix/add kernel-doc notation and fix typos in security/smack/. Signed-off-by: Liguang Zhang <> Signed-off-by: Casey Schaufler <>
2019-09-04Smack: Don't ignore other bprm->unsafe flags if LSM_UNSAFE_PTRACE is setJann Horn
There is a logic bug in the current smack_bprm_set_creds(): If LSM_UNSAFE_PTRACE is set, but the ptrace state is deemed to be acceptable (e.g. because the ptracer detached in the meantime), the other ->unsafe flags aren't checked. As far as I can tell, this means that something like the following could work (but I haven't tested it): - task A: create task B with fork() - task B: set NO_NEW_PRIVS - task B: install a seccomp filter that makes open() return 0 under some conditions - task B: replace fd 0 with a malicious library - task A: attach to task B with PTRACE_ATTACH - task B: execve() a file with an SMACK64EXEC extended attribute - task A: while task B is still in the middle of execve(), exit (which destroys the ptrace relationship) Make sure that if any flags other than LSM_UNSAFE_PTRACE are set in bprm->unsafe, we reject the execve(). Cc: Fixes: 5663884caab1 ("Smack: unify all ptrace accesses in the smack") Signed-off-by: Jann Horn <> Signed-off-by: Casey Schaufler <>
2019-07-19Merge branch 'work.mount0' of ↵Linus Torvalds
git:// Pull vfs mount updates from Al Viro: "The first part of mount updates. Convert filesystems to use the new mount API" * 'work.mount0' of git:// (63 commits) mnt_init(): call shmem_init() unconditionally constify ksys_mount() string arguments don't bother with registering rootfs init_rootfs(): don't bother with init_ramfs_fs() vfs: Convert smackfs to use the new mount API vfs: Convert selinuxfs to use the new mount API vfs: Convert securityfs to use the new mount API vfs: Convert apparmorfs to use the new mount API vfs: Convert openpromfs to use the new mount API vfs: Convert xenfs to use the new mount API vfs: Convert gadgetfs to use the new mount API vfs: Convert oprofilefs to use the new mount API vfs: Convert ibmasmfs to use the new mount API vfs: Convert qib_fs/ipathfs to use the new mount API vfs: Convert efivarfs to use the new mount API vfs: Convert configfs to use the new mount API vfs: Convert binfmt_misc to use the new mount API convenience helper: get_tree_single() convenience helper get_tree_nodev() vfs: Kill sget_userns() ...
2019-07-10Revert "Merge tag 'keys-acl-20190703' of ↵Linus Torvalds
git://" This reverts merge 0f75ef6a9cff49ff612f7ce0578bced9d0b38325 (and thus effectively commits 7a1ade847596 ("keys: Provide KEYCTL_GRANT_PERMISSION") 2e12256b9a76 ("keys: Replace uid/gid/perm permissions checking with an ACL") that the merge brought in). It turns out that it breaks booting with an encrypted volume, and Eric biggers reports that it also breaks the fscrypt tests [1] and loading of in-kernel X.509 certificates [2]. The root cause of all the breakage is likely the same, but David Howells is off email so rather than try to work it out it's getting reverted in order to not impact the rest of the merge window. [1] [2] Link: Reported-by: Eric Biggers <> Cc: David Howells <> Cc: James Morris <> Signed-off-by: Linus Torvalds <>
2019-07-08Merge tag 'keys-acl-20190703' of ↵Linus Torvalds
git:// Pull keyring ACL support from David Howells: "This changes the permissions model used by keys and keyrings to be based on an internal ACL by the following means: - Replace the permissions mask internally with an ACL that contains a list of ACEs, each with a specific subject with a permissions mask. Potted default ACLs are available for new keys and keyrings. ACE subjects can be macroised to indicate the UID and GID specified on the key (which remain). Future commits will be able to add additional subject types, such as specific UIDs or domain tags/namespaces. Also split a number of permissions to give finer control. Examples include splitting the revocation permit from the change-attributes permit, thereby allowing someone to be granted permission to revoke a key without allowing them to change the owner; also the ability to join a keyring is split from the ability to link to it, thereby stopping a process accessing a keyring by joining it and thus acquiring use of possessor permits. - Provide a keyctl to allow the granting or denial of one or more permits to a specific subject. Direct access to the ACL is not granted, and the ACL cannot be viewed" * tag 'keys-acl-20190703' of git:// keys: Provide KEYCTL_GRANT_PERMISSION keys: Replace uid/gid/perm permissions checking with an ACL
2019-07-04vfs: Convert smackfs to use the new mount APIDavid Howells
Convert the smackfs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: David Howells <> cc: Casey Schaufler <> cc: Signed-off-by: Al Viro <>
2019-06-27keys: Replace uid/gid/perm permissions checking with an ACLDavid Howells
Replace the uid/gid/perm permissions checking on a key with an ACL to allow the SETATTR and SEARCH permissions to be split. This will also allow a greater range of subjects to represented. ============ WHY DO THIS? ============ The problem is that SETATTR and SEARCH cover a slew of actions, not all of which should be grouped together. For SETATTR, this includes actions that are about controlling access to a key: (1) Changing a key's ownership. (2) Changing a key's security information. (3) Setting a keyring's restriction. And actions that are about managing a key's lifetime: (4) Setting an expiry time. (5) Revoking a key. and (proposed) managing a key as part of a cache: (6) Invalidating a key. Managing a key's lifetime doesn't really have anything to do with controlling access to that key. Expiry time is awkward since it's more about the lifetime of the content and so, in some ways goes better with WRITE permission. It can, however, be set unconditionally by a process with an appropriate authorisation token for instantiating a key, and can also be set by the key type driver when a key is instantiated, so lumping it with the access-controlling actions is probably okay. As for SEARCH permission, that currently covers: (1) Finding keys in a keyring tree during a search. (2) Permitting keyrings to be joined. (3) Invalidation. But these don't really belong together either, since these actions really need to be controlled separately. Finally, there are number of special cases to do with granting the administrator special rights to invalidate or clear keys that I would like to handle with the ACL rather than key flags and special checks. =============== WHAT IS CHANGED =============== The SETATTR permission is split to create two new permissions: (1) SET_SECURITY - which allows the key's owner, group and ACL to be changed and a restriction to be placed on a keyring. (2) REVOKE - which allows a key to be revoked. The SEARCH permission is split to create: (1) SEARCH - which allows a keyring to be search and a key to be found. (2) JOIN - which allows a keyring to be joined as a session keyring. (3) INVAL - which allows a key to be invalidated. The WRITE permission is also split to create: (1) WRITE - which allows a key's content to be altered and links to be added, removed and replaced in a keyring. (2) CLEAR - which allows a keyring to be cleared completely. This is split out to make it possible to give just this to an administrator. (3) REVOKE - see above. Keys acquire ACLs which consist of a series of ACEs, and all that apply are unioned together. An ACE specifies a subject, such as: (*) Possessor - permitted to anyone who 'possesses' a key (*) Owner - permitted to the key owner (*) Group - permitted to the key group (*) Everyone - permitted to everyone Note that 'Other' has been replaced with 'Everyone' on the assumption that you wouldn't grant a permit to 'Other' that you wouldn't also grant to everyone else. Further subjects may be made available by later patches. The ACE also specifies a permissions mask. The set of permissions is now: VIEW Can view the key metadata READ Can read the key content WRITE Can update/modify the key content SEARCH Can find the key by searching/requesting LINK Can make a link to the key SET_SECURITY Can change owner, ACL, expiry INVAL Can invalidate REVOKE Can revoke JOIN Can join this keyring CLEAR Can clear this keyring The KEYCTL_SETPERM function is then deprecated. The KEYCTL_SET_TIMEOUT function then is permitted if SET_SECURITY is set, or if the caller has a valid instantiation auth token. The KEYCTL_INVALIDATE function then requires INVAL. The KEYCTL_REVOKE function then requires REVOKE. The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an existing keyring. The JOIN permission is enabled by default for session keyrings and manually created keyrings only. ====================== BACKWARD COMPATIBILITY ====================== To maintain backward compatibility, KEYCTL_SETPERM will translate the permissions mask it is given into a new ACL for a key - unless KEYCTL_SET_ACL has been called on that key, in which case an error will be returned. It will convert possessor, owner, group and other permissions into separate ACEs, if each portion of the mask is non-zero. SETATTR permission turns on all of INVAL, REVOKE and SET_SECURITY. WRITE permission turns on WRITE, REVOKE and, if a keyring, CLEAR. JOIN is turned on if a keyring is being altered. The KEYCTL_DESCRIBE function translates the ACL back into a permissions mask to return depending on possessor, owner, group and everyone ACEs. It will make the following mappings: (1) INVAL, JOIN -> SEARCH (2) SET_SECURITY -> SETATTR (3) REVOKE -> WRITE if SETATTR isn't already set (4) CLEAR -> WRITE Note that the value subsequently returned by KEYCTL_DESCRIBE may not match the value set with KEYCTL_SETATTR. ======= TESTING ======= This passes the keyutils testsuite for all but a couple of tests: (1) tests/keyctl/dh_compute/badargs: The first wrong-key-type test now returns EOPNOTSUPP rather than ENOKEY as READ permission isn't removed if the type doesn't have ->read(). You still can't actually read the key. (2) tests/keyctl/permitting/valid: The view-other-permissions test doesn't work as Other has been replaced with Everyone in the ACL. Signed-off-by: David Howells <>