summaryrefslogtreecommitdiff
path: root/security/selinux
AgeCommit message (Collapse)Author
2023-07-28selinux: log about VM being executable by defaultChristian Göttsche
In case virtual memory is being marked as executable by default, SELinux checks regarding explicit potential dangerous use are disabled. Inform the user about it. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-24selinux: convert to ctime accessor functionsJeff Layton
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-89-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-20selinux: fix a 0/NULL mistmatch in ad_net_init_from_iif()Paul Moore
Use a NULL instead of a zero to resolve a int/pointer mismatch. Cc: Paolo Abeni <pabeni@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202307210332.4AqFZfzI-lkp@intel.com/ Fixes: dd51fcd42fd6 ("selinux: introduce and use lsm_ad_net_init*() helpers") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-20selinux: introduce SECURITY_SELINUX_DEBUG configurationChristian Göttsche
The policy database code contains several debug output statements related to hashtable utilization. Those are guarded by the macro DEBUG_HASHES, which is neither documented nor set anywhere. Introduce a new Kconfig configuration guarding this and potential other future debugging related code. Disable the setting by default. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: fixed line lengths in the help text] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: introduce and use lsm_ad_net_init*() helpersPaolo Abeni
Perf traces of network-related workload shows a measurable overhead inside the network-related selinux hooks while zeroing the lsm_network_audit struct. In most cases we can delay the initialization of such structure to the usage point, avoiding such overhead in a few cases. Additionally, the audit code accesses the IP address information only for AF_INET* families, and selinux_parse_skb() will fill-out the relevant fields in such cases. When the family field is zeroed or the initialization is followed by the mentioned parsing, the zeroing can be limited to the sk, family and netif fields. By factoring out the audit-data initialization to new helpers, this patch removes some duplicate code and gives small but measurable performance gain under UDP flood. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: update my email addressStephen Smalley
Update my email address; MAINTAINERS was updated some time ago. Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: add missing newlines in pr_err() statementsChristian Göttsche
The kernel print statements do not append an implicit newline to format strings. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-19selinux: drop avtab_search()Christian Göttsche
avtab_search() shares the same logic with avtab_search_node(), except that it returns, if found, a pointer to the struct avtab_node member datum instead of the node itself. Since the member is an embedded struct, and not a pointer, the returned value of avtab_search() and avtab_search_node() will always in unison either be NULL or non-NULL. Drop avtab_search() and replace its calls by avtab_search_node() to deduplicate logic and adopt the only caller caring for the type of the returned value accordingly. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: de-brand SELinuxStephen Smalley
Change "NSA SELinux" to just "SELinux" in Kconfig help text and comments. While NSA was the original primary developer and continues to help maintain SELinux, SELinux has long since transitioned to a wide community of developers and maintainers. SELinux has been part of the mainline Linux kernel for nearly 20 years now [1] and has received contributions from many individuals and organizations. [1] https://lore.kernel.org/lkml/Pine.LNX.4.44.0308082228470.1852-100000@home.osdl.org/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions regarding enforcing statusChristian Göttsche
Use the type bool as parameter type in selinux_status_update_setenforce(). The related function enforcing_enabled() returns the type bool, while the struct selinux_kernel_status member enforcing uses an u32. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: fix implicit conversions in the symtabChristian Göttsche
hashtab_init() takes an u32 as size parameter type. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: use consistent type for AV rule specifierChristian Göttsche
The specifier for avtab keys is always supplied with a type of u16, either as a macro to security_compute_sid() or the member specified of the struct avtab_key. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the LSM hooksChristian Göttsche
Use the identical types in assignments of local variables for the destination. Merge tail calls into return statements. Avoid using leading underscores for function local variable. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the AVC codeChristian Göttsche
Use a consistent type of u32 for sequence numbers. Use a non-negative and input parameter matching type for the hash result. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid implicit conversions in the netif codeChristian Göttsche
Use the identical type sel_netif_hashfn() returns. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: consistently use u32 as sequence number type in the status codeChristian Göttsche
Align the type with the one used in selinux_notify_policy_change() and the sequence member of struct selinux_kernel_status. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: subject line tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: avoid avtab overflowsChristian Göttsche
Prevent inserting more than the supported U32_MAX number of entries. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-18selinux: check for multiplication overflow in put_entry()Christian Göttsche
The function is always inlined and most of the time both relevant arguments are compile time constants, allowing compilers to elide the check. Also the function is part of outputting the policy, which is not performance critical. Also convert the type of the third parameter into a size_t, since it should always be a non-negative number of elements. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-14security: Constify sk in the sk_getsecid hook.Guillaume Nault
The sk_getsecid hook shouldn't need to modify its socket argument. Make it const so that callers of security_sk_classify_flow() can use a const struct sock *. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-10selinux: introduce an initial SID for early boot processesOndrej Mosnacek
Currently, SELinux doesn't allow distinguishing between kernel threads and userspace processes that are started before the policy is first loaded - both get the label corresponding to the kernel SID. The only way a process that persists from early boot can get a meaningful label is by doing a voluntary dyntransition or re-executing itself. Reusing the kernel label for userspace processes is problematic for several reasons: 1. The kernel is considered to be a privileged domain and generally needs to have a wide range of permissions allowed to work correctly, which prevents the policy writer from effectively hardening against early boot processes that might remain running unintentionally after the policy is loaded (they represent a potential extra attack surface that should be mitigated). 2. Despite the kernel being treated as a privileged domain, the policy writer may want to impose certain special limitations on kernel threads that may conflict with the requirements of intentional early boot processes. For example, it is a good hardening practice to limit what executables the kernel can execute as usermode helpers and to confine the resulting usermode helper processes. However, a (legitimate) process surviving from early boot may need to execute a different set of executables. 3. As currently implemented, overlayfs remembers the security context of the process that created an overlayfs mount and uses it to bound subsequent operations on files using this context. If an overlayfs mount is created before the SELinux policy is loaded, these "mounter" checks are made against the kernel context, which may clash with restrictions on the kernel domain (see 2.). To resolve this, introduce a new initial SID (reusing the slot of the former "init" initial SID) that will be assigned to any userspace process started before the policy is first loaded. This is easy to do, as we can simply label any process that goes through the bprm_creds_for_exec LSM hook with the new init-SID instead of propagating the kernel SID from the parent. To provide backwards compatibility for existing policies that are unaware of this new semantic of the "init" initial SID, introduce a new policy capability "userspace_initial_context" and set the "init" SID to the same context as the "kernel" SID unless this capability is set by the policy. Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10selinux: cleanup the policycap accessor functionsPaul Moore
In the process of reverting back to directly accessing the global selinux_state pointer we left behind some artifacts in the selinux_policycap_XXX() helper functions. This patch cleans up some of that left-behind cruft. Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-07-10security: Allow all LSMs to provide xattrs for inode_init_security hookRoberto Sassu
Currently, the LSM infrastructure supports only one LSM providing an xattr and EVM calculating the HMAC on that xattr, plus other inode metadata. Allow all LSMs to provide one or multiple xattrs, by extending the security blob reservation mechanism. Introduce the new lbs_xattr_count field of the lsm_blob_sizes structure, so that each LSM can specify how many xattrs it needs, and the LSM infrastructure knows how many xattr slots it should allocate. Modify the inode_init_security hook definition, by passing the full xattr array allocated in security_inode_init_security(), and the current number of xattr slots in that array filled by LSMs. The first parameter would allow EVM to access and calculate the HMAC on xattrs supplied by other LSMs, the second to not leave gaps in the xattr array, when an LSM requested but did not provide xattrs (e.g. if it is not initialized). Introduce lsm_get_xattr_slot(), which LSMs can call as many times as the number specified in the lbs_xattr_count field of the lsm_blob_sizes structure. During each call, lsm_get_xattr_slot() increments the number of filled xattrs, so that at the next invocation it returns the next xattr slot to fill. Cleanup security_inode_init_security(). Unify the !initxattrs and initxattrs case by simply not allocating the new_xattrs array in the former. Update the documentation to reflect the changes, and fix the description of the xattr name, as it is not allocated anymore. Adapt both SELinux and Smack to use the new definition of the inode_init_security hook, and to call lsm_get_xattr_slot() to obtain and fill the reserved slots in the xattr array. Move the xattr->name assignment after the xattr->value one, so that it is done only in case of successful memory allocation. Finally, change the default return value of the inode_init_security hook from zero to -EOPNOTSUPP, so that BPF LSM correctly follows the hook conventions. Reported-by: Nicolas Bouchinet <nicolas.bouchinet@clip-os.org> Link: https://lore.kernel.org/linux-integrity/Y1FTSIo+1x+4X0LS@archlinux/ Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> [PM: minor comment and variable tweaks, approved by RS] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-06-05selinux: avoid bool as identifier nameChristian Göttsche
Avoid using the identifier `bool` to improve support with future C standards. C23 is about to make `bool` a predefined macro (see N2654). Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-06-02selinux: fix Makefile for versions of make < v4.3Paul Moore
As noted in the comments of this commit, the current SELinux Makefile requires features found in make v4.3 or later, which is problematic as the Linux Kernel currently only requires make v3.82. This patch fixes the SELinux Makefile so that it works properly on these older versions of make, and adds a couple of comments to the Makefile about how it can be improved once make v4.3 is required by the kernel. Fixes: 6f933aa7dfd0 ("selinux: more Makefile tweaks") Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-30selinux: make labeled NFS work when mounted before policy loadOndrej Mosnacek
Currently, when an NFS filesystem that supports passing LSM/SELinux labels is mounted during early boot (before the SELinux policy is loaded), it ends up mounted without the labeling support (i.e. with Fedora policy all files get the generic NFS label system_u:object_r:nfs_t:s0). This is because the information that the NFS mount supports passing labels (communicated to the LSM layer via the kern_flags argument of security_set_mnt_opts()) gets lost and when the policy is loaded the mount is initialized as if the passing is not supported. Fix this by noting the "native labeling" in newsbsec->flags (using a new SE_SBNATIVE flag) on the pre-policy-loaded call of selinux_set_mnt_opts() and then making sure it is respected on the second call from delayed_superblock_init(). Additionally, make inode_doinit_with_dentry() initialize the inode's label from its extended attributes whenever it doesn't find it already intitialized by the filesystem. This is needed to properly initialize pre-existing inodes when delayed_superblock_init() is called. It should not trigger in any other cases (and if it does, it's still better to initialize the correct label instead of leaving the inode unlabeled). Fixes: eb9ae686507b ("SELinux: Add new labeling type native labels") Tested-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> [PM: fixed 'Fixes' tag format] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-30selinux: cleanup exit_sel_fs() declarationXiu Jianfeng
exit_sel_fs() has been removed since commit f22f9aaf6c3d ("selinux: remove the runtime disable functionality"). Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-23selinux: deprecated fs oconChristian Göttsche
The object context type `fs`, not to be confused with the well used object context type `fscon`, was introduced in the initial git commit 1da177e4c3f4 ("Linux-2.6.12-rc2") but never actually used since. The paper "A Security Policy Configuration for the Security-Enhanced Linux" [1] mentions it under `7.2 File System Contexts` but also states: Currently, this configuration is unused. The policy statement defining such object contexts is `fscon`, e.g.: fscon 2 3 gen_context(system_u:object_r:conA_t,s0) \ gen_context(system_u:object_r:conB_t,s0) It is not documented at selinuxproject.org or in the SELinux notebook and not supported by the Reference Policy buildsystem - the statement is not properly sorted - and thus not used in the Reference or Fedora Policy. Print a warning message at policy load for each such object context: SELinux: void and deprecated fs ocon 02:03 This topic was initially highlighted by Nicolas Iooss [2]. [1]: https://media.defense.gov/2021/Jul/29/2002815735/-1/-1/0/SELINUX-SECURITY-POLICY-CONFIGURATION-REPORT.PDF [2]: https://lore.kernel.org/selinux/CAJfZ7=mP2eJaq2BfO3y0VnwUJaY2cS2p=HZMN71z1pKjzaT0Eg@mail.gmail.com/ Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: tweaked deprecation comment, description line wrapping] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-18selinux: make header files self-includingChristian Göttsche
Include all necessary headers in header files to enable third party applications, like LSP servers, to resolve all used symbols. ibpkey.h: include "flask.h" for SECINITSID_UNLABELED initial_sid_to_string.h: include <linux/stddef.h> for NULL Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-18selinux: keep context struct members in syncChristian Göttsche
Commit 53f3517ae087 ("selinux: do not leave dangling pointer behind") reset the `str` field of the `context` struct in an OOM error branch. In this struct the fields `str` and `len` are coupled and should be kept in sync. Set the length to zero according to the string be set to NULL. Fixes: 53f3517ae087 ("selinux: do not leave dangling pointer behind") Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-18selinux: Implement mptcp_add_subflow hookPaolo Abeni
Newly added subflows should inherit the LSM label from the associated MPTCP socket regardless of the current context. This patch implements the above copying sid and class from the MPTCP socket context, deleting the existing subflow label, if any, and then re-creating the correct one. The new helper reuses the selinux_netlbl_sk_security_free() function, and the latter can end-up being called multiple times with the same argument; we additionally need to make it idempotent. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: small cleanups in selinux_audit_rule_init()Paul Moore
A few small tweaks to selinux_audit_rule_init(): - Adjust how we use the @rc variable so we are not doing any extra work in the common/success case. - Related to the above, rework the 'out' jump label so that the success and error paths are different, simplifying both. - Cleanup some of the vertical whitespace while we are making the other changes. Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: declare read-only data arrays constChristian Göttsche
The array of mount tokens in only used in match_opt_prefix() and never modified. The array of symtab names is never modified and only used in the DEBUG_HASHES configuration as output. The array of files for the SElinux filesystem sub-directory `ss` is similar to the other `struct tree_descr` usages only read from to construct the containing entries. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: retain const qualifier on string literal in avtab_hash_eval()Christian Göttsche
The second parameter `tag` of avtab_hash_eval() is only used for printing. In policydb_index() it is called with a string literal: avtab_hash_eval(&p->te_avtab, "rules"); Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: slight formatting tweak in description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: drop return at end of void function avc_insert()Christian Göttsche
Commit 539813e4184a ("selinux: stop returning node from avc_insert()") converted the return value of avc_insert() to void but left the now unnecessary trailing return statement. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: avc: drop unused function avc_disable()Christian Göttsche
Since commit f22f9aaf6c3d ("selinux: remove the runtime disable functionality") the function avc_disable() is no longer used. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: adjust typos in commentsChristian Göttsche
Found by codespell(1) Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: do not leave dangling pointer behindChristian Göttsche
In case mls_context_cpy() fails due to OOM set the free'd pointer in context_cpy() to NULL to avoid it potentially being dereferenced or free'd again in future. Freeing a NULL pointer is well-defined and a hard NULL dereference crash is at least not exploitable and should give a workable stack trace. Fixes: 12b29f34558b ("selinux: support deferred mapping of contexts") Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-05-08selinux: more Makefile tweaksPaul Moore
A few small tweaks to improve the SELinux Makefile: - Define a new variable, 'genhdrs', to represent both flask.h and av_permissions.h; this should help ensure consistent processing for both generated headers. - Move the 'ccflags-y' variable closer to the top, just after the main 'obj-$(CONFIG_SECURITY_SELINUX)' definition to make it more visible and improve the grouping in the Makefile. - Rework some of the vertical whitespace to improve some of the grouping in the Makefile. Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-04-12selinux: ensure av_permissions.h is built when neededPaul Moore
The Makefile rule responsible for building flask.h and av_permissions.h only lists flask.h as a target which means that av_permissions.h is only generated when flask.h needs to be generated. This patch fixes this by adding av_permissions.h as a target to the rule. Fixes: 8753f6bec352 ("selinux: generate flask headers during kernel build") Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-04-12selinux: fix Makefile dependencies of flask.hOndrej Mosnacek
Make the flask.h target depend on the genheaders binary instead of classmap.h to ensure that it is rebuilt if any of the dependencies of genheaders are changed. Notably this fixes flask.h not being rebuilt when initial_sid_to_string.h is modified. Fixes: 8753f6bec352 ("selinux: generate flask headers during kernel build") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-04-04selinux: stop returning node from avc_insert()Stephen Smalley
The callers haven't used the returned node since commit 21193dcd1f3570dd ("SELinux: more careful use of avd in avc_has_perm_noaudit") and the return value assignments were removed in commit 0a9876f36b08706d ("selinux: Remove redundant assignments"). Stop returning the node altogether and make the functions return void. Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> PM: minor subj tweak, repair whitespace damage Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-20selinux: remove the runtime disable functionalityPaul Moore
After working with the larger SELinux-based distros for several years, we're finally at a place where we can disable the SELinux runtime disable functionality. The existing kernel deprecation notice explains the functionality and why we want to remove it: The selinuxfs "disable" node allows SELinux to be disabled at runtime prior to a policy being loaded into the kernel. If disabled via this mechanism, SELinux will remain disabled until the system is rebooted. The preferred method of disabling SELinux is via the "selinux=0" boot parameter, but the selinuxfs "disable" node was created to make it easier for systems with primitive bootloaders that did not allow for easy modification of the kernel command line. Unfortunately, allowing for SELinux to be disabled at runtime makes it difficult to secure the kernel's LSM hooks using the "__ro_after_init" feature. It is that last sentence, mentioning the '__ro_after_init' hardening, which is the real motivation for this change, and if you look at the diffstat you'll see that the impact of this patch reaches across all the different LSMs, helping prevent tampering at the LSM hook level. From a SELinux perspective, it is important to note that if you continue to disable SELinux via "/etc/selinux/config" it may appear that SELinux is disabled, but it is simply in an uninitialized state. If you load a policy with `load_policy -i`, you will see SELinux come alive just as if you had loaded the policy during early-boot. It is also worth noting that the "/sys/fs/selinux/disable" file is always writable now, regardless of the Kconfig settings, but writing to the file has no effect on the system, other than to display an error on the console if a non-zero/true value is written. Finally, in the several years where we have been working on deprecating this functionality, there has only been one instance of someone mentioning any user visible breakage. In this particular case it was an individual's kernel test system, and the workaround documented in the deprecation notice ("selinux=0" on the kernel command line) resolved the issue without problem. Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-20selinux: remove the 'checkreqprot' functionalityPaul Moore
We originally promised that the SELinux 'checkreqprot' functionality would be removed no sooner than June 2021, and now that it is March 2023 it seems like it is a good time to do the final removal. The deprecation notice in the kernel provides plenty of detail on why 'checkreqprot' is not desirable, with the key point repeated below: This was a compatibility mechanism for legacy userspace and for the READ_IMPLIES_EXEC personality flag. However, if set to 1, it weakens security by allowing mappings to be made executable without authorization by policy. The default value of checkreqprot at boot was changed starting in Linux v4.4 to 0 (i.e. check the actual protection), and Android and Linux distributions have been explicitly writing a "0" to /sys/fs/selinux/checkreqprot during initialization for some time. Along with the official deprecation notice, we have been discussing this on-list and directly with several of the larger SELinux-based distros and everyone is happy to see this feature finally removed. In an attempt to catch all of the smaller, and DIY, Linux systems we have been writing a deprecation notice URL into the kernel log, along with a growing ssleep() penalty, when admins enabled checkreqprot at runtime or via the kernel command line. We have yet to have anyone come to us and raise an objection to the deprecation or planned removal. It is worth noting that while this patch removes the checkreqprot functionality, it leaves the user visible interfaces (kernel command line and selinuxfs file) intact, just inert. This should help prevent breakages with existing userspace tools that correctly, but unnecessarily, disable checkreqprot at boot or runtime. Admins that attempt to enable checkreqprot will be met with a removal message in the kernel log. Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-14selinux: stop passing selinux_state pointers and their offspringStephen Smalley
Linus observed that the pervasive passing of selinux_state pointers introduced by me in commit aa8e712cee93 ("selinux: wrap global selinux state") adds overhead and complexity without providing any benefit. The original idea was to pave the way for SELinux namespaces but those have not yet been implemented and there isn't currently a concrete plan to do so. Remove the passing of the selinux_state pointers, reverting to direct use of the single global selinux_state, and likewise remove passing of child pointers like the selinux_avc. The selinux_policy pointer remains as it is needed for atomic switching of policies. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202303101057.mZ3Gv5fK-lkp@intel.com/ Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-03-08selinux: uninline unlikely parts of avc_has_perm_noaudit()Paul Moore
This is based on earlier patch posted to the list by Linus, his commit description read: "avc_has_perm_noaudit()is one of those hot functions that end up being used by almost all filesystem operations (through "avc_has_perm()") and it's intended to be cheap enough to inline. However, it turns out that the unlikely parts of it (where it doesn't find an existing avc node) need a fair amount of stack space for the automatic replacement node, so if it were to be inlined (at least clang does not) it would just use stack space unnecessarily. So split the unlikely part out of it, and mark that part noinline. That improves the actual likely part." The basic idea behind the patch was reasonable, but there were minor nits (double indenting, etc.) and the RCU read lock unlock/re-lock in avc_compute_av() began to look even more ugly. This patch builds on Linus' first effort by cleaning things up a bit and removing the RCU unlock/lock dance in avc_compute_av(). Removing the RCU lock dance in avc_compute_av() is safe as there are currently two callers of avc_compute_av(): avc_has_perm_noaudit() and avc_has_extended_perms(). The first caller in avc_has_perm_noaudit() does not require a RCU lock as there is no avc_node to protect so the RCU lock can be dropped before calling avc_compute_av(). The second caller, avc_has_extended_perms(), is similar in that there is no avc_node that requires RCU protection, but the code is simplified by holding the RCU look around the avc_compute_av() call, and given that we enter a RCU critical section in security_compute_av() (called from av_compute_av()) the impact will likely be unnoticeable. It is also worth noting that avc_has_extended_perms() is only called from the SELinux ioctl() access control hook at the moment. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2023-02-23Merge tag 'mm-stable-2023-02-20-13-37' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit. - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing. - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work. - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work. - Kairui Song adds a series ("Clean up and fixes for swap"). - Vernon Yang contributed the series "Clean up and refinement for maple tree". - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim. - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups". - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages". - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction". - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields". - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs". - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC". - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable". - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)". - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF". - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve". - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics". - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction". - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap". - Christoph Hellwig has removed block_device_operations.rw_page() in ths series "remove ->rw_page". - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()". - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions". - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()" - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas". - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP". - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface". - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series. - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing". - Arnd Bergmann has some objtool fixups in "objtool warning fixes". * tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits) include/linux/migrate.h: remove unneeded externs mm/memory_hotplug: cleanup return value handing in do_migrate_range() mm/uffd: fix comment in handling pte markers mm: change to return bool for isolate_movable_page() mm: hugetlb: change to return bool for isolate_hugetlb() mm: change to return bool for isolate_lru_page() mm: change to return bool for folio_isolate_lru() objtool: add UACCESS exceptions for __tsan_volatile_read/write kmsan: disable ftrace in kmsan core code kasan: mark addr_has_metadata __always_inline mm: memcontrol: rename memcg_kmem_enabled() sh: initialize max_mapnr m68k/nommu: add missing definition of ARCH_PFN_OFFSET mm: percpu: fix incorrect size in pcpu_obj_full_size() maple_tree: reduce stack usage with gcc-9 and earlier mm: page_alloc: call panic() when memoryless node allocation fails mm: multi-gen LRU: avoid futile retries migrate_pages: move THP/hugetlb migration support check to simplify code migrate_pages: batch flushing TLB migrate_pages: share more code between _unmap and _move ...
2023-02-09mm: replace vma->vm_flags direct modifications with modifier callsSuren Baghdasaryan
Replace direct modifications to vma->vm_flags with calls to modifier functions to be able to track flag changes and to keep vma locking correctness. [akpm@linux-foundation.org: fix drivers/misc/open-dice.c, per Hyeonggon Yoo] Link: https://lkml.kernel.org/r/20230126193752.297968-5-surenb@google.com Signed-off-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjun Roy <arjunroy@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David Rientjes <rientjes@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Minchan Kim <minchan@google.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Oskolkov <posk@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Punit Agrawal <punit.agrawal@bytedance.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Song Liu <songliubraving@fb.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-19fs: port inode_owner_or_capable() to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port acl to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port xattr to mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>