Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock fixes from Mickaël Salaün:
"Fix some Landlock audit issues, add related tests, and updates
documentation"
* tag 'landlock-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
landlock: Update log documentation
landlock: Fix documentation for landlock_restrict_self(2)
landlock: Fix documentation for landlock_create_ruleset(2)
selftests/landlock: Add PID tests for audit records
selftests/landlock: Factor out audit fixture in audit_test
landlock: Log the TGID of the domain creator
landlock: Remove incorrect warning
|
|
On IMA policy update, if a measure rule exists in the policy,
IMA_MEASURE is set for ima_policy_flags which makes the violation_check
variable always true. Coupled with a no-action on MAY_READ for a
FILE_CHECK call, we're always taking the inode_lock().
This becomes a performance problem for extremely heavy read-only workloads.
Therefore, prevent this only in the case there's no action to be taken.
Signed-off-by: Frederick Lawler <fred@cloudflare.com>
Acked-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Fix, deduplicate, and improve rendering of landlock_restrict_self(2)'s
flags documentation.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-2-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Move and fix the flags documentation, and improve formatting.
It makes more sense and it eases maintenance to document syscall flags
in landlock.h, where they are defined. This is already the case for
landlock_restrict_self(2)'s flags.
The flags are now rendered like the syscall's parameters and
description.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250416154716.1799902-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
There is a GCC crash bug in the randstruct for latest GCC versions that
is being tickled by landlock[1]. Temporarily disable GCC randstruct for
COMPILE_TEST builds to unbreak CI systems for the coming -rc2. This can
be restored once the bug is fixed.
Suggested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/all/20250407-kbuild-disable-gcc-plugins-v1-1-5d46ae583f5e@kernel.org/ [1]
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250409151154.work.872-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
As for other Audit's "pid" fields, Landlock should use the task's TGID
instead of its TID. Fix this issue by keeping a reference to the TGID
of the domain creator.
Existing tests already check for the PID but only with the thread group
leader, so always the TGID. A following patch adds dedicated tests for
non-leader thread.
Remove the current_real_cred() check which does not make sense because
we only reference a struct pid, whereas a previous version did reference
a struct cred instead.
Cc: Christian Brauner <brauner@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Link: https://lore.kernel.org/r/20250410171725.1265860-1-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
landlock_put_hierarchy() can be called when an error occurs in
landlock_merge_ruleset() due to insufficient memory. In this case, the
domain's audit details might not have been allocated yet, which would
cause landlock_free_hierarchy_details() to print a warning (but still
safely handle this case).
We could keep the WARN_ON_ONCE(!hierarchy) but it's not worth it for
this kind of function, so let's remove it entirely.
Cc: Paul Moore <paul@paul-moore.com>
Reported-by: syzbot+8bca99e91de7e060e4ea@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20250331104709.897062-1-mic@digikod.net
Reviewed-by: Günther Noack <gnoack@google.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Patch series "mseal system mappings", v9.
As discussed during mseal() upstream process [1], mseal() protects the
VMAs of a given virtual memory range against modifications, such as the
read/write (RW) and no-execute (NX) bits. For complete descriptions of
memory sealing, please see mseal.rst [2].
The mseal() is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For example,
such an attacker primitive can break control-flow integrity guarantees
since read-only memory that is supposed to be trusted can become writable
or .text pages can get remapped.
The system mappings are readonly only, memory sealing can protect them
from ever changing to writable or unmmap/remapped as different attributes.
System mappings such as vdso, vvar, vvar_vclock, vectors (arm
compat-mode), sigpage (arm compat-mode), are created by the kernel during
program initialization, and could be sealed after creation.
Unlike the aforementioned mappings, the uprobe mapping is not established
during program startup. However, its lifetime is the same as the
process's lifetime [3]. It could be sealed from creation.
The vsyscall on x86-64 uses a special address (0xffffffffff600000), which
is outside the mm managed range. This means mprotect, munmap, and mremap
won't work on the vsyscall. Since sealing doesn't enhance the vsyscall's
security, it is skipped in this patch. If we ever seal the vsyscall, it
is probably only for decorative purpose, i.e. showing the 'sl' flag in
the /proc/pid/smaps. For this patch, it is ignored.
It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
alter the system mappings during restore operations. UML(User Mode Linux)
and gVisor, rr are also known to change the vdso/vvar mappings.
Consequently, this feature cannot be universally enabled across all
systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
To support mseal of system mappings, architectures must define
CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
mappings calls to pass mseal flag. Additionally, architectures must
confirm they do not unmap/remap system mappings during the process
lifetime. The existence of this flag for an architecture implies that it
does not require the remapping of thest system mappings during process
lifetime, so sealing these mappings is safe from a kernel perspective.
This version covers x86-64 and arm64 archiecture as minimum viable feature.
While no specific CPU hardware features are required for enable this
feature on an archiecture, memory sealing requires a 64-bit kernel. Other
architectures can choose whether or not to adopt this feature. Currently,
I'm not aware of any instances in the kernel code that actively
munmap/mremap a system mapping without a request from userspace. The PPC
does call munmap when _install_special_mapping fails for vdso; however,
it's uncertain if this will ever fail for PPC - this needs to be
investigated by PPC in the future [4]. The UML kernel can add this
support when KUnit tests require it [5].
In this version, we've improved the handling of system mapping sealing
from previous versions, instead of modifying the _install_special_mapping
function itself, which would affect all architectures, we now call
_install_special_mapping with a sealing flag only within the specific
architecture that requires it. This targeted approach offers two key
advantages: 1) It limits the code change's impact to the necessary
architectures, and 2) It aligns with the software architecture by keeping
the core memory management within the mm layer, while delegating the
decision of sealing system mappings to the individual architecture, which
is particularly relevant since 32-bit architectures never require sealing.
Prior to this patch series, we explored sealing special mappings from
userspace using glibc's dynamic linker. This approach revealed several
issues:
- The PT_LOAD header may report an incorrect length for vdso, (smaller
than its actual size). The dynamic linker, which relies on PT_LOAD
information to determine mapping size, would then split and partially
seal the vdso mapping. Since each architecture has its own vdso/vvar
code, fixing this in the kernel would require going through each
archiecture. Our initial goal was to enable sealing readonly mappings,
e.g. .text, across all architectures, sealing vdso from kernel since
creation appears to be simpler than sealing vdso at glibc.
- The [vvar] mapping header only contains address information, not
length information. Similar issues might exist for other special
mappings.
- Mappings like uprobe are not covered by the dynamic linker, and there
is no effective solution for them.
This feature's security enhancements will benefit ChromeOS, Android, and
other high security systems.
Testing:
This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
- Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
- Passing various automation tests (e.g. pre-checkin) on ChromeOS and
Android to ensure the sealing doesn't affect the functionality of
Chromebook and Android phone.
I also tested the feature on Ubuntu on x86-64:
- With config disabled, vdso/vvar is not sealed,
- with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
normal operations such as browsing the web, open/edit doc are OK.
Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@chromium.org/ [1]
Link: Documentation/userspace-api/mseal.rst [2]
Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@mail.gmail.com/ [3]
Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@mail.gmail.com/ [4]
Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
This patch (of 7):
Provide infrastructure to mseal system mappings. Establish two kernel
configs (CONFIG_MSEAL_SYSTEM_MAPPINGS,
ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS) and VM_SEALED_SYSMAP macro for future
patches.
Link: https://lkml.kernel.org/r/20250305021711.3867874-1-jeffxu@google.com
Link: https://lkml.kernel.org/r/20250305021711.3867874-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Elliot Hughes <enh@google.com>
Cc: Florian Faineli <f.fainelli@gmail.com>
Cc: Greg Ungerer <gerg@kernel.org>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Jason A. Donenfeld <jason@zx2c4.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Linus Waleij <linus.walleij@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcow (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Rapoport <mike.rapoport@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Falcato <pedro.falcato@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Röttger <sroettger@google.com>
Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updatesk from Greg KH:
"Here is the big set of driver core updates for 6.15-rc1. Lots of stuff
happened this development cycle, including:
- kernfs scaling changes to make it even faster thanks to rcu
- bin_attribute constify work in many subsystems
- faux bus minor tweaks for the rust bindings
- rust binding updates for driver core, pci, and platform busses,
making more functionaliy available to rust drivers. These are all
due to people actually trying to use the bindings that were in
6.14.
- make Rafael and Danilo full co-maintainers of the driver core
codebase
- other minor fixes and updates"
* tag 'driver-core-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (52 commits)
rust: platform: require Send for Driver trait implementers
rust: pci: require Send for Driver trait implementers
rust: platform: impl Send + Sync for platform::Device
rust: pci: impl Send + Sync for pci::Device
rust: platform: fix unrestricted &mut platform::Device
rust: pci: fix unrestricted &mut pci::Device
rust: device: implement device context marker
rust: pci: use to_result() in enable_device_mem()
MAINTAINERS: driver core: mark Rafael and Danilo as co-maintainers
rust/kernel/faux: mark Registration methods inline
driver core: faux: only create the device if probe() succeeds
rust/faux: Add missing parent argument to Registration::new()
rust/faux: Drop #[repr(transparent)] from faux::Registration
rust: io: fix devres test with new io accessor functions
rust: io: rename `io::Io` accessors
kernfs: Move dput() outside of the RCU section.
efi: rci2: mark bin_attribute as __ro_after_init
rapidio: constify 'struct bin_attribute'
firmware: qemu_fw_cfg: constify 'struct bin_attribute'
powerpc/perf/hv-24x7: Constify 'struct bin_attribute'
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"For this merge window we're splitting BPF pull request into three for
higher visibility: main changes, res_spin_lock, try_alloc_pages.
These are the main BPF changes:
- Add DFA-based live registers analysis to improve verification of
programs with loops (Eduard Zingerman)
- Introduce load_acquire and store_release BPF instructions and add
x86, arm64 JIT support (Peilin Ye)
- Fix loop detection logic in the verifier (Eduard Zingerman)
- Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)
- Add kfunc for populating cpumask bits (Emil Tsalapatis)
- Convert various shell based tests to selftests/bpf/test_progs
format (Bastien Curutchet)
- Allow passing referenced kptrs into struct_ops callbacks (Amery
Hung)
- Add a flag to LSM bpf hook to facilitate bpf program signing
(Blaise Boscaccy)
- Track arena arguments in kfuncs (Ihor Solodrai)
- Add copy_remote_vm_str() helper for reading strings from remote VM
and bpf_copy_from_user_task_str() kfunc (Jordan Rome)
- Add support for timed may_goto instruction (Kumar Kartikeya
Dwivedi)
- Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)
- Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
local storage (Martin KaFai Lau)
- Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)
- Allow retrieving BTF data with BTF token (Mykyta Yatsenko)
- Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
(Song Liu)
- Reject attaching programs to noreturn functions (Yafang Shao)
- Introduce pre-order traversal of cgroup bpf programs (Yonghong
Song)"
* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
bpf: Fix out-of-bounds read in check_atomic_load/store()
libbpf: Add namespace for errstr making it libbpf_errstr
bpf: Add struct_ops context information to struct bpf_prog_aux
selftests/bpf: Sanitize pointer prior fclose()
selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
selftests/bpf: test_xdp_vlan: Rename BPF sections
bpf: clarify a misleading verifier error message
selftests/bpf: Add selftest for attaching fexit to __noreturn functions
bpf: Reject attaching fexit/fmod_ret to __noreturn functions
bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
bpf: Make perf_event_read_output accessible in all program types.
bpftool: Using the right format specifiers
bpftool: Add -Wformat-signedness flag to detect format errors
selftests/bpf: Test freplace from user namespace
libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
bpf: Return prog btf_id without capable check
bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
bpf, x86: Fix objtool warning for timed may_goto
bpf: Check map->record at the beginning of check_and_free_fields()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove legacy compression interface
- Improve scatterwalk API
- Add request chaining to ahash and acomp
- Add virtual address support to ahash and acomp
- Add folio support to acomp
- Remove NULL dst support from acomp
Algorithms:
- Library options are fuly hidden (selected by kernel users only)
- Add Kerberos5 algorithms
- Add VAES-based ctr(aes) on x86
- Ensure LZO respects output buffer length on compression
- Remove obsolete SIMD fallback code path from arm/ghash-ce
Drivers:
- Add support for PCI device 0x1134 in ccp
- Add support for rk3588's standalone TRNG in rockchip
- Add Inside Secure SafeXcel EIP-93 crypto engine support in eip93
- Fix bugs in tegra uncovered by multi-threaded self-test
- Fix corner cases in hisilicon/sec2
Others:
- Add SG_MITER_LOCAL to sg miter
- Convert ubifs, hibernate and xfrm_ipcomp from legacy API to acomp"
* tag 'v6.15-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (187 commits)
crypto: testmgr - Add multibuffer acomp testing
crypto: acomp - Fix synchronous acomp chaining fallback
crypto: testmgr - Add multibuffer hash testing
crypto: hash - Fix synchronous ahash chaining fallback
crypto: arm/ghash-ce - Remove SIMD fallback code path
crypto: essiv - Replace memcpy() + NUL-termination with strscpy()
crypto: api - Call crypto_alg_put in crypto_unregister_alg
crypto: scompress - Fix incorrect stream freeing
crypto: lib/chacha - remove unused arch-specific init support
crypto: remove obsolete 'comp' compression API
crypto: compress_null - drop obsolete 'comp' implementation
crypto: cavium/zip - drop obsolete 'comp' implementation
crypto: zstd - drop obsolete 'comp' implementation
crypto: lzo - drop obsolete 'comp' implementation
crypto: lzo-rle - drop obsolete 'comp' implementation
crypto: lz4hc - drop obsolete 'comp' implementation
crypto: lz4 - drop obsolete 'comp' implementation
crypto: deflate - drop obsolete 'comp' implementation
crypto: 842 - drop obsolete 'comp' implementation
crypto: nx - Migrate to scomp API
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This brings two main changes to Landlock:
- A signal scoping fix with a new interface for user space to know if
it is compatible with the running kernel.
- Audit support to give visibility on why access requests are denied,
including the origin of the security policy, missing access rights,
and description of object(s). This was designed to limit log spam
as much as possible while still alerting about unexpected blocked
access.
With these changes come new and improved documentation, and a lot of
new tests"
* tag 'landlock-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: (36 commits)
landlock: Add audit documentation
selftests/landlock: Add audit tests for network
selftests/landlock: Add audit tests for filesystem
selftests/landlock: Add audit tests for abstract UNIX socket scoping
selftests/landlock: Add audit tests for ptrace
selftests/landlock: Test audit with restrict flags
selftests/landlock: Add tests for audit flags and domain IDs
selftests/landlock: Extend tests for landlock_restrict_self(2)'s flags
selftests/landlock: Add test for invalid ruleset file descriptor
samples/landlock: Enable users to log sandbox denials
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF
landlock: Add LANDLOCK_RESTRICT_SELF_LOG_*_EXEC_* flags
landlock: Log scoped denials
landlock: Log TCP bind and connect denials
landlock: Log truncate and IOCTL denials
landlock: Factor out IOCTL hooks
landlock: Log file-related denials
landlock: Log mount-related denials
landlock: Add AUDIT_LANDLOCK_DOMAIN and log domain status
landlock: Add AUDIT_LANDLOCK_ACCESS and log ptrace denials
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux
Pull capabilities update from Serge Hallyn:
"This contains just one patch that removes a helper function whose last
user (smack) stopped using it in 2018"
* tag 'caps-pr-20250327' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux:
capability: Remove unused has_capability
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull ima updates from Mimi Zohar:
"Two performance improvements, which minimize the number of integrity
violations"
* tag 'integrity-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: limit the number of ToMToU integrity violations
ima: limit the number of open-writers integrity violations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull ipe update from Fan Wu:
"This contains just one commit from Randy Dunlap, which fixes
kernel-doc warnings in the IPE subsystem"
* tag 'ipe-pr-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: policy_fs: fix kernel-doc warnings
|
|
Each time a file in policy, that is already opened for read, is opened
for write, a Time-of-Measure-Time-of-Use (ToMToU) integrity violation
audit message is emitted and a violation record is added to the IMA
measurement list. This occurs even if a ToMToU violation has already
been recorded.
Limit the number of ToMToU integrity violations per file open for read.
Note: The IMA_MAY_EMIT_TOMTOU atomic flag must be set from the reader
side based on policy. This may result in a per file open for read
ToMToU violation.
Since IMA_MUST_MEASURE is only used for violations, rename the atomic
IMA_MUST_MEASURE flag to IMA_MAY_EMIT_TOMTOU.
Cc: stable@vger.kernel.org # applies cleanly up to linux-6.6
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
Each time a file in policy, that is already opened for write, is opened
for read, an open-writers integrity violation audit message is emitted
and a violation record is added to the IMA measurement list. This
occurs even if an open-writers violation has already been recorded.
Limit the number of open-writers integrity violations for an existing
file open for write to one. After the existing file open for write
closes (__fput), subsequent open-writers integrity violations may be
emitted.
Cc: stable@vger.kernel.org # applies cleanly up to linux-6.6
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Tested-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move vm_table members out of kernel/sysctl.c
All vm_table array members have moved to their respective subsystems
leading to the removal of vm_table from kernel/sysctl.c. This
increases modularity by placing the ctl_tables closer to where they
are actually used and at the same time reducing the chances of merge
conflicts in kernel/sysctl.c.
- ctl_table range fixes
Replace the proc_handler function that checks variable ranges in
coredump_sysctls and vdso_table with the one that actually uses the
extra{1,2} pointers as min/max values. This tightens the range of the
values that users can pass into the kernel effectively preventing
{under,over}flows.
- Misc fixes
Correct grammar errors and typos in test messages. Update sysctl
files in MAINTAINERS. Constified and removed array size in
declaration for alignment_tbl
* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
selftests/sysctl: fix wording of help messages
selftests: fix spelling/grammar errors in sysctl/sysctl.sh
MAINTAINERS: Update sysctl file list in MAINTAINERS
sysctl: Fix underflow value setting risk in vm_table
coredump: Fixes core_pipe_limit sysctl proc_handler
sysctl: remove unneeded include
sysctl: remove the vm_table
sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
fs: dcache: move the sysctl to fs/dcache.c
sunrpc: simplify rpcauth_cache_shrink_count()
fs: drop_caches: move sysctl to fs/drop_caches.c
fs: fs-writeback: move sysctl to fs/fs-writeback.c
mm: nommu: move sysctl to mm/nommu.c
security: min_addr: move sysctl to security/min_addr.c
mm: mmap: move sysctl to mm/mmap.c
mm: util: move sysctls to mm/util.c
mm: vmscan: move vmscan sysctls to mm/vmscan.c
mm: swap: move sysctl to mm/swap.c
mm: filemap: move sysctl to mm/filemap.c
...
|
|
Add LANDLOCK_RESTRICT_SELF_LOG_SUBDOMAINS_OFF for the case of sandboxer
tools, init systems, or runtime containers launching programs sandboxing
themselves in an inconsistent way. Setting this flag should only
depends on runtime configuration (i.e. not hardcoded).
We don't create a new ruleset's option because this should not be part
of the security policy: only the task that enforces the policy (not the
one that create it) knows if itself or its children may request denied
actions.
This is the first and only flag that can be set without actually
restricting the caller (i.e. without providing a ruleset).
Extend struct landlock_cred_security with a u8 log_subdomains_off.
struct landlock_file_security is still 16 bytes.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Closes: https://github.com/landlock-lsm/linux/issues/3
Link: https://lore.kernel.org/r/20250320190717.2287696-19-mic@digikod.net
[mic: Fix comment]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Most of the time we want to log denied access because they should not
happen and such information helps diagnose issues. However, when
sandboxing processes that we know will try to access denied resources
(e.g. unknown, bogus, or malicious binary), we might want to not log
related access requests that might fill up logs.
By default, denied requests are logged until the task call execve(2).
If the LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF flag is set, denied
requests will not be logged for the same executed file.
If the LANDLOCK_RESTRICT_SELF_LOG_NEW_EXEC_ON flag is set, denied
requests from after an execve(2) call will be logged.
The rationale is that a program should know its own behavior, but not
necessarily the behavior of other programs.
Because LANDLOCK_RESTRICT_SELF_LOG_SAME_EXEC_OFF is set for a specific
Landlock domain, it makes it possible to selectively mask some access
requests that would be logged by a parent domain, which might be handy
for unprivileged processes to limit logs. However, system
administrators should still use the audit filtering mechanism. There is
intentionally no audit nor sysctl configuration to re-enable these logs.
This is delegated to the user space program.
Increment the Landlock ABI version to reflect this interface change.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-18-mic@digikod.net
[mic: Rename variables and fix __maybe_unused]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support for unix_stream_connect, unix_may_send, task_kill, and
file_send_sigiotask hooks.
The related blockers are:
- scope.abstract_unix_socket
- scope.signal
Audit event sample for abstract unix socket:
type=LANDLOCK_DENY msg=audit(1729738800.268:30): domain=195ba459b blockers=scope.abstract_unix_socket path=00666F6F
Audit event sample for signal:
type=LANDLOCK_DENY msg=audit(1729738800.291:31): domain=195ba459b blockers=scope.signal opid=1 ocomm="systemd"
Refactor and simplify error handling in LSM hooks.
Extend struct landlock_file_security with fown_layer and use it to log
the blocking domain. The struct aligned size is still 16 bytes.
Cc: Günther Noack <gnoack@google.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-17-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support to socket_bind and socket_connect hooks.
The related blockers are:
- net.bind_tcp
- net.connect_tcp
Audit event sample:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=net.connect_tcp daddr=127.0.0.1 dest=80
Cc: Günther Noack <gnoack@google.com>
Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-16-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support to the file_truncate and file_ioctl hooks.
Add a deny_masks_t type and related helpers to store the domain's layer
level per optional access rights (i.e. LANDLOCK_ACCESS_FS_TRUNCATE and
LANDLOCK_ACCESS_FS_IOCTL_DEV) when opening a file, which cannot be
inferred later. In practice, the landlock_file_security aligned blob size is
still 16 bytes because this new one-byte deny_masks field follows the
existing two-bytes allowed_access field and precede the packed
fown_subject.
Implementing deny_masks_t with a bitfield instead of a struct enables a
generic implementation to store and extract layer levels.
Add KUnit tests to check the identification of a layer level from a
deny_masks_t, and the computation of a deny_masks_t from an access right
with its layer level or a layer_mask_t array.
Audit event sample:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=fs.ioctl_dev path="/dev/tty" dev="devtmpfs" ino=9 ioctlcmd=0x5401
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-15-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Compat and non-compat IOCTL hooks are almost the same, except to compare
the IOCTL command. Factor out these two IOCTL hooks to highlight the
difference and minimize audit changes (see next commit).
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-14-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support for path_mkdir, path_mknod, path_symlink, path_unlink,
path_rmdir, path_truncate, path_link, path_rename, and file_open hooks.
The dedicated blockers are:
- fs.execute
- fs.write_file
- fs.read_file
- fs.read_dir
- fs.remove_dir
- fs.remove_file
- fs.make_char
- fs.make_dir
- fs.make_reg
- fs.make_sock
- fs.make_fifo
- fs.make_block
- fs.make_sym
- fs.refer
- fs.truncate
- fs.ioctl_dev
Audit event sample for a denied link action:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=fs.refer path="/usr/bin" dev="vda2" ino=351
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=fs.make_reg,fs.refer path="/usr/local" dev="vda2" ino=365
We could pack blocker names (e.g. "fs:make_reg,refer") but that would
increase complexity for the kernel and log parsers. Moreover, this
could not handle blockers of different classes (e.g. fs and net). Make
it simple and flexible instead.
Add KUnit tests to check the identification from a layer_mask_t array of
the first layer level denying such request.
Cc: Günther Noack <gnoack@google.com>
Depends-on: 058518c20920 ("landlock: Align partial refer access checks with final ones")
Depends-on: d617f0d72d80 ("landlock: Optimize file path walks and prepare for audit support")
Link: https://lore.kernel.org/r/20250320190717.2287696-13-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add audit support for sb_mount, move_mount, sb_umount, sb_remount, and
sb_pivot_root hooks.
The new related blocker is "fs.change_topology".
Audit event sample:
type=LANDLOCK_DENY msg=audit(1729738800.349:44): domain=195ba459b blockers=fs.change_topology name="/" dev="tmpfs" ino=1
Remove landlock_get_applicable_domain() and get_current_fs_domain()
which are now fully replaced with landlock_get_applicable_subject().
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-12-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Asynchronously log domain information when it first denies an access.
This minimize the amount of generated logs, which makes it possible to
always log denials for the current execution since they should not
happen. These records are identified with the new AUDIT_LANDLOCK_DOMAIN
type.
The AUDIT_LANDLOCK_DOMAIN message contains:
- the "domain" ID which is described;
- the "status" which can either be "allocated" or "deallocated";
- the "mode" which is for now only "enforcing";
- for the "allocated" status, a minimal set of properties to easily
identify the task that loaded the domain's policy with
landlock_restrict_self(2): "pid", "uid", executable path ("exe"), and
command line ("comm");
- for the "deallocated" state, the number of "denials" accounted to this
domain, which is at least 1.
This requires each domain to save these task properties at creation
time in the new struct landlock_details. A reference to the PID is kept
for the lifetime of the domain to avoid race conditions when
investigating the related task. The executable path is resolved and
stored to not keep a reference to the filesystem and block related
actions. All these metadata are stored for the lifetime of the related
domain and should then be minimal. The required memory is not accounted
to the task calling landlock_restrict_self(2) contrary to most other
Landlock allocations (see related comment).
The AUDIT_LANDLOCK_DOMAIN record follows the first AUDIT_LANDLOCK_ACCESS
record for the same domain, which is always followed by AUDIT_SYSCALL
and AUDIT_PROCTITLE. This is in line with the audit logic to first
record the cause of an event, and then add context with other types of
record.
Audit event sample for a first denial:
type=LANDLOCK_ACCESS msg=audit(1732186800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=LANDLOCK_DOMAIN msg=audit(1732186800.349:44): domain=195ba459b status=allocated mode=enforcing pid=300 uid=0 exe="/root/sandboxer" comm="sandboxer"
type=SYSCALL msg=audit(1732186800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
Audit event sample for a following denial:
type=LANDLOCK_ACCESS msg=audit(1732186800.372:45): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=SYSCALL msg=audit(1732186800.372:45): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
Log domain deletion with the "deallocated" state when a domain was
previously logged. This makes it possible for log parsers to free
potential resources when a domain ID will never show again.
The number of denied access requests is useful to easily check how many
access requests a domain blocked and potentially if some of them are
missing in logs because of audit rate limiting, audit rules, or Landlock
log configuration flags (see following commit).
Audit event sample for a deletion of a domain that denied something:
type=LANDLOCK_DOMAIN msg=audit(1732186800.393:46): domain=195ba459b status=deallocated denials=2
Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-11-mic@digikod.net
[mic: Update comment and GFP flag for landlock_log_drop_domain()]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add a new AUDIT_LANDLOCK_ACCESS record type dedicated to an access
request denied by a Landlock domain. AUDIT_LANDLOCK_ACCESS indicates
that something unexpected happened.
For now, only denied access are logged, which means that any
AUDIT_LANDLOCK_ACCESS record is always followed by a SYSCALL record with
"success=no". However, log parsers should check this syscall property
because this is the only sign that a request was denied. Indeed, we
could have "success=yes" if Landlock would support a "permissive" mode.
We could also add a new field to AUDIT_LANDLOCK_DOMAIN for this mode
(see following commit).
By default, the only logged access requests are those coming from the
same executed program that enforced the Landlock restriction on itself.
In other words, no audit record are created for a task after it called
execve(2). This is required to avoid log spam because programs may only
be aware of their own restrictions, but not the inherited ones.
Following commits will allow to conditionally generate
AUDIT_LANDLOCK_ACCESS records according to dedicated
landlock_restrict_self(2)'s flags.
The AUDIT_LANDLOCK_ACCESS message contains:
- the "domain" ID restricting the action on an object,
- the "blockers" that are missing to allow the requested access,
- a set of fields identifying the related object (e.g. task identified
with "opid" and "ocomm").
The blockers are implicit restrictions (e.g. ptrace), or explicit access
rights (e.g. filesystem), or explicit scopes (e.g. signal). This field
contains a list of at least one element, each separated with a comma.
The initial blocker is "ptrace", which describe all implicit Landlock
restrictions related to ptrace (e.g. deny tracing of tasks outside a
sandbox).
Add audit support to ptrace_access_check and ptrace_traceme hooks. For
the ptrace_access_check case, we log the current/parent domain and the
child task. For the ptrace_traceme case, we log the parent domain and
the current/child task. Indeed, the requester and the target are the
current task, but the action would be performed by the parent task.
Audit event sample:
type=LANDLOCK_ACCESS msg=audit(1729738800.349:44): domain=195ba459b blockers=ptrace opid=1 ocomm="systemd"
type=SYSCALL msg=audit(1729738800.349:44): arch=c000003e syscall=101 success=no [...] pid=300 auid=0
A following commit adds user documentation.
Add KUnit tests to check reading of domain ID relative to layer level.
The quick return for non-landlocked tasks is moved from task_ptrace() to
each LSM hooks.
It is not useful to inline the audit_enabled check because other
computation are performed by landlock_log_denial().
Use scoped guards for RCU read-side critical sections.
Cc: Günther Noack <gnoack@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-10-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Extend struct landlock_cred_security with a domain_exec bitmask to
identify which Landlock domain were created by the current task's bprm.
The whole bitmask is reset on each execve(2) call.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-9-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This cosmetic change is needed for audit support, specifically to be
able to filter according to cross-execution boundaries.
struct landlock_file_security's size stay the same for now but it will
increase with struct landlock_cred_security's size.
Only save Landlock domain in hook_file_set_fowner() if the current
domain has LANDLOCK_SCOPE_SIGNAL, which was previously done for each
hook_file_send_sigiotask() calls. This should improve a bit
performance.
Replace hardcoded LANDLOCK_SCOPE_SIGNAL with the signal_scope.scope
variable.
Use scoped guards for RCU read-side critical sections.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-8-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This cosmetic change that is needed for audit support, specifically to
be able to filter according to cross-execution boundaries.
Replace hardcoded LANDLOCK_SCOPE_SIGNAL with the signal_scope.scope
variable.
Use scoped guards for RCU read-side critical sections.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-7-mic@digikod.net
[mic: Update headers]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This cosmetic change that is needed for audit support, specifically to
be able to filter according to cross-execution boundaries.
Optimize current_check_access_socket() to only handle the access
request.
Remove explicit domain->num_layers check which is now part of the
landlock_get_applicable_subject() call.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-6-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
This cosmetic change is needed for audit support, specifically to be
able to filter according to cross-execution boundaries.
Add landlock_get_applicable_subject(), mainly a copy of
landlock_get_applicable_domain(), which will fully replace it in a
following commit.
Optimize current_check_access_path() to only handle the access request.
Partially replace get_current_fs_domain() with explicit calls to
landlock_get_applicable_subject(). The remaining ones will follow with
more changes.
Remove explicit domain->num_layers check which is now part of the
landlock_get_applicable_subject() call.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Create a new domain.h file containing the struct landlock_hierarchy
definition and helpers. This type will grow with audit support. This
also prepares for a new domain type.
Cc: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-4-mic@digikod.net
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Landlock IDs can be generated to uniquely identify Landlock objects.
For now, only Landlock domains get an ID at creation time. These IDs
map to immutable domain hierarchies.
Landlock IDs have important properties:
- They are unique during the lifetime of the running system thanks to
the 64-bit values: at worse, 2^60 - 2*2^32 useful IDs.
- They are always greater than 2^32 and must then be stored in 64-bit
integer types.
- The initial ID (at boot time) is randomly picked between 2^32 and
2^33, which limits collisions in logs across different boots.
- IDs are sequential, which enables users to order them.
- IDs may not be consecutive but increase with a random 2^4 step, which
limits side channels.
Such IDs can be exposed to unprivileged processes, even if it is not the
case with this audit patch series. The domain IDs will be useful for
user space to identify sandboxes and get their properties.
These Landlock IDs are more secure that other absolute kernel IDs such
as pipe's inodes which rely on a shared global counter.
For checkpoint/restore features (i.e. CRIU), we could easily implement a
privileged interface (e.g. sysfs) to set the next ID counter.
IDR/IDA are not used because we only need a bijection from Landlock
objects to Landlock IDs, and we must not recycle IDs. This enables us
to identify all Landlock objects during the lifetime of the system (e.g.
in logs), but not to access an object from an ID nor know if an ID is
assigned. Using a counter is simpler, it scales (i.e. avoids growing
memory footprint), and it does not require locking. We'll use proper
file descriptors (with IDs used as inode numbers) to access Landlock
objects.
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Extract code from dump_common_audit_data() into the audit_log_lsm_data()
helper. This helps reuse common LSM audit data while not abusing
AUDIT_AVC records because of the common_lsm_audit() helper.
Depends-on: 7ccbe076d987 ("lsm: Only build lsm_audit.c if CONFIG_SECURITY and CONFIG_AUDIT are set")
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20250320190717.2287696-2-mic@digikod.net
Reviewed-by: Günther Noack <gnoack3000@gmail.com>
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Because Linux credentials are managed per thread, user space relies on
some hack to synchronize credential update across threads from the same
process. This is required by the Native POSIX Threads Library and
implemented by set*id(2) wrappers and libcap(3) to use tgkill(2) to
synchronize threads. See nptl(7) and libpsx(3). Furthermore, some
runtimes like Go do not enable developers to have control over threads
[1].
To avoid potential issues, and because threads are not security
boundaries, let's relax the Landlock (optional) signal scoping to always
allow signals sent between threads of the same process. This exception
is similar to the __ptrace_may_access() one.
hook_file_set_fowner() now checks if the target task is part of the same
process as the caller. If this is the case, then the related signal
triggered by the socket will always be allowed.
Scoping of abstract UNIX sockets is not changed because kernel objects
(e.g. sockets) should be tied to their creator's domain at creation
time.
Note that creating one Landlock domain per thread puts each of these
threads (and their future children) in their own scope, which is
probably not what users expect, especially in Go where we do not control
threads. However, being able to drop permissions on all threads should
not be restricted by signal scoping. We are working on a way to make it
possible to atomically restrict all threads of a process with the same
domain [2].
Add erratum for signal scoping.
Closes: https://github.com/landlock-lsm/go-landlock/issues/36
Fixes: 54a6e6bbf3be ("landlock: Add signal scoping")
Fixes: c8994965013e ("selftests/landlock: Test signal scoping for threads")
Depends-on: 26f204380a3c ("fs: Fix file_set_fowner LSM hook inconsistencies")
Link: https://pkg.go.dev/kernel.org/pub/linux/libs/security/libcap/psx [1]
Link: https://github.com/landlock-lsm/linux/issues/2 [2]
Cc: Günther Noack <gnoack@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Tahera Fahimi <fahimitahera@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20250318161443.279194-6-mic@digikod.net
[mic: Add extra pointer check and RCU guard, and ease backport]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Pull smack updates from Casey Schaufler:
"This is a larger set of patches than usual, consisting of a set of
build clean-ups, a rework of error handling in setting up CIPSO label
specification and a bug fix in network labeling"
* tag 'Smack-for-6.15' of https://github.com/cschaufler/smack-next:
smack: recognize ipv4 CIPSO w/o categories
smack: Revert "smackfs: Added check catlen"
smack: remove /smack/logging if audit is not configured
smack: ipv4/ipv6: tcp/dccp/sctp: fix incorrect child socket label
smack: dont compile ipv6 code unless ipv6 is configured
Smack: fix typos and spelling errors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Add additional SELinux access controls for kernel file reads/loads
The SELinux kernel file read/load access controls were never updated
beyond the initial kernel module support, this pull request adds
support for firmware, kexec, policies, and x.509 certificates.
- Add support for wildcards in network interface names
There are a number of userspace tools which auto-generate network
interface names using some pattern of <XXXX>-<NN> where <XXXX> is a
fixed string, e.g. "podman", and <NN> is a increasing counter.
Supporting wildcards in the SELinux policy for network interfaces
simplifies the policy associted with these interfaces.
- Fix a potential problem in the kernel read file SELinux code
SELinux should always check the file label in the
security_kernel_read_file() LSM hook, regardless of if the file is
being read in chunks. Unfortunately, the existing code only
considered the file label on the first chunk; this pull request fixes
this problem.
There is more detail in the individual commit, but thankfully the
existing code didn't expose a bug due to multi-stage reads only
taking place in one driver, and that driver loading a file type that
isn't targeted by the SELinux policy.
- Fix the subshell error handling in the example policy loader
Minor fix to SELinux example policy loader in scripts/selinux due to
an undesired interaction with subshells and errexit.
* tag 'selinux-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: get netif_wildcard policycap from policy instead of cache
selinux: support wildcard network interface names
selinux: Chain up tool resolving errors in install_policy.sh
selinux: add permission checks for loading other kinds of kernel files
selinux: always check the file label in selinux_kernel_read_file()
selinux: fix spelling error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Various minor updates to the LSM Rust bindings
Changes include marking trivial Rust bindings as inlines and comment
tweaks to better reflect the LSM hooks.
- Add LSM/SELinux access controls to io_uring_allowed()
Similar to the io_uring_disabled sysctl, add a LSM hook to
io_uring_allowed() to enable LSMs a simple way to enforce security
policy on the use of io_uring. This pull request includes SELinux
support for this new control using the io_uring/allowed permission.
- Remove an unused parameter from the security_perf_event_open() hook
The perf_event_attr struct parameter was not used by any currently
supported LSMs, remove it from the hook.
- Add an explicit MAINTAINERS entry for the credentials code
We've seen problems in the past where patches to the credentials code
sent by non-maintainers would often languish on the lists for
multiple months as there was no one explicitly tasked with the
responsibility of reviewing and/or merging credentials related code.
Considering that most of the code under security/ has a vested
interest in ensuring that the credentials code is well maintained,
I'm volunteering to look after the credentials code and Serge Hallyn
has also volunteered to step up as an official reviewer. I posted the
MAINTAINERS update as a RFC to LKML in hopes that someone else would
jump up with an "I'll do it!", but beyond Serge it was all crickets.
- Update Stephen Smalley's old email address to prevent confusion
This includes a corresponding update to the mailmap file.
* tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
mailmap: map Stephen Smalley's old email addresses
lsm: remove old email address for Stephen Smalley
MAINTAINERS: add Serge Hallyn as a credentials reviewer
MAINTAINERS: add an explicit credentials entry
cred,rust: mark Credential methods inline
lsm,rust: reword "destroy" -> "release" in SecurityCtx
lsm,rust: mark SecurityCtx methods inline
perf: Remove unnecessary parameter of security check
lsm: fix a missing security_uring_allowed() prototype
io_uring,lsm,selinux: add LSM hooks for io_uring_setup()
io_uring: refactor io_uring_allowed()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"As usual, it's scattered changes all over. Patches touching things
outside of our traditional areas in the tree have been Acked by
maintainers or were trivial changes:
- loadpin: remove unsupported MODULE_COMPRESS_NONE (Arulpandiyan
Vadivel)
- samples/check-exec: Fix script name (Mickaël Salaün)
- yama: remove needless locking in yama_task_prctl() (Oleg Nesterov)
- lib/string_choices: Sort by function name (R Sundar)
- hardening: Allow default HARDENED_USERCOPY to be set at compile
time (Mel Gorman)
- uaccess: Split out compile-time checks into ucopysize.h
- kbuild: clang: Support building UM with SUBARCH=i386
- x86: Enable i386 FORTIFY_SOURCE on Clang 16+
- ubsan/overflow: Rework integer overflow sanitizer option
- Add missing __nonstring annotations for callers of
memtostr*()/strtomem*()
- Add __must_be_noncstr() and have memtostr*()/strtomem*() check for
it
- Introduce __nonstring_array for silencing future GCC 15 warnings"
* tag 'hardening-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
compiler_types: Introduce __nonstring_array
hardening: Enable i386 FORTIFY_SOURCE on Clang 16+
x86/build: Remove -ffreestanding on i386 with GCC
ubsan/overflow: Enable ignorelist parsing and add type filter
ubsan/overflow: Enable pattern exclusions
ubsan/overflow: Rework integer overflow sanitizer option to turn on everything
samples/check-exec: Fix script name
yama: don't abuse rcu_read_lock/get_task_struct in yama_task_prctl()
kbuild: clang: Support building UM with SUBARCH=i386
loadpin: remove MODULE_COMPRESS_NONE as it is no longer supported
lib/string_choices: Rearrange functions in sorted order
string.h: Validate memtostr*()/strtomem*() arguments more carefully
compiler.h: Introduce __must_be_noncstr()
nilfs2: Mark on-disk strings as nonstring
uapi: stddef.h: Introduce __kernel_nonstring
x86/tdx: Mark message.bytes as nonstring
string: kunit: Mark nonstring test strings as __nonstring
scsi: qla2xxx: Mark device strings as nonstring
scsi: mpt3sas: Mark device strings as nonstring
scsi: mpi3mr: Mark device strings as nonstring
...
|
|
Use the "struct" keyword in kernel-doc when describing struct
ipefs_file. Add kernel-doc for the struct members also.
Don't use kernel-doc notation for 'policy_subdir'. kernel-doc does
not support documentation comments for data definitions.
This eliminates multiple kernel-doc warnings:
security/ipe/policy_fs.c:21: warning: cannot understand function prototype: 'struct ipefs_file '
security/ipe/policy_fs.c:407: warning: cannot understand function prototype: 'const struct ipefs_file policy_subdir[] = '
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Fan Wu <wufan@kernel.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: James Morris <jmorris@namei.org>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: linux-security-module@vger.kernel.org
Signed-off-by: Fan Wu <wufan@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async dir updates from Christian Brauner:
"This contains cleanups that fell out of the work from async directory
handling:
- Change kern_path_locked() and user_path_locked_at() to never return
a negative dentry. This simplifies the usability of these helpers
in various places
- Drop d_exact_alias() from the remaining place in NFS where it is
still used. This also allows us to drop the d_exact_alias() helper
completely
- Drop an unnecessary call to fh_update() from nfsd_create_locked()
- Change i_op->mkdir() to return a struct dentry
Change vfs_mkdir() to return a dentry provided by the filesystems
which is hashed and positive. This allows us to reduce the number
of cases where the resulting dentry is not positive to very few
cases. The code in these places becomes simpler and easier to
understand.
- Repack DENTRY_* and LOOKUP_* flags"
* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: fix inline emphasis warning
VFS: Change vfs_mkdir() to return the dentry.
nfs: change mkdir inode_operation to return alternate dentry if needed.
fuse: return correct dentry for ->mkdir
ceph: return the correct dentry on mkdir
hostfs: store inode in dentry after mkdir if possible.
Change inode_operations.mkdir to return struct dentry *
nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
nfs/vfs: discard d_exact_alias()
VFS: add common error checks to lookup_one_qstr_excl()
VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
VFS: repack LOOKUP_ bit flags.
VFS: repack DENTRY_ flags.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Mount notifications
The day has come where we finally provide a new api to listen for
mount topology changes outside of /proc/<pid>/mountinfo. A mount
namespace file descriptor can be supplied and registered with
fanotify to listen for mount topology changes.
Currently notifications for mount, umount and moving mounts are
generated. The generated notification record contains the unique
mount id of the mount.
The listmount() and statmount() api can be used to query detailed
information about the mount using the received unique mount id.
This allows userspace to figure out exactly how the mount topology
changed without having to generating diffs of /proc/<pid>/mountinfo
in userspace.
- Support O_PATH file descriptors with FSCONFIG_SET_FD in the new mount
api
- Support detached mounts in overlayfs
Since last cycle we support specifying overlayfs layers via file
descriptors. However, we don't allow detached mounts which means
userspace cannot user file descriptors received via
open_tree(OPEN_TREE_CLONE) and fsmount() directly. They have to
attach them to a mount namespace via move_mount() first.
This is cumbersome and means they have to undo mounts via umount().
Allow them to directly use detached mounts.
- Allow to retrieve idmappings with statmount
Currently it isn't possible to figure out what idmapping has been
attached to an idmapped mount. Add an extension to statmount() which
allows to read the idmapping from the mount.
- Allow creating idmapped mounts from mounts that are already idmapped
So far it isn't possible to allow the creation of idmapped mounts
from already idmapped mounts as this has significant lifetime
implications. Make the creation of idmapped mounts atomic by allow to
pass struct mount_attr together with the open_tree_attr() system call
allowing to solve these issues without complicating VFS lookup in any
way.
The system call has in general the benefit that creating a detached
mount and applying mount attributes to it becomes an atomic operation
for userspace.
- Add a way to query statmount() for supported options
Allow userspace to query which mount information can be retrieved
through statmount().
- Allow superblock owners to force unmount
* tag 'vfs-6.15-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (21 commits)
umount: Allow superblock owners to force umount
selftests: add tests for mount notification
selinux: add FILE__WATCH_MOUNTNS
samples/vfs: fix printf format string for size_t
fs: allow changing idmappings
fs: add kflags member to struct mount_kattr
fs: add open_tree_attr()
fs: add copy_mount_setattr() helper
fs: add vfs_open_tree() helper
statmount: add a new supported_mask field
samples/vfs: add STATMOUNT_MNT_{G,U}IDMAP
selftests: add tests for using detached mount with overlayfs
samples/vfs: check whether flag was raised
statmount: allow to retrieve idmappings
uidgid: add map_id_range_up()
fs: allow detached mounts in clone_private_mount()
selftests/overlayfs: test specifying layers as O_PATH file descriptors
fs: support O_PATH fds with FSCONFIG_SET_FD
vfs: add notifications for mount attach and detach
fanotify: notify on mount attach and detach
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Features:
- Add CONFIG_DEBUG_VFS infrastucture:
- Catch invalid modes in open
- Use the new debug macros in inode_set_cached_link()
- Use debug-only asserts around fd allocation and install
- Place f_ref to 3rd cache line in struct file to resolve false
sharing
Cleanups:
- Start using anon_inode_getfile_fmode() helper in various places
- Don't take f_lock during SEEK_CUR if exclusion is guaranteed by
f_pos_lock
- Add unlikely() to kcmp()
- Remove legacy ->remount_fs method from ecryptfs after port to the
new mount api
- Remove invalidate_inodes() in favour of evict_inodes()
- Simplify ep_busy_loopER by removing unused argument
- Avoid mmap sem relocks when coredumping with many missing pages
- Inline getname()
- Inline new_inode_pseudo() and de-staticize alloc_inode()
- Dodge an atomic in putname if ref == 1
- Consistently deref the files table with rcu_dereference_raw()
- Dedup handling of struct filename init and refcounts bumps
- Use wq_has_sleeper() in end_dir_add()
- Drop the lock trip around I_NEW wake up in evict()
- Load the ->i_sb pointer once in inode_sb_list_{add,del}
- Predict not reaching the limit in alloc_empty_file()
- Tidy up do_sys_openat2() with likely/unlikely
- Call inode_sb_list_add() outside of inode hash lock
- Sort out fd allocation vs dup2 race commentary
- Turn page_offset() into a wrapper around folio_pos()
- Remove locking in exportfs around ->get_parent() call
- try_lookup_one_len() does not need any locks in autofs
- Fix return type of several functions from long to int in open
- Fix return type of several functions from long to int in ioctls
Fixes:
- Fix watch queue accounting mismatch"
* tag 'vfs-6.15-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits)
fs: sort out fd allocation vs dup2 race commentary, take 2
fs: call inode_sb_list_add() outside of inode hash lock
fs: tidy up do_sys_openat2() with likely/unlikely
fs: predict not reaching the limit in alloc_empty_file()
fs: load the ->i_sb pointer once in inode_sb_list_{add,del}
fs: drop the lock trip around I_NEW wake up in evict()
fs: use wq_has_sleeper() in end_dir_add()
VFS/autofs: try_lookup_one_len() does not need any locks
fs: dedup handling of struct filename init and refcounts bumps
fs: consistently deref the files table with rcu_dereference_raw()
exportfs: remove locking around ->get_parent() call.
fs: use debug-only asserts around fd allocation and install
fs: dodge an atomic in putname if ref == 1
vfs: Remove invalidate_inodes()
ecryptfs: remove NULL remount_fs from super_operations
watch_queue: fix pipe accounting mismatch
fs: place f_ref to 3rd cache line in struct file to resolve false sharing
epoll: simplify ep_busy_loop by removing always 0 argument
fs: Turn page_offset() into a wrapper around folio_pos()
kcmp: improve performance adding an unlikely hint to task comparisons
...
|
|
Once a key's reference count has been reduced to 0, the garbage collector
thread may destroy it at any time and so key_put() is not allowed to touch
the key after that point. The most key_put() is normally allowed to do is
to touch key_gc_work as that's a static global variable.
However, in an effort to speed up the reclamation of quota, this is now
done in key_put() once the key's usage is reduced to 0 - but now the code
is looking at the key after the deadline, which is forbidden.
Fix this by using a flag to indicate that a key can be gc'd now rather than
looking at the key's refcount in the garbage collector.
Fixes: 9578e327b2b4 ("keys: update key quotas in key_put()")
Reported-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/673b6aec.050a0220.87769.004a.GAE@google.com/
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: syzbot+6105ffc1ded71d194d6d@syzkaller.appspotmail.com
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Potentially include errata for Landlock ABI v5 (Linux 6.10) and v6
(Linux 6.12). That will be useful for the following signal scoping
erratum.
As explained in errata.h, this commit should be backportable without
conflict down to ABI v5. It must then not include the errata/abi-6.h
file.
Fixes: 54a6e6bbf3be ("landlock: Add signal scoping")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-5-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Add erratum for the TCP socket identification fixed with commit
854277e2cc8c ("landlock: Fix non-TCP sockets restriction").
Fixes: 854277e2cc8c ("landlock: Fix non-TCP sockets restriction")
Cc: Günther Noack <gnoack@google.com>
Cc: Mikhail Ivanov <ivanov.mikhail1@huawei-partners.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-4-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
To ease backports in setup.c, let's group changes from
__lsm_ro_after_init to __ro_after_init with commit f22f9aaf6c3d
("selinux: remove the runtime disable functionality"), and the
landlock_lsmid addition with commit f3b8788cde61 ("LSM: Identify modules
by more than name").
That will help to backport the following errata.
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-2-mic@digikod.net
Fixes: f3b8788cde61 ("LSM: Identify modules by more than name")
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|