path: root/include/uapi
Age | Commit message | Author
2025-01-08 | ntsync: Introduce NTSYNC_IOC_MUTEX_KILL. | Elizabeth Figura
This does not correspond to any NT syscall. Rather, when a thread dies, it should be called by the NT emulator for each mutex, with the TID of the dying thread. NT mutexes are robust (in the pthread sense). When an NT thread dies, any mutexes it owned are immediately released. Acquisition of those mutexes by other threads will return a special value indicating that the mutex was abandoned, like EOWNERDEAD returned from pthread_mutex_lock(), and EOWNERDEAD is indeed used here for that purpose. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-8-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
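The abandonment behaviour described above can be modelled in a few lines. This is a toy, single-threaded sketch and not the kernel implementation: the `Mutex` class, its field names, the `-EPERM` result for a non-owner, and the `-EAGAIN` placeholder for "would block" are assumptions made for illustration. Only the use of EOWNERDEAD for an abandoned mutex comes from the commit message.

```python
import errno


class Mutex:
    """Toy model of an ntsync mutex (hypothetical names, not kernel code)."""

    def __init__(self, owner=0, count=0):
        self.owner = owner        # TID of the owning thread, 0 when unowned
        self.count = count        # recursion count
        self.abandoned = False    # set when the owner died while holding it

    def kill(self, tid):
        """Model of NTSYNC_IOC_MUTEX_KILL: release the mutex if `tid` owns it."""
        if self.owner != tid:
            return -errno.EPERM   # assumption: non-owners are rejected
        self.owner, self.count, self.abandoned = 0, 0, True
        return 0

    def acquire(self, tid):
        """Acquire for `tid`; report EOWNERDEAD once after an abandonment."""
        if self.owner not in (0, tid):
            return -errno.EAGAIN  # placeholder: a real wait would block here
        self.owner = tid
        self.count += 1
        if self.abandoned:
            self.abandoned = False
            return -errno.EOWNERDEAD
        return 0
```

In this model the acquisition still succeeds when EOWNERDEAD is reported: the caller owns the mutex afterwards and is merely told that the protected state may be inconsistent.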
2025-01-08 | ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK. | Elizabeth Figura
This corresponds to the NT syscall NtReleaseMutant(). This syscall decrements the mutex's recursion count by one, and returns the previous value. If the mutex is not owned by the current task, the function instead fails and returns -EPERM. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-7-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
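As a rough illustration of the unlock rule, the sketch below models the mutex as a plain dictionary. The function name, the dictionary layout, and the choice to clear the owner when the count reaches zero are assumptions of this sketch; the commit message only states that the count is decremented, the previous value is returned, and a non-owner gets -EPERM.

```python
import errno


def mutex_unlock(state, tid):
    """Model of NTSYNC_IOC_MUTEX_UNLOCK on {'owner': tid, 'count': n}.

    Returns (error, previous_count); previous_count is None on failure.
    """
    if state["owner"] != tid:
        return -errno.EPERM, None
    previous = state["count"]
    state["count"] -= 1
    if state["count"] == 0:
        state["owner"] = 0   # assumption: fully released means unowned
    return 0, previous
```

Returning the previous count lets a caller distinguish the final release (previous value 1) from an inner release of a recursive acquisition.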
2025-01-08 | ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX. | Elizabeth Figura
This corresponds to the NT syscall NtCreateMutant(). An NT mutex is recursive, with a 32-bit recursion counter. When acquired via NtWaitForMultipleObjects(), the recursion counter is incremented by one. The OS records the thread which acquired it. However, in order to keep this driver self-contained, the owning thread ID is managed by user-space, and passed as a parameter to all relevant ioctls. The initial owner and recursion count, if any, are specified when the mutex is created. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-6-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 | ntsync: Introduce NTSYNC_IOC_WAIT_ALL. | Elizabeth Figura
This is similar to NTSYNC_IOC_WAIT_ANY, but waits until all of the objects are simultaneously signaled, and then acquires all of them as a single atomic operation. Because acquisition of multiple objects is atomic, some complex locking is required. We cannot simply spin-lock multiple objects simultaneously, as that may disable preëmption for a problematically long time. Instead, modifying any object which may be involved in a wait-all operation takes a device-wide sleeping mutex, "wait_all_lock", instead of the normal object spinlock. Because wait-for-all is a rare operation, in order to optimize wait-for-any, this lock is only taken when necessary. "all_hint" is used to mark objects which are involved in a wait-for-all operation, and if an object is not, only its spinlock is taken. The locking scheme used here was written by Peter Zijlstra. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-5-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
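The all-or-nothing acquisition can be sketched as follows. This is a single-threaded model over a list of semaphore counters, so it ignores the locking that the commit is actually about ("wait_all_lock" and "all_hint"); the function name and the list representation are assumptions for illustration only.

```python
def try_wait_all(sem_counts, indices):
    """Consume one unit from every listed semaphore, or from none of them."""
    if any(sem_counts[i] == 0 for i in indices):
        return False              # not all signaled; a real waiter would sleep
    for i in indices:
        sem_counts[i] -= 1        # every object is consumed in the same step
    return True
```

The important property is that a failed attempt leaves every counter untouched, which is what makes the kernel-side operation need a lock covering all involved objects.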
2025-01-08 | ntsync: Introduce NTSYNC_IOC_WAIT_ANY. | Elizabeth Figura
This corresponds to part of the functionality of the NT syscall NtWaitForMultipleObjects(). Specifically, it implements the behaviour where the third argument (wait_any) is TRUE, and it does not handle alertable waits. Those features have been split out into separate patches to ease review. This patch therefore implements the wait/wake infrastructure which comprises the core of ntsync's functionality. NTSYNC_IOC_WAIT_ANY is a vectored wait function similar to poll(). Unlike poll(), it "consumes" objects when they are signaled. For semaphores, this means decreasing one from the internal counter. At most one object can be consumed by this function. This wait/wake model is fundamentally different from that used anywhere else in the kernel, and for that reason ntsync does not use any existing infrastructure, such as futexes, kernel mutexes or semaphores, or wait_event(). Up to 64 objects can be waited on at once. As soon as one is signaled, the object with the lowest index is consumed, and that index is returned via the "index" field. A timeout is supported. The timeout is passed as a u64 nanosecond value, which represents absolute time measured against either the MONOTONIC or REALTIME clock (controlled by the flags argument). If U64_MAX is passed, the ioctl waits indefinitely. This ioctl validates that all objects belong to the relevant device. This is not necessary for any technical reason related to NTSYNC_IOC_WAIT_ANY, but will be necessary for NTSYNC_IOC_WAIT_ALL introduced in the following patch. Some padding fields are added for alignment and for fields which will be added in future patches (split out to ease review). Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-4-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
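Two points from the description lend themselves to a small sketch: the lowest-index signaled object is the one consumed, and the timeout is an absolute nanosecond value. The helper names below are hypothetical, the objects are modelled as bare semaphore counters, and nothing here talks to the real device.

```python
import time


def try_wait_any(sem_counts):
    """Consume the lowest-index signaled semaphore and return its index.

    Returns None when nothing is signaled (a real wait would sleep).
    """
    for index, count in enumerate(sem_counts):
        if count > 0:
            sem_counts[index] -= 1   # "consuming" decrements the counter
            return index
    return None


def absolute_timeout_ns(relative_ns):
    """Turn a relative timeout into an absolute CLOCK_MONOTONIC value in ns."""
    return time.clock_gettime_ns(time.CLOCK_MONOTONIC) + relative_ns
```

Unlike poll(), a successful call changes object state, so calling `try_wait_any` twice on the same list can return different indices.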
2025-01-08 | ntsync: Rename NTSYNC_IOC_SEM_POST to NTSYNC_IOC_SEM_RELEASE. | Elizabeth Figura
Use the more common "release" terminology, which is also the term used by NT, instead of "post" (which is used by POSIX). Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-3-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 | ntsync: Return the fd from NTSYNC_IOC_CREATE_SEM. | Elizabeth Figura
Simplify the user API a bit by returning the fd as return value from the ioctl instead of through the argument pointer. Signed-off-by: Elizabeth Figura <zfigura@codeweavers.com> Link: https://lore.kernel.org/r/20241213193511.457338-2-zfigura@codeweavers.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 | drivers pps: add PPS generators support | Rodolfo Giometti
Sometimes one needs not only to catch PPS signals but also to produce them. For example, a distributed simulation requires the computers' clocks to be synchronized very tightly. This patch adds a PPS generators class in order to have a well-defined interface for these devices. Signed-off-by: Rodolfo Giometti <giometti@enneenne.com> Link: https://lore.kernel.org/r/20241108073115.759039-2-giometti@enneenne.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-08 | ASoC: Merge up v6.13-rc6 | Mark Brown
This helps several of my boards in CI.
2025-01-08 | vduse: relicense under GPL-2.0 OR BSD-3-Clause | Yongji Xie
Dual-license the vduse kernel header file under GPL-2.0 OR BSD-3-Clause to make it possible to ship it with DPDK (under BSD-3-Clause) for older distros. Signed-off-by: Yongji Xie <xieyongji@bytedance.com> Message-Id: <20241119074238.38299-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-07 | Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next | Jakub Kicinski
Daniel Borkmann says:
====================
pull-request: bpf-next 2025-01-07
We've added 7 non-merge commits during the last 32 day(s) which contain a total of 11 files changed, 190 insertions(+), 103 deletions(-).
The main changes are:
1) Migrate the test_xdp_meta.sh BPF selftest into test_progs framework, from Bastien Curutchet.
2) Add ability to configure head/tailroom for netkit devices, from Daniel Borkmann.
3) Fixes and improvements to the xdp_hw_metadata selftest, from Song Yoong Siang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
  selftests/bpf: Extend netkit tests to validate set {head,tail}room
  netkit: Add add netkit {head,tail}room to rt_link.yaml
  netkit: Allow for configuring needed_{head,tail}room
  selftests/bpf: Migrate test_xdp_meta.sh into xdp_context_test_run.c
  selftests/bpf: test_xdp_meta: Rename BPF sections
  selftests/bpf: Enable Tx hwtstamp in xdp_hw_metadata
  selftests/bpf: Actuate tx_metadata_len in xdp_hw_metadata
====================
Link: https://patch.msgid.link/20250107130908.143644-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06 | netkit: Allow for configuring needed_{head,tail}room | Daniel Borkmann
Allow the user to configure needed_{head,tail}room for both netkit devices. The idea is similar to 163e529200af ("veth: implement ndo_set_rx_headroom") with the difference that the two parameters can be specified upon device creation. By default the current behavior stays as is which is needed_{head,tail}room is 0. In case of Cilium, for example, the netkit devices are not enslaved into a bridge or openvswitch device (rather, BPF-based redirection is used out of tcx), and as such these parameters are not propagated into the Pod's netns via peer device. Given Cilium can run in vxlan/geneve tunneling mode (needed_headroom) and/or be used in combination with WireGuard (needed_{head,tail}room), allow the Cilium CNI plugin to specify these two upon netkit device creation. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/bpf/20241220234658.490686-1-daniel@iogearbox.net
2025-01-04 | Merge branch 'vfs-6.14.uncached_buffered_io' | Christian Brauner
Bring in the VFS changes for uncached buffered io. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-04 | fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag | Jens Axboe
If a file system supports uncached buffered IO, it may set FOP_DONTCACHE and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted without the file system supporting it, it'll get errored with -EOPNOTSUPP. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20241220154831.1086649-8-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-03 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.13-rc6). No conflicts. Adjacent changes: include/linux/if_vlan.h f91a5b808938 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK") 3f330db30638 ("net: reformat kdoc return statements") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03 | Merge tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Linus Torvalds
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless and netfilter.
Nothing major here. Over the last two weeks we gathered only around two-thirds of our normal weekly fix count, but delaying sending these until -rc7 seemed like a really bad idea. AFAIK we have no bugs under investigation. One or two reverts for stuff for which we haven't gotten a proper fix will likely come in the next PR.
Current release - fix to a fix:
 - netfilter: nft_set_hash: unaligned atomic read on struct nft_set_ext
 - eth: gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
Previous releases - regressions:
 - net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
 - mptcp:
    - fix sleeping rcvmsg sleeping forever after bad recvbuffer adjust
    - fix TCP options overflow
    - prevent excessive coalescing on receive, fix throughput
 - net: fix memory leak in tcp_conn_request() if map insertion fails
 - wifi: cw1200: fix potential NULL dereference after conversion to GPIO descriptors
 - phy: micrel: dynamically control external clock of KSZ PHY, fix suspend behavior
Previous releases - always broken:
 - af_packet: fix VLAN handling with MSG_PEEK
 - net: restrict SO_REUSEPORT to inet sockets
 - netdev-genl: avoid empty messages in NAPI get
 - dsa: microchip: fix set_ageing_time function on KSZ9477 and LAN937X
 - eth:
    - gve: XDP fixes around transmit, queue wakeup etc.
    - ti: icssg-prueth: fix firmware load sequence to prevent time jump which breaks timesync related operations
Misc:
 - netlink: specs: mptcp: add missing attr and improve documentation"
* tag 'net-6.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  net: ti: icssg-prueth: Fix clearing of IEP_CMP_CFG registers during iep_init
  net: ti: icssg-prueth: Fix firmware load sequence.
  mptcp: prevent excessive coalescing on receive
  mptcp: don't always assume copied data in mptcp_cleanup_rbuf()
  mptcp: fix recvbuffer adjust on sleeping rcvmsg
  ila: serialize calls to nf_register_net_hooks()
  af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK
  af_packet: fix vlan_get_tci() vs MSG_PEEK
  net: wwan: iosm: Properly check for valid exec stage in ipc_mmio_init()
  net: restrict SO_REUSEPORT to inet sockets
  net: reenable NETIF_F_IPV6_CSUM offload for BIG TCP packets
  net: sfc: Correct key_len for efx_tc_ct_zone_ht_params
  net: wwan: t7xx: Fix FSM command timeout issue
  sky2: Add device ID 11ab:4373 for Marvell 88E8075
  mptcp: fix TCP options overflow.
  net: mv643xx_eth: fix an OF node reference leak
  gve: trigger RX NAPI instead of TX NAPI in gve_xsk_wakeup
  eth: bcmsysport: fix call balance of priv->clk handling routines
  net: llc: reset skb->transport_header
  netlink: specs: mptcp: fix missing doc
  ...
2025-01-03 | drm/msm: Expose uche trap base via uapi | Danylo Piliaiev
This adds MSM_PARAM_UCHE_TRAP_BASE that will be used by Mesa implementation for VK_KHR_shader_clock and GL_ARB_shader_clock. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Patchwork: https://patchwork.freedesktop.org/patch/627036/ Signed-off-by: Rob Clark <robdclark@chromium.org>
2024-12-27 | netlink: specs: mptcp: clearly mention attributes | Matthieu Baerts (NGI0)
The rendered version of the MPTCP events [1] looked strange, because the whole content of the 'doc' was displayed in the same block. It was then not clear that the first words, not even ended by a period, were the attributes that are defined when such events are emitted. These attributes have now been moved to the end, prefixed by 'Attributes:' and ended with a period. Note that '>-' has been added after 'doc:' to allow ':' in the text below. The documentation in the UAPI header has been auto-generated by: ./tools/net/ynl/ynl-regen.sh Link: https://docs.kernel.org/networking/netlink_spec/mptcp_pm.html#event-type [1] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-2-e54f2db3f844@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27 | netlink: specs: mptcp: add missing 'server-side' attr | Matthieu Baerts (NGI0)
This attribute is added with the 'created' and 'established' events, but the documentation didn't mention it. The documentation in the UAPI header has been auto-generated by: ./tools/net/ynl/ynl-regen.sh Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241221-net-mptcp-netlink-specs-pm-doc-fixes-v2-1-e54f2db3f844@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-27 | Merge tag 'hardening-v6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux | Linus Torvalds
Pull hardening fix from Kees Cook: - stddef: make __struct_group() UAPI C++-friendly (Alexander Lobakin) * tag 'hardening-v6.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: stddef: make __struct_group() UAPI C++-friendly
2024-12-23 | io_uring: introduce attributes for read/write and PI support | Anuj Gupta
Add the ability to pass additional attributes along with read/write. Application can prepare attribute specific information and pass its address using the SQE field: __u64 attr_ptr; Along with setting a mask indicating attributes being passed: __u64 attr_type_mask; Overall 64 attributes are allowed and currently one attribute 'IORING_RW_ATTR_FLAG_PI' is supported. With PI attribute, userspace can pass following information: - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG} - len: length of PI/metadata buffer - addr: address of metadata buffer - seed: seed value for reftag remapping - app_tag: application defined 16b value Process this information to prepare uio_meta_descriptor and pass it down using kiocb->private. PI attribute is supported only for direct IO. Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20241128112240.8867-7-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 | fs, iov_iter: define meta io descriptor | Anuj Gupta
Add flags to describe checks for integrity meta buffer. Also, introduce a new 'uio_meta' structure that upper layer can use to pass the meta/integrity information. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241128112240.8867-5-anuj20.g@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-12-23 | Merge 6.13-rc4 into usb-next | Greg Kroah-Hartman
We need the USB fixes in here as well for testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-22 | fiemap: use kernel-doc includes in fiemap docbook | Randy Dunlap
Add some kernel-doc notation to structs in fiemap header files then pull that into Documentation/filesystems/fiemap.rst instead of duplicating the header file structs in fiemap.rst. This helps to future-proof fiemap.rst against struct changes. Add missing flags documentation from header files into fiemap.rst for FIEMAP_FLAG_CACHE and FIEMAP_EXTENT_SHARED. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20241121011352.201907-1-rdunlap@infradead.org Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-doc@vger.kernel.org Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-12-20 | stddef: make __struct_group() UAPI C++-friendly | Alexander Lobakin
For most of its history, C++ has not allowed type declarations inside anonymous unions, for various reasons. At the same time, __struct_group() relies on them, so when the @TAG argument is not empty, C++ code fails to build (even under `extern "C"`): ../linux/include/uapi/linux/pkt_cls.h:25:24: error: 'struct tc_u32_sel::<unnamed union>::tc_u32_sel_hdr,' invalid; an anonymous union may only have public non-static data members [-fpermissive] The safest way to fix this without trying to switch standards (which is impossible in UAPI anyway) etc., is to disable the tag declaration for that language. This won't break anything, since such code is currently not buildable at all. Use a separate definition for __struct_group() when __cplusplus is defined to mitigate the error, including the version from tools/. Fixes: 50d7bd38c3aa ("stddef: Introduce struct_group() helper macro") Reported-by: Christopher Ferris <cferris@google.com> Closes: https://lore.kernel.org/linux-hardening/Z1HZpe3WE5As8UAz@google.com Suggested-by: Kees Cook <kees@kernel.org> # __struct_group_tag() Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20241219135734.2130002-1-aleksander.lobakin@intel.com Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-20 | Merge tag 'drm-misc-next-2024-12-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next | Dave Airlie
drm-misc-next for 6.14:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
 - connector: Add a mutex to protect ELD access, Add a helper to create a connector in two steps
Driver Changes:
 - amdxdna: Add RyzenAI-npu6 Support, various improvements
 - rcar-du: Add r8a779h0 Support
 - rockchip: various improvements
 - zynqmp: Add DP audio support
 - bridges:
    - ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
 - panels:
    - new panels: Tianma TM070JDHG34-00, Multi-Inno Technology MI1010Z1T-1CP11
Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241219-truthful-demonic-hound-598f63@houat
2024-12-19 | ipv6: Add flow label to route get requests | Ido Schimmel
The default IPv6 multipath hash policy takes the flow label into account when calculating a multipath hash and previous patches added a flow label selector to IPv6 FIB rules. Allow user space to specify a flow label in route get requests by adding a new netlink attribute and using its value to populate the "flowlabel" field in the IPv6 flow info structure prior to a route lookup. Deny the attribute in RTM_{NEW,DEL}ROUTE requests by checking for it in rtm_to_fib6_config() and returning an error if present. A subsequent patch will use this capability to test the new flow label selector in IPv6 FIB rules. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-19 | net: fib_rules: Add flow label selector attributes | Ido Schimmel
Add new FIB rule attributes which will allow user space to match on the IPv6 flow label with a mask. Temporarily set the type of the attributes to 'NLA_REJECT' while support is being added in the IPv6 code. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-18 | ima: instantiate the bprm_creds_for_exec() hook | Mimi Zohar
Like direct file execution (e.g. ./script.sh), indirect file execution (e.g. sh script.sh) needs to be measured and appraised. Instantiate the new security_bprm_creds_for_exec() hook to measure and verify the indirect file's integrity. Unlike direct file execution, indirect file execution is optionally enforced by the interpreter. Differentiate kernel and userspace enforced integrity audit messages. Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Tested-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20241212174223.389435-9-mic@digikod.net Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-18 | security: Add EXEC_RESTRICT_FILE and EXEC_DENY_INTERACTIVE securebits | Mickaël Salaün
The new SECBIT_EXEC_RESTRICT_FILE, SECBIT_EXEC_DENY_INTERACTIVE, and their *_LOCKED counterparts are designed to be set by processes setting up an execution environment, such as a user session, a container, or a security sandbox. Unlike other securebits, these ones can be set by unprivileged processes. Like seccomp filters or Landlock domains, the securebits are inherited across processes. When SECBIT_EXEC_RESTRICT_FILE is set, programs interpreting code should control executable resources according to execveat(2) + AT_EXECVE_CHECK (see previous commit). When SECBIT_EXEC_DENY_INTERACTIVE is set, a process should deny execution of user interactive commands (which excludes executable regular files). Being able to configure each of these securebits enables system administrators or owners of container images to gradually validate the related changes and to identify potential issues (e.g. with interpreter or audit logs). It should be noted that unlike other security bits, the SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE bits are dedicated to user space willing to restrict itself. Because of that, they only make sense in the context of a trusted environment (e.g. sandbox, container, user session, full system) where the process changing its behavior (according to these bits) and all its parent processes are trusted. Otherwise, any parent process could just execute its own malicious code (interpreting a script or not), or even enforce a seccomp filter to mask these bits. Such a secure environment can be achieved with an appropriate access control (e.g. mount's noexec option, file access rights, LSM policy) and an enlightened ld.so checking that libraries are allowed for execution e.g., to protect against illegitimate use of LD_PRELOAD. Ptrace restrictions according to these securebits would not make sense because of the processes' trust assumption. Scripts may need some changes to deal with untrusted data (e.g. stdin, environment variables), but that is outside the scope of the kernel. See chromeOS's documentation about script execution control and the related threat model: https://www.chromium.org/chromium-os/developer-library/guides/security/noexec-shell-scripts/ Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <paul@paul-moore.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Jeff Xu <jeffxu@chromium.org> Tested-by: Jeff Xu <jeffxu@chromium.org> Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20241212174223.389435-3-mic@digikod.net Signed-off-by: Kees Cook <kees@kernel.org>
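The policy an interpreter is expected to apply under these two securebits can be written as a small decision function. This is a sketch of the rule as stated in the commit message, not code from any real interpreter: the function and parameter names are hypothetical, `execve_check_ok` stands for the result of an execveat(2) + AT_EXECVE_CHECK call made elsewhere, and finer cases (such as commands arriving through a file descriptor) are deliberately left out.

```python
def may_execute(is_regular_file, execve_check_ok, restrict_file, deny_interactive):
    """Decide whether an interpreter should run a piece of code.

    is_regular_file:  the code comes from a regular file (a script)
    execve_check_ok:  execveat(2) + AT_EXECVE_CHECK succeeded for that file
    restrict_file:    SECBIT_EXEC_RESTRICT_FILE is set
    deny_interactive: SECBIT_EXEC_DENY_INTERACTIVE is set
    """
    if is_regular_file:
        # Files are governed by SECBIT_EXEC_RESTRICT_FILE only.
        return execve_check_ok if restrict_file else True
    # Anything else is treated as an interactive user command.
    return not deny_interactive
```

With neither bit set the function always returns True, which matches the point that existing behaviour is unchanged until an environment opts in.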
2024-12-18 | exec: Add a new AT_EXECVE_CHECK flag to execveat(2) | Mickaël Salaün
Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would be allowed for execution. The main use case is for script interpreters and dynamic linkers to check execution permission according to the kernel's security policy. Another use case is to add context to access logs e.g., which script (instead of interpreter) accessed a file. As any executable code, scripts could also use this check [1]. This is different from faccessat(2) + X_OK which only checks a subset of access rights (i.e. inode permission and mount options for regular files), but not the full context (e.g. all LSM access checks). The main use case for access(2) is for SUID processes to (partially) check access on behalf of their caller. The main use case for execveat(2) + AT_EXECVE_CHECK is to check if a script execution would be allowed, according to all the different restrictions in place. Because the use of AT_EXECVE_CHECK follows the exact kernel semantic as for a real execution, user space gets the same error codes. An interesting point of using execveat(2) instead of openat2(2) is that it decouples the check from the enforcement. Indeed, the security check can be logged (e.g. with audit) without blocking an execution environment not yet ready to enforce a strict security policy. LSMs can control or log execution requests with security_bprm_creds_for_exec(). However, to enforce a consistent and complete access control (e.g. on binary's dependencies) LSMs should restrict file executability, or measure executed files, with security_file_open() by checking file->f_flags & __FMODE_EXEC. Because AT_EXECVE_CHECK is dedicated to user space interpreters, it doesn't make sense for the kernel to parse the checked files, look for interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC if the format is unknown. Because of that, security_bprm_check() is never called when AT_EXECVE_CHECK is used. 
It should be noted that script interpreters cannot directly use execveat(2) (without this new AT_EXECVE_CHECK flag) because this could lead to unexpected behaviors e.g., `python script.sh` could lead to Bash being executed to interpret the script. Unlike the kernel, script interpreters may just interpret the shebang as a simple comment, which should not change for backward compatibility reasons. Because scripts or libraries files might not currently have the executable permission set, or because we might want specific users to be allowed to run arbitrary scripts, the following patch provides a dynamic configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and SECBIT_EXEC_DENY_INTERACTIVE securebits. This is a redesign of the CLIP OS 4's O_MAYEXEC: https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch This patch has been used for more than a decade with customized script interpreters. Some examples can be found here: https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Kees Cook <keescook@chromium.org> Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Serge Hallyn <serge@hallyn.com> Reviewed-by: Jeff Xu <jeffxu@chromium.org> Tested-by: Jeff Xu <jeffxu@chromium.org> Link: https://docs.python.org/3/library/io.html#io.open_code [1] Signed-off-by: Mickaël Salaün <mic@digikod.net> Link: https://lore.kernel.org/r/20241212174223.389435-2-mic@digikod.net Signed-off-by: Kees Cook <kees@kernel.org>
2024-12-19 | PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0 | Lukas Wunner
Niklas notes that the code comment on the PCI_EXP_LNKCAP_SLS macro is outdated as it reflects the meaning of the field prior to PCIe r3.0. Update it to avoid confusion. Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org Link: https://lore.kernel.org/r/6152bd17cbe0876365d5f4624fc317529f4bbc85.1734376438.git.lukas@wunner.de Reported-by: Niklas Schnelle <niks@kernel.org> Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
2024-12-18 | KVM: x86: Drop the now unused KVM_X86_DISABLE_VALID_EXITS | Sean Christopherson
Drop the KVM_X86_DISABLE_VALID_EXITS definition, as it is misleading, and unused in KVM *because* it is misleading. The set of exits that can be disabled is dynamic, i.e. userspace (and KVM) must check KVM's actual capabilities. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20241128013424.4096668-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-12-17 | accel/amdxdna: Remove DRM_AMDXDNA_HWCTX_CONFIG_NUM | Lizhi Hou
Defining the number of enum elements in a uAPI header is meaningless. It will not be used as expected and can lead to incompatibility issues between user-space applications and the driver. Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217165446.2607585-2-lizhi.hou@amd.com
2024-12-17 | accel/amdxdna: Add zero check for pad in ioctl input structures | Lizhi Hou
For input ioctl structures, it is better to check if the pad is zero. Thus, the pad bytes might be usable in the future. Suggested-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217165446.2607585-1-lizhi.hou@amd.com
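The reasoning is that a pad field which is verified to be zero today can later be given a meaning without breaking existing callers. The sketch below illustrates the check on an entirely hypothetical request layout (u32 handle, u32 pad, u64 address); neither the layout nor the -EINVAL result is taken from the amdxdna driver.

```python
import errno
import struct

# Hypothetical request layout: u32 handle, u32 pad, u64 addr (little-endian).
REQUEST_FORMAT = "<IIQ"


def validate_request(raw):
    """Reject a request whose pad field is not zero (assumed to be -EINVAL)."""
    _handle, pad, _addr = struct.unpack(REQUEST_FORMAT, raw)
    if pad != 0:
        return -errno.EINVAL
    return 0
```

If the pad were ignored instead, old binaries passing uninitialized bytes would start misbehaving as soon as those bytes were assigned a meaning.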
2024-12-17 | KVM: Move KVM_REG_SIZE() definition to common uAPI header | Sean Christopherson
Define KVM_REG_SIZE() in the common kvm.h header, and delete the arm64 and RISC-V versions. As evidenced by the surrounding definitions, all aspects of the register size encoding are generic, i.e. RISC-V should have moved arm64's definition to common code instead of copy+pasting. Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20241128005547.4077116-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
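For readers unfamiliar with the encoding, the register size is stored as a log2 value in a 4-bit field of the 64-bit register ID. The sketch below mirrors that computation in Python; the shift and mask values are quoted from memory of the kvm.h header and should be treated as assumptions to verify, and the function name is this sketch's own.

```python
# Assumed to match KVM_REG_SIZE_SHIFT / KVM_REG_SIZE_MASK in the kvm.h header.
KVM_REG_SIZE_SHIFT = 52
KVM_REG_SIZE_MASK = 0x00F0000000000000


def kvm_reg_size(reg_id):
    """Return the register size in bytes encoded in a KVM register ID."""
    return 1 << ((reg_id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)
```

Because nothing in this computation depends on the architecture, a single shared definition is sufficient, which is the point the commit makes.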
2024-12-17 | drm/panthor: Report innocent group kill | Boris Brezillon
Groups can be killed during a reset even though they did nothing wrong. That usually happens when the FW is put in a bad state by other groups, resulting in group suspension failures when the reset happens. If we end up in that situation, flag the group innocent and report innocence through a new DRM_PANTHOR_GROUP_STATE flag. Bump the minor driver version to reflect the uAPI change. Changes in v4: - Add an entry to the driver version changelog - Add R-bs Changes in v3: - Actually report innocence to userspace Changes in v2: - New patch Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211080500.2349505-1-boris.brezillon@collabora.com
2024-12-16sock: Introduce SO_RCVPRIORITY socket optionAnna Emese Nyiri
Add new socket option, SO_RCVPRIORITY, to include SO_PRIORITY in the ancillary data returned by recvmsg(). This is analogous to the existing support for SO_RCVMARK, as implemented in commit 6fd1d51cfa253 ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()"). Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Ferenc Fejes <fejes@inf.elte.hu> Signed-off-by: Anna Emese Nyiri <annaemesenyiri@gmail.com> Link: https://patch.msgid.link/20241213084457.45120-5-annaemesenyiri@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-16drm/xe/oa/uapi: Expose an unblock after N reports OA propertyAshutosh Dixit
Expose an "unblock after N reports" OA property, to allow userspace threads to be woken up less frequently. Co-developed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212224903.1853862-1-ashutosh.dixit@intel.com
2024-12-16accel/amdxdna: Enhance power management settingsLizhi Hou
Add SET_STATE ioctl to configure device power mode for aie2 device. Three modes are supported initially. POWER_MODE_DEFAULT: Enable clock gating and set DPM (Dynamic Power Management) level to the value which has been set by the resource solver or the maximum DPM level the device supports. POWER_MODE_HIGH: Enable clock gating and set DPM level to maximum DPM level the device supports. POWER_MODE_TURBO: Disable clock gating and set DPM level to maximum DPM level the device supports. Disabling clock gating means all clocks always run at full speed, and different clock frequencies are used depending on the DPM level that has been set. Initially, the driver sets the power mode to default mode. Co-developed-by: Narendra Gutta <VenkataNarendraKumar.Gutta@amd.com> Signed-off-by: Narendra Gutta <VenkataNarendraKumar.Gutta@amd.com> Co-developed-by: George Yang <George.Yang@amd.com> Signed-off-by: George Yang <George.Yang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241213232933.1545388-4-lizhi.hou@amd.com
2024-12-16thermal/thresholds: Fix uapi header macros leading to a compilation errorDaniel Lezcano
The macros giving the direction of the crossing thresholds use the BIT macro, which is not exported to userspace. Consequently, when a userspace program includes the header, it fails to compile. Replace the macros with their literal values so that userspace programs using this header compile. Fixes: 445936f9e258 ("thermal: core: Add user thresholds support") Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20241212201311.4143196-1-daniel.lezcano@linaro.org [ rjw: Add Fixes: ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
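To illustrate the problem and the fix, a small sketch with hypothetical macro names (`EXAMPLE_WAY_UP`, `EXAMPLE_WAY_DOWN`); these are not claimed to be the names used in the thermal header:

```c
/*
 * Kernel-internal style. BIT() is only defined by kernel-only headers, so
 * a userspace program including a uAPI header written like this would not
 * compile:
 *
 *   #define EXAMPLE_WAY_UP    BIT(0)
 *   #define EXAMPLE_WAY_DOWN  BIT(1)
 *
 * uAPI-safe style: spell out the literal values instead.
 */
#define EXAMPLE_WAY_UP		0x1
#define EXAMPLE_WAY_DOWN	0x2

/* Return non-zero if the direction mask includes an upward crossing. */
static int example_crossed_up(unsigned int direction)
{
	return (direction & EXAMPLE_WAY_UP) != 0;
}
```

The literals carry the same bit values as the BIT() forms, so existing users of the flags see no change in behavior.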
2024-12-16Merge 6.13-rc3 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-12-16net: ethtool: Add support for tsconfig command to get/set hwtstamp configKory Maincent
Introduce support for ETHTOOL_MSG_TSCONFIG_GET/SET ethtool netlink socket to read and configure hwtstamp configuration of a PHC provider. Note that simultaneous hwtstamp isn't supported; configuring a new one disables the previous setting. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: ethtool: tsinfo: Enhance tsinfo to support several hwtstamp by net topologyKory Maincent
Either the MAC or the PHY can provide hwtstamp, so we should be able to read the tsinfo for any hwtstamp provider. Enhance 'get' command to retrieve tsinfo of hwtstamp providers within a network topology. Add support for a specific dump command to retrieve all hwtstamp providers within the network topology, with added functionality for filtered dump to target a single interface. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16net: Add the possibility to support a selected hwtstamp in netdeviceKory Maincent
Introduce the description of a hwtstamp provider, mainly defined by the hwtstamp source and the phydev pointer. Add a hwtstamp provider description within the netdev structure to allow saving the hwtstamp we want to use. This prepares for future support of an ethtool netlink command to select the desired hwtstamp provider. By default, the old API that does not support hwtstamp selectability is used, meaning the hwtstamp provider pointer is unset. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-16tls: add counters for rekeySabrina Dubroca
This introduces 5 counters to keep track of key updates: Tls{Rx,Tx}Rekey{Ok,Error} and TlsRxRekeyReceived. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-15netlink: add IGMP/MLD join/leave notificationsYuyang Huang
This change introduces netlink notifications for multicast address changes. The following features are included: * Addition and deletion of multicast addresses are reported using RTM_NEWMULTICAST and RTM_DELMULTICAST messages with AF_INET and AF_INET6. * Two new notification groups: RTNLGRP_IPV4_MCADDR and RTNLGRP_IPV6_MCADDR are introduced for receiving these events. This change allows user space applications (e.g., ip monitor) to efficiently track multicast group memberships by listening for netlink events. Previously, applications relied on inefficient polling of procfs, introducing delays. With netlink notifications, applications receive realtime updates on multicast group membership changes, enabling more precise metrics collection and system monitoring.  This change also unlocks the potential for implementing a wide range of sophisticated multicast related features in user space by allowing applications to combine kernel provided multicast address information with user space data and communicate decisions back to the kernel for more fine grained control. This mechanism can be used for various purposes, including multicast filtering, IGMP/MLD offload, and IGMP/MLD snooping. Cc: Maciej Żenczykowski <maze@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Co-developed-by: Patrick Ruddy <pruddy@vyatta.att-mail.com> Signed-off-by: Patrick Ruddy <pruddy@vyatta.att-mail.com> Link: https://lore.kernel.org/r/20180906091056.21109-1-pruddy@vyatta.att-mail.com Signed-off-by: Yuyang Huang <yuyanghuang@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-12-13bpf: Add fd_array_cnt attribute for prog_loadAnton Protopopov
The fd_array attribute of the BPF_PROG_LOAD syscall may contain a set of file descriptors: maps or btfs. This field was introduced as a sparse array. Introduce a new attribute, fd_array_cnt, which, if present, indicates that the fd_array is a continuous array of the corresponding length. If fd_array_cnt is non-zero, then every map in the fd_array will be bound to the program, as if it was used by the program. This functionality is similar to the BPF_PROG_BIND_MAP syscall, but such maps can be used by the verifier during the program load. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241213130934.1087929-5-aspsk@isovalent.com
2024-12-13ASoC: fsl: add memory to memory function for ASRCMark Brown
Merge series from Shengjiu Wang <shengjiu.wang@nxp.com>: This function is based on the accelerator implementation for the compress API: 04177158cf98 ("ALSA: compress_offload: introduce accel operation mode") Audio signal processing also has a requirement for memory to memory operation, similar to video. This asrc memory to memory (memory ->asrc->memory) case is a non real time use case. The user fills the input buffer for the asrc module; after conversion, asrc sends the output buffer back to the user. So it is not a traditional ALSA playback and capture case. Because we had already implemented the "memory -> asrc ->i2s device-> codec" use case in ALSA, the "memory->asrc->memory" case needs to reuse the code in the asrc driver, so patch 1 and patch 2 refine the code so that it can be shared by the "memory->asrc->memory" driver. The other change is to add memory to memory support for two kinds of i.MX ASRC modules.
2024-12-13accel/amdxdna: Add query firmware versionLizhi Hou
Enhance GET_INFO ioctl to support retrieving firmware version. Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241206220001.164049-6-lizhi.hou@amd.com